Introduction to Perl's Taint Mode (1/2) | WebReference


Introduction to Perl's Taint Mode

By Dan Ragle

When writing scripts that only you will run on your standalone, password-protected PC, security concerns aren't likely to be your highest priority. When writing scripts that others will run--especially if those others include people you don't know (or don't trust)--it's essential that you design your scripts defensively, so that to the best of your knowledge malicious users cannot use your script to damage the server, not to mention your own and/or the server owner's reputation. Because they are by design published and run by individuals around the world, CGI scripts are a prime target for malicious users, who regularly scan sites for scripts with known security holes that they can exploit to damage the script's host server. Worse, some holes let the attacker take control of the host server itself to install zombie applications and/or participate in Denial of Service attacks against another server elsewhere on the 'Net.

When writing Perl scripts, especially those that will be run as CGI scripts or on public servers (but even when writing scripts in general), it's important to be conscious of the potential ways that attackers can abuse your script to perform malicious actions. Fortunately, Perl provides a built-in mechanism--called Taint Mode--to help you become more aware of potential security problems in your own scripts. Taint Mode is the topic of this brief introductory article.


What is Taint Mode?

In a nutshell, Taint Mode is a set of restrictions in Perl that collectively help you to write safer scripts by forcing you to think more carefully about how your script uses information. Specifically, it prevents you from using data that came from outside the script in any action that will in turn affect something else outside of your script--unless you first take specific steps to "clean" that data. For example, the following script would be extremely risky and ill-advised:

# Here's what NOT to do
my $arg=shift;  # get parameter from command line
system($arg);   # and execute it as a system command

In this blatant example, it's easy to see the security problem: a malicious user could provide literally any system command they wanted and your script would blindly execute it. If this type of code were running as a CGI script, the end user would then have access to do anything they wanted on your server--provided the Web server user itself has access to do it. Typically, your Web server user will have restricted access to your system; but even then, are you willing to take this kind of chance?

The simple script above demonstrates the two parts of a potential security problem that Taint Mode checking is designed to help you prevent. First, data was received from outside the application; and second, that data was then used--as is--in a way that it could affect a process outside the application (in this case, a system call). While the security ramifications of the above example are obvious, other potential outside influences (such as the environment's PATH variable) are more subtle; and Perl's Taint Mode checking will help you to see these problems and plan for them in your code.

How Does Taint Mode Work?

You can activate Taint Mode in a Perl script by adding the -T command line switch to the Perl interpreter. In the above example, it would be:

#!/usr/opt/perl -T

Perl will also automatically activate Taint Mode on scripts that are run with differing real and effective user or group IDs; i.e., setuid or setgid scripts. And if you are using mod_perl, you can force your scripts to run with Taint Mode enabled by setting either the PerlTaintCheck parameter to On (mod_perl 1.x) or PerlSwitches to -T (mod_perl 2.x). However you do it, it's generally a good idea to force Taint Mode in any script that you run on a server, and especially so in a script that will run as a CGI script through a Web server.
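
If you want a script to confirm at run time that Taint Mode really is active, one option is to inspect the special variable ${^TAINT} (available since Perl 5.8.0). The following is only a minimal sketch; the helper name taint_mode_label is made up for illustration and is not part of Perl:

```perl
use strict;
use warnings;

# ${^TAINT} is 1 when running under -T, -1 under -t (taint warnings
# only), and 0 when taint checking is off.
# taint_mode_label() is a hypothetical helper name.
sub taint_mode_label {
    my ($flag) = @_;
    return $flag > 0 ? 'enabled'
         : $flag < 0 ? 'warnings only'
         :             'disabled';
}

print "Taint Mode is ", taint_mode_label(${^TAINT}), ".\n";
```

Run it once as "perl script.pl" and once as "perl -T script.pl" to compare the two reports.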

Once Taint Mode is activated, the script will internally mark any value that is retrieved from a source external to the script as tainted. (In various Perl documents, such variables might also be called dirty or contaminated.) Taintedness is applied to individual scalar values; thus it's possible that some entries of a hash are tainted while others are not. Ditto with arrays. Additionally, any variable set from an expression that relies on a tainted variable will itself be tainted; whether it was originally tainted or not. The rule is pretty straightforward: anything a tainted variable comes in contact with itself becomes tainted:

# $arg will be tainted since it is 
# retrieved from outside the script
my $arg = shift;
# $foo will now be tainted too, since it depends on $arg
my $foo = substr($arg,0,2);
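
To watch taint spread from one value to another, a small report like the following may help. It is a sketch only: it assumes the script is started with "perl -T" and given one command line argument, and it uses the tainted function from Scalar::Util to inspect each value. The variable names are arbitrary.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

my $arg    = shift(@ARGV) // '';   # from the command line: tainted under -T
my $prefix = substr($arg, 0, 2);   # derived from $arg: tainted as well
my $label  = 'fixed text';         # literal inside the script: never tainted
my $mixed  = $label . $arg;        # combined with tainted data: tainted

for my $pair ([arg => $arg], [prefix => $prefix],
              [label => $label], [mixed => $mixed]) {
    my ($name, $value) = @$pair;
    printf "%-6s %s\n", $name, tainted($value) ? 'tainted' : 'clean';
}
```

Without the -T switch (or without an argument) nothing is marked, so every line reports "clean".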

Tainted variables cannot be used in expressions that will have an effect outside the script; if you attempt to do so, the script will die with a message similar to the following:

Insecure dependency in x while running with -T switch

What are these expressions? Per the perlsec docs (perldoc perlsec):

Tainted data may not be used directly or indirectly in any command that invokes a sub-shell, nor in any command that modifies files, directories, or processes...

This covers many potential activities. Among the things you can't do with tainted values is pass them to system, exec, or backtick calls (i.e., executing a system command and retrieving its output by enclosing it in backticks), such as:

$ping_string = `ping $tainted_value`;

Opening a file for output using a tainted value is also a no-no:

open (MYFILE,">$tainted_value");

Again, each of these infractions will cause a fatal run-time error in your script while running in Taint Mode.

As an aside, note that older versions of Perl would allow tainted data in a system call so long as you supplied the command and its arguments individually (i.e., when you used the system LIST syntax); but later versions of Perl (beginning sometime after Perl 5.8) will forbid all system calls that include tainted data.
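
For readers unfamiliar with the system LIST syntax, here is a rough sketch of what a taint-friendly call might look like. It rests on several assumptions: the helper name echo_word is invented, /bin/echo is assumed to exist on the host, and the 32-character limit is arbitrary. The argument is validated first, and the environment is reduced to a known PATH because Perl also checks the environment before running external commands under -T.

```perl
use strict;
use warnings;

# Hypothetical helper: echo one validated word using the list form of
# system(), which bypasses the shell entirely.
sub echo_word {
    my ($word) = @_;

    # Accept only 1 to 32 word characters; reject everything else.
    my ($clean) = ($word // '') =~ /\A(\w{1,32})\z/
        or return;

    # Start the child with a minimal, known environment.
    local %ENV = (PATH => '/bin:/usr/bin');

    # Command and argument are separate list items, so $clean is
    # never interpreted by a shell.
    return system('/bin/echo', $clean) == 0;
}

echo_word('hello');
```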

Using Tainted Data

In some respects I'm tempted to simply end this tutorial here; and leave you with the impression that you simply can't use tainted data in the ways described above. Many security experts will argue that you shouldn't be using commands that invoke the system shell in your CGI scripts at all, as the risk of potential tom-foolery is simply too great. And indeed, this is good advice for many of the scripts that you will write. But in the rare case where you must make use of a system call, or write data to a file the name of which you retrieve from outside your application, etc., Perl does provide a means by which you can untaint (or "clean," or "launder") your tainted variables.

Specifically, you can clean tainted data by matching it against a regular expression (of your own definition) and then assigning the captured sub-patterns ($1, $2, etc.) back to the variable. (Another, more obscure way to clean tainted values is to use them as hash keys, since hash keys themselves are never considered tainted. But as we'll see in a moment, untainting data without actually checking it defeats the purpose of using Taint Mode in the first place.) For example, let's say I'm retrieving from a Web form a value that is supposed to be alphanumeric (underscores are ok) with a maximum of 4 characters. I might untaint it like this:

use strict;
use CGI;
my $cgi    = CGI->new();
my $vendor = $cgi->param('vendor');
# $vendor is now tainted
if ($vendor =~ /^(\w{1,4})$/) {
   $vendor = $1;
   # $vendor is now untainted
}
else {
   # bad data; perform error processing here...
}

Perl explicitly lets you handle the sub-pattern matches of the expression without getting dirty; the resulting variables that are set from these matches are considered untainted.
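
If several form fields need this treatment, the match-and-capture step could be wrapped in a small helper. This is a sketch rather than a standard API: the name untaint and its calling convention are made up here, and the caller is still responsible for supplying a suitably strict pattern. The \A and \z anchors require the whole value to match, so not even a trailing newline gets through.

```perl
use strict;
use warnings;

# Hypothetical helper: return an untainted copy of $value if the whole
# string matches $pattern, or undef if it does not.
sub untaint {
    my ($value, $pattern) = @_;
    return unless defined $value;
    return $value =~ /\A($pattern)\z/ ? $1 : undef;
}

my $vendor = untaint('ab_1', qr/\w{1,4}/);          # 'ab_1'
my $bad    = untaint('ab; rm -rf /', qr/\w{1,4}/);  # undef

print defined $vendor ? "vendor accepted: $vendor\n" : "vendor rejected\n";
print defined $bad    ? "bad input accepted\n"       : "bad input rejected\n";
```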

Earlier I stated that Taint Mode helps you to write safer scripts by forcing you to think more carefully about how your script uses information, and this point now bears some elaboration. Notice that I did not say that Taint Mode, in and of itself, secures your scripts; or even that taint mode allows you to write secure scripts. Taint Mode's primary abilities are to force you to think about how you use your input, and protect you from accidentally doing something that you might not have realized was unsafe. In the above example, it's important to note that Perl will not check your regular expression to ensure that it really does clean your input data. For example, I could also have written this:

use strict;
use CGI;
my $cgi    = CGI->new();
my $vendor = $cgi->param('vendor');
# $vendor is now tainted
if ($vendor =~ /^(.*)$/) { $vendor = $1; }
# vendor is now untainted

which does nothing useful and is quite possibly more dangerous than not using Taint Mode at all (since you or another author may at a later time see the Taint Mode switch and assume--incorrectly--that the script is more secure than it would be otherwise). Or, as the aforementioned perlsec document bluntly puts it: "The tainting mechanism is intended to prevent stupid mistakes, not to remove the need for thought." Even with Taint Mode enabled, you must carefully think through and test your regular expression filters to ensure that--as much as possible--only the values you are expecting are allowed through. The difference is that Taint Mode will remind you (and force you) to perform at least some scrubbing on variables if you haven't, and will watch for variables that became tainted through contact with other tainted values and that you may have accidentally missed.
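
One way to test a filter is to run it against a list of inputs that should and should not pass. The sketch below does this for the four-character vendor pattern; the sample inputs are invented for illustration and are by no means an exhaustive set of hostile values.

```perl
use strict;
use warnings;

my $vendor_re = qr/\A(\w{1,4})\z/;   # the filter under test

# Each case: [ input, 1 if it should pass / 0 if it should be rejected ]
my @cases = (
    [ 'acme',    1 ],
    [ 'a_1',     1 ],
    [ '',        0 ],   # empty value
    [ 'toolong', 0 ],   # more than 4 characters
    [ 'a;ls',    0 ],   # shell metacharacter
    [ "ab\n",    0 ],   # trailing newline
    [ '../x',    0 ],   # path traversal attempt
);

my $failures = 0;
for my $case (@cases) {
    my ($input, $expect) = @$case;
    my $got = $input =~ $vendor_re ? 1 : 0;
    next if $got == $expect;
    $failures++;
    (my $shown = $input) =~ s/\n/\\n/g;   # make newlines visible
    print "UNEXPECTED: '$shown' ", ($got ? 'passed' : 'was rejected'), "\n";
}
print $failures ? "$failures filter case(s) misbehaved\n"
                : "all filter cases behaved as expected\n";
```

Swapping in the permissive /^(.*)$/ pattern from the example above makes most of the "should be rejected" cases fail, which is exactly the kind of surprise such a check is meant to catch.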

Before we leave the topic of cleansing, note that removing potentially harmful characters from your variables via a replacement operation--while still a good idea--is not enough in and of itself to remove the taintedness of a variable:

my $arg = shift;  # $arg is tainted
# remove potential problem characters
# (the character class here is only illustrative)
$arg =~ s/[^\w.-]//g;
# now $arg is STILL tainted

In some cases, you may wish to simply check a variable to see if it is tainted and then possibly refuse to use it, or replace it with a safe, default value, or whatever. You can use the tainted function from the Scalar::Util module to accomplish this:

use Scalar::Util qw(tainted);
my $arg = shift;
print "arg ".(tainted($arg)?"is ":"is not ")."tainted.\n";

Scalar::Util is included with the Perl distribution beginning with Perl 5.8.0.
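
The "replace it with a safe default" idea might be sketched as follows. The helper name value_or_default and the REPORT_DIR environment variable are both invented for this example; substitute whatever source and default make sense in your own script.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: use $value only if it is defined and untainted;
# otherwise fall back to a default chosen inside the script.
sub value_or_default {
    my ($value, $default) = @_;
    return $default if !defined $value || tainted($value);
    return $value;
}

# REPORT_DIR is a made-up variable name. Under -T, everything in %ENV
# is tainted, so the default is used even when the variable is set.
my $dir = value_or_default($ENV{REPORT_DIR}, '/tmp');
print "Using directory: $dir\n";
```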

It's important to note that values supplied on the command line (or through a Web form) aren't the only possible outside influences on your Perl script. On the next page we consider some environmental issues where Perl tainting is concerned.


Created: April 18, 2006
Revised: May 5, 2006