Simple Comments Release Notes: v.920 (2/2) | WebReference



Simple Comments: v.920

Comment Keys and Digest Hashes

As part of the development cycle for version .920 of Simple Comments, it became clear that we needed to store a "comment key" with each comment: a unique key that could be used to identify that particular comment within the system. Previously, we didn't store this key directly, but derived it dynamically as we read the comment (i.e., the comment key was the concatenation of several pieces of information from the comment, such as the article key, the user's name, the submitter's IP address, etc.). This design was flawed, however. What happens, for example, if the administrator chooses to edit the user name appearing on a comment? The comment key for that comment would then be different the next time it was read.
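The flaw is easy to reproduce in a few lines. The following is a minimal sketch, not the actual Simple Comments code: the field names, the sample values and the "|" separator are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical field names; the real Simple Comments record layout may differ.
# The key is rebuilt from the comment's current fields every time it is read.
sub derived_key {
    my ($comment) = @_;
    return join '|', @{$comment}{qw(article_key user_name ip)};
}

my %comment = (
    article_key => 'article-42',
    user_name   => 'Dann',
    ip          => '192.0.2.10',
);

my $key_before = derived_key(\%comment);
$comment{user_name} = 'Dan';    # the administrator corrects the user name
my $key_after = derived_key(\%comment);

print "before: $key_before\n";
print "after:  $key_after\n";
print "the key changed after the edit\n" if $key_before ne $key_after;
```

Anything that had stored the first key would no longer find the comment after the edit.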

In previous versions of Simple Comments, this wasn't an issue, as the only time these comment keys were actually used was when comments were checked for duplicates as they were added to the live files (so the worst that could happen was that a duplicate comment could be inadvertently posted). With the new version, we need to reference a different comment from within each comment, and that reference needs to remain valid if the original comment changes. We need this as part of our new reply-to capabilities; i.e., when a user submits a comment in reply to another comment, the submitted comment needs to store the comment key of the original comment so both comments can be properly sorted when they're displayed on the site.
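To make the reply-to idea concrete, here is a rough sketch of how stored keys could drive the display order. The record layout ("key", "reply_to", "text") and the threaded routine are hypothetical and are not taken from the Simple Comments source.

```perl
use strict;
use warnings;

# Hypothetical records: 'key' is the permanent comment key and 'reply_to'
# holds the key of the parent comment (undef for a top-level comment).
my @comments = (
    { key => 'k1', reply_to => undef, text => 'First!' },
    { key => 'k2', reply_to => undef, text => 'Nice article.' },
    { key => 'k3', reply_to => 'k1',  text => 'Second, actually.' },
);

# Group replies under the key of the comment they answer.
my %replies_for;
for my $c (@comments) {
    push @{ $replies_for{ $c->{reply_to} } }, $c if defined $c->{reply_to};
}

# Return [depth, comment] pairs: each comment followed by its replies.
sub threaded {
    my ($list, $depth) = @_;
    my @out;
    for my $c (@$list) {
        push @out, [ $depth, $c ];
        push @out, threaded($replies_for{ $c->{key} } || [], $depth + 1);
    }
    return @out;
}

my @ordered = threaded([ grep { !defined $_->{reply_to} } @comments ], 0);
print '  ' x $_->[0], $_->[1]{text}, "\n" for @ordered;
```

Because the lookup goes through the stored key, editing the text or user name of the parent comment does not detach its replies.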

Thus, we set out to add permanent comment keys to comments; i.e., the comment key is now added to the comment as soon as it is submitted, and it remains as originally generated throughout the life of the comment. Our original comment-key generation algorithm was adequate (using the article key in combination with multiple individual fields of the comment itself, including submit date), but it produced lengthy comment keys that were difficult to work with. Additionally, and perhaps more importantly, we now needed to be able to pass these keys into the comment display templates (since they would be needed in order to properly assign the reply-to information to submitted comments), and the comment key itself would therefore be visible within every displayed page. Since the comment key included some information that shouldn't be visible within the pages themselves (the submitting user's IP, for example), we set out to find a way to obscure it.

Enter Digest::MD5

A simple means to create an effectively unique and obscured representation of data is to generate an MD5 digest of the data, and in Perl this is greatly simplified by using the Digest::MD5 module. A digest is similar to crypt, in that it generates a one-way representation of the data submitted to it: the original data cannot be read back out of the result. Unlike the traditional crypt, however, the MD5 algorithm utilizes the entire data string submitted to it in its processing, instead of only the first 8 characters.

Using Digest::MD5 in a Perl script is simple:

use strict;
use Digest::MD5 qw(md5_hex);
print "Digest of 'Dan is a lazy slob' is: ", md5_hex('Dan is a lazy slob'), "\n";
# above prints: 
# Digest of 'Dan is a lazy slob' is: b442aa6caec8af56f801b673075c2084

In the above example, I've used the procedural interface to Digest::MD5 (an object-oriented interface is also available), and I've specifically used the md5_hex function, which outputs the digest as a 32-character hex string. You can also make use of an md5 function that outputs the digest as a raw 128-bit (16-byte) binary value, or md5_base64, which outputs the digest as--you guessed it--a Base64 encoded string.
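For comparison, the short sketch below calls all three procedural functions and the object-oriented interface on the same data. The sample string is arbitrary; only the lengths of the results are of interest here.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex md5_base64);

my $data = 'Simple Comments';

my $binary = md5($data);           # 16 raw bytes (128 bits)
my $hex    = md5_hex($data);       # 32 hexadecimal characters
my $base64 = md5_base64($data);    # 22 Base64 characters (no '=' padding)

printf "binary: %d bytes\n", length $binary;
printf "hex:    %d chars\n", length $hex;
printf "base64: %d chars\n", length $base64;

# Object-oriented interface: the data can be added in several pieces.
my $ctx = Digest::MD5->new;
$ctx->add('Simple ');
$ctx->add('Comments');
my $oo_hex = $ctx->hexdigest;

print "both interfaces agree\n" if $oo_hex eq $hex;
```

The object-oriented form is mostly useful when the data arrives in chunks, such as when reading a large file.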

Using the md5_hex routine proved to be a simple answer to both our lengthy comment keys and the obfuscation problem. We can apply md5_hex to the same data that we were formerly piecing together for our comment key, and the result is always a 32-character string. Additionally, none of the fields used to build the key can be read back from it.
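A key generator along these lines might look like the sketch below. The make_comment_key routine, the field names and the "|" separator are assumptions for illustration; the actual routine in the Simple Comments module may combine the fields differently.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical field names and separator, chosen for illustration only.
sub make_comment_key {
    my ($comment) = @_;
    # Undefined fields are treated as empty strings.
    my $raw = join '|', map { defined $_ ? $_ : '' }
        @{$comment}{qw(article_key user_name ip submit_date)};
    return md5_hex($raw);
}

my %comment = (
    article_key => 'article-42',
    user_name   => 'Dan',
    ip          => '192.0.2.10',
    submit_date => '2006-12-26 10:15:00',
);

# Generated once at submission time, then stored with the comment.
my $key = make_comment_key(\%comment);
print "comment key: $key\n";
```

The key is computed once and saved, so later edits to the user name or other fields leave it untouched.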

The Catch

To implement the new keys, we first needed to write a routine that would add permanent keys to each of our existing comments in the system. (If you're upgrading from a previous version of Simple Comments, you'll need to run this routine once as soon as you deploy v .920. It's in the administration script, and instructions are in the README.txt of the distribution.) This routine ran smoothly on almost all of our test servers, with one important exception.

On our Perl 5.6.1 test server, an interesting thing happened when we assigned comment keys to our existing comments: all of the comments ended up with the same key! Specifically, the digest created for every comment in the system was:

d41d8cd98f00b204e9800998ecf8427e

I recognized this as the digest that's created when you supply an empty string to the MD5 algorithm; i.e.:

# perl -MDigest::MD5 -e "print Digest::MD5::md5_hex(q()), qq(\n)"

Some quick testing assured me that I was in fact presenting valid (and unique) strings to the md5_hex routine, so why was I getting this digest value?

It turns out that older versions of Digest::MD5 (prior to 2.20, if I'm not mistaken) don't handle strings with utf8 data properly. And though I wasn't intentionally using utf8 data in my digest input, recall that XML::Parser flags the data it reads from external XML files as utf8 by default (see the release notes for v .910 for further utf8-related tidbits). Therefore, the comment keys for all my existing comments (which were read in via XML::Simple and XML::Parser) were incorrectly produced as displayed above.

In some contexts, this could be considered a security risk, and the earlier versions of Digest::MD5 are listed as potential security problems in some online security notice repositories. The core of the problem lies in the fact that the MD5 algorithm was intended for use on strings of bytes, not on strings containing characters with ordinal values above 255. Later versions of Digest::MD5 correct the problem by generating an error if any true "wide characters" are encountered in the input string:

Wide character in subroutine entry

But earlier versions produce inconsistent digests as described above.
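The newer behavior can be observed with a few lines of Perl. This sketch assumes a recent Digest::MD5 and a Perl that ships the Encode module (5.8 or later). Encoding the string to UTF-8 bytes is shown only as a common general-purpose alternative; it is not what Simple Comments does.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

my $wide = "caf\x{e9} \x{263A}";    # U+263A is a character above 255

# A recent Digest::MD5 dies on wide characters; trap the error with eval.
my $digest = eval { md5_hex($wide) };
my $error  = $@;
print "md5_hex failed: $error" unless defined $digest;

# Encoding to UTF-8 bytes first hands the algorithm the byte string it expects.
my $safe_digest = md5_hex(encode_utf8($wide));
print "digest of the UTF-8 bytes: $safe_digest\n";
```

Note that a digest of the UTF-8 bytes will not match a digest computed from an entity-encoded version of the same text, so one approach has to be used consistently.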

To correct the problem for the new version of Simple Comments, we do two things. First, any wide characters in the comment data itself are converted to a non-wide format (entity versions of the wide characters) by our existing to_entities function in the module. This prevents the routine from failing with later versions of Digest::MD5 in the event that true wide character data is found in the input. Second, we force the removal of the utf8 flag on the resulting data input in a similar manner to untainting known, good data:

($comment_key =~ /^(.*)$/) && ($comment_key = $1);

(If you are unfamiliar with taint mode in Perl, take a look at our earlier primer on the subject. The above method is recommended only for data known to be safe by some other means; i.e., it was tested separately.) This prevents earlier Digest::MD5 implementations from presenting an inaccurate digest as the result of the string being marked as utf8.
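The two steps can be approximated in a standalone script as follows. This is only a sketch: wide_to_entities is a hypothetical stand-in for the module's to_entities function (whose source isn't shown here), and utf8::downgrade, available in Perl 5.8.1 and later, is swapped in for the regex capture as the way to clear the utf8 flag.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical stand-in for the module's to_entities function: replace each
# character above 255 with a decimal numeric entity.
sub wide_to_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\xFF])/sprintf '&#%d;', ord $1/ge;
    return $text;
}

my $input = "Nice post \x{263A}";
my $clean = wide_to_entities($input);    # "Nice post &#9786;"

# Every character now fits in one byte, so the downgrade cannot fail and the
# string's internal utf8 flag is cleared.
utf8::downgrade($clean);

print "cleaned: $clean\n";
print "utf8 flag is still set\n" if utf8::is_utf8($clean);
print "digest:  ", md5_hex($clean), "\n";
```

With the wide characters gone and the flag cleared, old and new versions of Digest::MD5 should see the same bytes.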

These two fixes corrected the new comment key methodology in the v .920 release of Simple Comments. Unfortunately, the new comment keys were not the only place that we used Digest::MD5.

Passwords and Digests

In the previous version of Simple Comments (v .910) we were also using Digest::MD5 for the encoding of user passwords, for those implementations that were not using a Web server-based means of authenticating administrators for the administration script. Due to the problems described above, a potential security vulnerability was present when v .910 was deployed on Perl 5.6 servers using an older (pre v 2.20) version of Digest::MD5. Specifically, if the administrator innocently added the "empty" digest key into the users.xml file:


that user (or anyone else that knew--or could guess--that user's ID) could conceivably access the administration script using any password that contained a utf8 character. The chances of that vulnerability being exploited seem slim, but nonetheless we've taken steps in v .920 to remove this risk. Specifically, we treat the input password in the same manner as the comment keys above, so it will always be handled properly in both the older and newer versions of Digest::MD5. Further, we don't allow any passwords that match the "empty" digest string presented above. That is, you can still put those digests within your password file, but Simple Comments will refuse to authenticate against them (any user that attempts to provide the empty digest as their encoded password will be denied access, even if that is the digest stored in the users.xml file).
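A check of this kind could be sketched as below. The password_ok routine and its arguments are hypothetical, and utf8::encode is used here as a simple way to obtain a byte string; the actual v .920 code treats the password the same way as the comment keys.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# The MD5 digest of the empty string: d41d8cd98f00b204e9800998ecf8427e
my $EMPTY_DIGEST = md5_hex('');

# Hypothetical check: returns 1 if the password matches the stored digest,
# and always returns 0 when the "empty" digest is involved.
sub password_ok {
    my ($stored_digest, $password) = @_;
    return 0 unless defined $stored_digest && defined $password;
    return 0 if lc($stored_digest) eq $EMPTY_DIGEST;

    # Work on bytes so old and new Digest::MD5 versions see the same input.
    utf8::encode($password) if utf8::is_utf8($password);

    my $digest = md5_hex($password);
    return 0 if $digest eq $EMPTY_DIGEST;
    return $digest eq lc($stored_digest) ? 1 : 0;
}

print password_ok(md5_hex('s3cret'), 's3cret') ? "accepted\n" : "denied\n";
print password_ok($EMPTY_DIGEST, '')           ? "accepted\n" : "denied\n";
```

The first call is accepted; the second is denied even though the digests technically match.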


It's my hope that you will continue to find the Simple Comments script itself useful, and/or the developmental notes to be helpful for your own Perl projects. Please feel free to contact me if you have any clarifications, suggestions or requests for improvements!


Created: December 26, 2006
Revised: December 26, 2006