Sams Teach Yourself XML in 24 Hours, Complete Starter Kit, 3rd Edition. Part 1 | WebReference


Files and Directories in Perl

Excerpted from Sams Teach Yourself Perl in 24 Hours, 3rd Edition by Clinton Pierce. ISBN 0672327937, Copyright © 2005. Used with the permission of Sams Publishing.


Files and Directories

What You'll Learn in This Hour

  • How to get a directory listing

  • How to create and remove files

  • How to create and remove directories

  • How to get information about files

Files in your operating system provide a convenient set of storage concepts for data. The OS enables a name to be given to the data (a filename) and provides an organizational structure, called a file system, so that you can find the data later. Your computer's file system then organizes files into groups called directories—sometimes called folders. These directories can store files or other directories.

This nesting of directories inside directories provides a treelike structure to the file system on your computer. Each file is part of a directory, and each directory is part of a parent directory. In addition to providing an organizational structure for your files, the operating system also stores data about each file: when it was last read, when it was last modified, who created it, its current size, and so on. This information is called metadata (see Hour 5, "Working with Files"). This organization is true of almost all modern computer operating systems.

In the case of the Macintosh (pre-Mac OS X), this structure still holds true, except that the top-level directory is called a Volume, and the subdirectories are called Folders. Perl allows you to access this structure, modify the organization, and examine the information about the files. The functions that Perl uses for these tasks are all derived from the Unix operating system, but they work just fine under whatever operating system Perl happens to be running on. Perl's file system manipulation functions are portable: if you use them to manipulate and query your files, you should have no problems running your code under any operating system Perl supports, provided that the directories are structured similarly.

Getting a Directory Listing

The first step in obtaining directory information from your system is to create a directory handle. A directory handle is something like a filehandle, except that instead of a file's contents, you read the contents of a directory through the directory handle. To open a directory handle, you use the opendir function:

opendir dirhandle, directory

In this syntax, dirhandle is the directory handle you want to open and directory is the name of the directory you want to read. If the directory handle cannot be opened (because you don't have permission to read the directory, the directory doesn't exist, or for some other reason), the opendir function returns false. Directory handle names should be constructed similarly to filehandles, using the rules for variable names outlined in Hour 2, "Perl's Building Blocks: Numbers and Strings," and, like filehandles, they should be all uppercase to avoid conflicts with Perl's keywords. The following is an example:

opendir(TEMPDIR, '/tmp') || die "Cannot open /tmp: $!";

All the examples in this hour use forward slashes (/) in the Unix style because they are less confusing than the backslashes (\) used by Windows and MS-DOS, and they work just as well with those operating systems as with Unix.
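Because opendir returns false on failure and leaves the reason in $!, a program does not have to die; it can report the problem and carry on. The following is only a sketch of that idea: the sub name try_open, the handle name PROBE, and the directory name no-such-directory-here (assumed not to exist) are made up for illustration.

```perl
use strict;
use warnings;

# try_open(DIRECTORY) is a hypothetical helper: it returns an empty string
# if the directory can be opened, or the system error message ($!) if not.
sub try_open {
    my ($dir) = @_;
    if (opendir(PROBE, $dir)) {
        closedir(PROBE);
        return '';
    }
    return "$!";
}

# The second name is assumed not to exist on your system.
for my $dir ('.', './no-such-directory-here') {
    my $error  = try_open($dir);
    my $status = $error eq '' ? 'opened' : "cannot open ($error)";
    print "$dir: $status\n";
}
```

The exact wording of the error message comes from the operating system, so it can differ from one platform to another.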

Now that the directory handle is open, you use the readdir function to read it:

readdir dirhandle;

In a scalar context, readdir returns the next entry in the directory, or undef if none are left. In a list context, readdir returns all the (remaining) directory entries. The names returned by readdir include files, directories, and (for Unix) special files; they are returned in no particular order. The directory entries . and .. (representing the current directory and its parent directory) are also returned by readdir. The directory entries returned by readdir do not include the pathname as part of the name returned.
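The difference between the two contexts can be seen side by side in a short sketch. It assumes the core File::Temp module is available to create a scratch directory; the filenames (a.txt, b.txt, c.txt) and the handle names SCRATCH and OUT are invented for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with three empty files (names are illustrative only).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt c.txt)) {
    open(OUT, '>', "$dir/$name") || die "Cannot create $dir/$name: $!";
    close(OUT);
}

opendir(SCRATCH, $dir) || die "Cannot open $dir: $!";
my $first = readdir(SCRATCH);   # scalar context: one entry
my @rest  = readdir(SCRATCH);   # list context: everything not yet read
my $after = readdir(SCRATCH);   # nothing is left, so this is undef
closedir(SCRATCH);

print "First entry: $first\n";
print "Remaining entries: @rest\n";
print "After the list read: ", (defined $after ? $after : 'undef'), "\n";
```

Which entry comes back first depends on the file system, so the output order is not predictable.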

The following example shows how to read a directory:

opendir(TEMP, '/tmp') || die "Cannot open /tmp: $!";
@FILES=readdir TEMP;

In the preceding snippet, the entire directory is read into @FILES. Most of the time, however, you're not interested in the . and .. entries. To read the directory handle and eliminate those entries, you can enter the following:

@FILES=grep(!/^\.\.?$/, readdir TEMP);

The regular expression (/^\.\.?$/) matches a name that consists only of one or two literal dots, and because the match is negated with !, grep keeps every entry except those two. To get all the files with a particular extension, you use the following:

@FILES=grep(/\.txt$/i, readdir TEMP);
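Putting these pieces together, a small helper can open a directory, drop the . and .. entries, optionally keep only one extension, and close the handle again. This is a sketch rather than code from the book: the sub name list_dir, its arguments, and the handle name LISTING are hypothetical.

```perl
use strict;
use warnings;

# list_dir(DIRECTORY, [EXTENSION]) is a hypothetical helper, not a built-in.
# It returns the sorted names in DIRECTORY, without . and .., and
# optionally only those ending in .EXTENSION (compared case-insensitively).
sub list_dir {
    my ($dir, $ext) = @_;
    opendir(LISTING, $dir) || die "Cannot open $dir: $!";
    my @names = grep(!/^\.\.?$/, readdir(LISTING));
    closedir(LISTING);
    @names = grep(/\.\Q$ext\E$/i, @names) if defined $ext;
    return sort @names;
}

# Example call; the current directory is used only because it always exists.
print "$_\n" for list_dir('.', 'txt');
```

Reading the whole directory once and filtering the resulting list avoids calling readdir a second time on a handle that has already reached the end.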

The filenames returned by readdir do not contain the pathname used by opendir. Thus, the following example will probably not work:

opendir(TD, "/tmp") || die "Cannot open /tmp: $!";
while($file = readdir TD) {
    # The following is WRONG
    open(FILEH, $file) || die "Cannot open $file: $!\n";
    # Process the file here…
}

Unless you happen to be working in the /tmp directory when you run this code, the open(FILEH, $file) statement will fail. For example, if the file myfile.txt exists in /tmp, readdir returns myfile.txt. When you open myfile.txt, you actually need to open /tmp/myfile.txt using the full pathname. The corrected code is as follows:

opendir(TD, "/tmp") || die "Cannot open /tmp: $!";
while($file=readdir TD) {
    # Right!
    open(FILEH, "/tmp/$file") || die "Cannot open $file: $!\n";
    # Process the file here…
    close(FILEH);
}
closedir(TD);
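One way to avoid repeating the directory name is to keep it in a variable and build each full pathname from it. The sketch below does that and counts the lines in every plain file. The sub name count_lines, the scratch file myfile.txt created with the core File::Temp module, and the use of the -f file test to skip anything that is not a plain file are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# count_lines(DIRECTORY) is a hypothetical helper. It returns a hash that
# maps each plain file's name to the number of lines the file contains.
sub count_lines {
    my ($dir) = @_;
    my %lines;
    opendir(TD, $dir) || die "Cannot open $dir: $!";
    while (defined(my $file = readdir(TD))) {
        my $path = "$dir/$file";      # readdir gave only the bare name
        next unless -f $path;         # skips ., .., and subdirectories
        open(FILEH, '<', $path) || die "Cannot open $path: $!\n";
        my @contents = <FILEH>;
        close(FILEH);
        $lines{$file} = scalar @contents;
    }
    closedir(TD);
    return %lines;
}

# Scratch directory with one two-line file, so the sketch runs anywhere.
my $dir = tempdir(CLEANUP => 1);
open(OUT, '>', "$dir/myfile.txt") || die "Cannot create $dir/myfile.txt: $!";
print OUT "one\ntwo\n";
close(OUT);

my %count = count_lines($dir);
print "$_: $count{$_}\n" for sort keys %count;   # prints "myfile.txt: 2"
```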


The other method of reading the names of files in a directory is called globbing. If you're familiar with the command prompt in MS-DOS, you know that the command dir *.txt prints a directory listing of all the files that end in .txt. In Unix, the globbing (sometimes called wildcard matching) is done by the shell, but ls *.txt has nearly the same result: The files whose names end in .txt are listed.

Perl has an operator for doing just this job; it's called glob. The syntax for glob is

glob pattern

where pattern is the filename pattern you want to match. The pattern can contain directory names and portions of filenames. In addition, the pattern can contain any of the special characters listed in Table 10.1. In a list context, glob returns all the files (and directories) that match the pattern. In a scalar context, the files are returned one at a time each time glob is queried.

Now check these examples of globbing:

# All of the .h files in /usr/include
my @hfiles=glob(‘/usr/include/*.h');
# Text or document files that contain 1999
my @curfiles=glob(‘*1999*.{txt,doc}')
# Printing a numbered list of filenames
while( $name=glob(‘*') ) {
   print "$count. $name\n";

An important difference between glob and opendir/readdir/closedir is that glob returns the pathname used in the pattern, whereas the opendir/readdir/closedir functions do not. For example, glob('/usr/include/*.h') returns '/usr/include' as part of any matches; readdir does not.
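The difference is easy to see when both approaches list the same directory. The following sketch assumes the core File::Temp module for a scratch directory and also assumes that the temporary path contains no spaces or wildcard characters, which glob would treat specially. The filename stdio.h and the handle names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory holding one header file (the name is illustrative).
my $dir = tempdir(CLEANUP => 1);
open(OUT, '>', "$dir/stdio.h") || die "Cannot create $dir/stdio.h: $!";
close(OUT);

# glob keeps the directory part of the pattern in every match.
my @from_glob = glob("$dir/*.h");

# readdir returns bare names, so the directory must be added by hand
# before any of these names can be opened.
opendir(INC, $dir) || die "Cannot open $dir: $!";
my @from_readdir = grep(/\.h$/, readdir(INC));
closedir(INC);

print "glob:    @from_glob\n";
print "readdir: @from_readdir\n";
```

The glob line prints the full temporary path followed by /stdio.h, whereas the readdir line prints only stdio.h.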

So which should you use? It's completely up to you. However, using the opendir/readdir/closedir functions tends to be a much more flexible solution and will be used in most of the examples throughout this book.

Perl offers an alternative way to write pattern globs. Simply placing the pattern inside the angle operator (<>) makes the angle operator behave like glob:

@cfiles = <*.c>; # All files ending in .c

The syntax that uses the angle operator for globbing is older and can be confusing. In this book, I will continue to use the glob operator instead for clarity.

Created: March 27, 2003
Revised: February 3, 2006