Unix-like operating systems provide an easy means of creating files from any program that has an output. Often, you won’t even need to worry about creating files; you’ll just redirect the output to a file and let the operating system handle it for you.
./show --exact --artist foreigner --format raw songs.txt > foreigner.txt
Because you can pipe directly from one program to another on the command line, you sometimes won’t even need to create files to store temporary data. If you want to count up how many songs Foreigner has in songs.txt, you can:
./show --exact --artist foreigner songs.txt | wc -l
Or, one of my favorites,
./show --exact --artist foreigner songs.txt | rev
But sometimes we do need to create our own files, and Perl makes this easy. Suppose we wanted to be able to create multiple files, perhaps one for each album, or one for each artist?
We can add a switch for this easily enough.
} elsif ($switch eq "export") {
    $exportField = shift;
    if (!grep(/^$exportField$/, @validFields)) {
        print "\nI can only export by $validFields.\n\n";
        help();
        exit;
    }
This switch is exactly like our sort switch. It accepts a valid field; if the user tries to export by something other than a valid field, the script will warn them and exit.
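The check can be tried on its own. The sketch below is a stand-alone illustration, not part of the show script: the field list and the isValidField helper are assumptions made up for the example. It also wraps the field in \Q...\E, which the script above does not do, so that punctuation typed by the user is matched literally instead of being treated as part of a regular expression.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical field list; the real script defines its own @validFields
my @validFields = ("song", "album", "artist");

# hypothetical helper: true if $field exactly matches one of the valid fields
sub isValidField {
    my ($field) = @_;
    # in scalar context, grep returns the number of matching items
    # \Q...\E keeps punctuation in $field from acting as regex syntax
    return scalar(grep(/^\Q$field\E$/, @validFields));
}

foreach my $field ("album", "alb", "a.bum") {
    if (isValidField($field)) {
        print "$field is valid\n";
    } else {
        print "$field is not valid\n";
    }
}
```

Without the \Q...\E, the period in “a.bum” would match any character, and “a.bum” would be accepted as if it were “album”.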
If the data is being sorted, we are going to have to wait until the end to export the files. So to make it easier, we’ll simply always wait until the end to export the files. This lets us re-use some of the code for sorting. Change:
if ($sortby) {
    $matches[$#matches+1]{'text'} = $text;
    $matches[$#matches]{'sort'} = $$sortby;
} else {
to:
if ($sortby || $exportField) {
    $matches[$#matches+1]{'text'} = $text;
    $matches[$#matches]{'sort'} = $$sortby if $sortby;
    $matches[$#matches]{'file'} = $$exportField if $exportField;
} else {
The script will now remember the matches if either $sortby or $exportField has something in it. We only store the ‘sort’ association if $sortby has something in it, and we only store the ‘file’ association if $exportField has something in it. If $exportField is “album” and $album is “Head Games”, ‘file’ will associate with “Head Games” for this record.
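The $$exportField notation looks up the variable whose name is stored in $exportField. Here is a minimal sketch of that lookup, assuming the fields live in ordinary package variables. The sample record is made up for illustration, and the “no strict” line is only needed because this example turns strict on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# sample record; these values are assumptions for the example
our ($song, $album, $artist) = ("Dirty White Boy", "Head Games", "Foreigner");
my $text = "$song\t$album\t$artist\n";

my $sortby = "";
my $exportField = "album";
my @matches;

{
    # looking up a variable by its name only works with strict refs turned off
    no strict 'refs';
    if ($sortby || $exportField) {
        $matches[$#matches+1]{'text'} = $text;
        $matches[$#matches]{'sort'} = $$sortby if $sortby;
        $matches[$#matches]{'file'} = $$exportField if $exportField;
    }
}

# $exportField is "album", so 'file' holds the contents of $album
print "file: $matches[0]{'file'}\n";
```

Because $sortby is empty in this sketch, the record gets a ‘file’ association but no ‘sort’ association.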
So now we need to change the code that deals with @matches. Change this:
} elsif (@matches) {
    @matches = sort byCustom @matches;
    foreach $match (@matches) {
        print $$match{'text'};
    }
}
to:
} elsif (@matches) {
    @matches = sort byCustom @matches if $sortby;
    foreach $match (@matches) {
        if ($exportField) {
            $filename = $$match{'file'};
            #open the file if we haven't already
            if (!$files{$filename}) {
                if (!open($files{$filename}, ">$filename")) {
                    print "Unable to open $filename: $!\n";
                    exit;
                }
            }
            $filehandle = $files{$filename};
            print $filehandle $$match{'text'};
        } else {
            print $$match{'text'};
        }
    }
    #close all open files
    foreach $filehandle (values %files) {
        close($filehandle);
    }
}
Note that in the second line we now only sort if $sortby has something in it. Otherwise, there’s nothing to sort on.
We’ve added a new section for “if ($exportField)”, so that if $exportField has something in it we will print to a file instead of to the “standard output” (usually the screen).
Before writing to a file, the file has to be “opened”. We need to get a “handle” on the file. Since we need to have a number of files open at once, it makes sense to store the file handles in an array. This script stores them in an associative array called %files, associating each handle with its filename.
Before opening the file with that filename, the script checks to see if there is already a handle associated with that filename in %files. The script only opens the file if there is not an existing handle associated with that filename.
If a file needs to be opened, the script opens it within an if, so that if there’s an error opening the file it can print an error and exit. Perl stores the most recent system error in a special variable called “$!”. So, if there’s a problem opening $filename, we have the script print “Unable to open $filename” and then “$!”. The error message is often very useful. For example, if you don’t have permission to open a file, the error message will say this.
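To see “$!” in action, here is a small sketch that forces an open to fail by pointing at a folder that doesn’t exist. The folder name is made up for the example, and the exact wording of the message comes from the operating system, so it may differ from one machine to another.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# a scratch directory that is removed when the script ends
my $dir = tempdir(CLEANUP => 1);

# this folder was never created, so the open will fail
my $filename = "$dir/no-such-folder/Head Games";

my $error = "";
my $handle;
if (!open($handle, ">$filename")) {
    # copy $! right away; later system calls can overwrite it
    $error = "$!";
    print "Unable to open $filename: $error\n";
}
```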
The important new part is “open($files{$filename}, ">$filename")”. Here, open is given two parameters. The first is the variable where we want to store the handle to the file. The second is the name of (or path to) the file we want to open. If we want to be able to write to the file, we need to prepend a greater than symbol to the filename. (We can also append to files by prepending two greater than symbols to the filename.)
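The difference between one greater than symbol and two is easy to check in a scratch directory. This is only a sketch: the filename and the lines written to it are sample data, and the “<” used at the end to read the file back is the matching prefix for opening a file for reading.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# work in a scratch directory that is removed when the script ends
my $dir = tempdir(CLEANUP => 1);
my $filename = "$dir/Head Games";
my ($out, $in);

# ">" creates the file, or empties it if it already exists
open($out, ">$filename") or die "Unable to open $filename: $!\n";
print $out "Dirty White Boy\n";
close($out);

# ">>" keeps the existing contents and adds to the end
open($out, ">>$filename") or die "Unable to open $filename: $!\n";
print $out "Head Games\n";
close($out);

# opening with ">" again throws away both of those lines
open($out, ">$filename") or die "Unable to open $filename: $!\n";
print $out "Love on the Telephone\n";
close($out);

# "<" opens the file for reading
open($in, "<$filename") or die "Unable to open $filename: $!\n";
my @lines = <$in>;
close($in);

print "The file has ", scalar(@lines), " line(s): @lines";
```

Only the last line survives, because the final open used a single greater than symbol.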
So, if the script can successfully open the file, we now have a handle to it in $files{$filename}. All that remains is to get it (with “$filehandle = $files{$filename}”) and print to it.
If you look at some of the previous print commands, they have multiple variables or multiple pieces of text, separated by commas. Print can accept any number of pieces of text, separated by commas. However, if the first variable is not separated from the rest of the variables or text by a comma, print assumes that it is a handle to a file, and sends its output to that file handle.
That’s why there is only a space between $filehandle and $$match{'text'} in “print $filehandle $$match{'text'}”.
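A quick way to watch the missing comma at work is to print to an in-memory file and then look at what was captured. This sketch is not part of the show script. It uses the three-parameter form of open, with a reference to a scalar as the “file”, because that is how Perl (5.8 and later) opens an in-memory file. The text being printed is sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# an in-memory file stands in for a real one, so nothing is written to disk
my $contents = "";
open(my $filehandle, ">", \$contents) or die "Unable to open in-memory file: $!\n";

# no comma after $filehandle: the rest of the list is written to the handle
print $filehandle "Head Games", "\t", "Foreigner", "\n";
close($filehandle);

# no handle before the first item: the list goes to standard output
print "Captured: ", $contents;
```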
Finally, after looping through all matches, we grab every value out of %files (each of which is a file handle) and close that file. The phrase “values %files” is the same as “keys %files” except that it gets a simple array of the values in %files, rather than a simple array of its keys.
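Here is a small side-by-side sketch of “keys” and “values”. To keep it short, plain strings stand in for the file handles; the album names are sample data. Perl returns hash entries in no particular order, so the example sorts both lists before printing them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strings stand in for real file handles in this example
my %files = (
    "Head Games"    => "handle 1",
    "Double Vision" => "handle 2",
);

# keys gives the filenames; values gives whatever is stored under them
my @names   = sort keys %files;
my @handles = sort values %files;

print "keys:   ", join(", ", @names), "\n";
print "values: ", join(", ", @handles), "\n";
```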
Perl will close files for us automatically as soon as the script ends or exits. But I like to close them explicitly as soon as they are no longer needed. Otherwise they hang around, open, until the script ends. Here that’s not a big deal but later on we might alter this script and add more functionality at the end. If that functionality involves opening files too, we might run up against the operating system’s limit: most operating systems limit the number of files any one program can open.
Having done all of this, we can now grab, say, all albums by Foreigner and create a separate file for each one:
./show --exact --artist foreigner --export album songs.txt
Of course, you’re going to want to make sure that no album has the same name as a file you don’t want to erase: every time Perl opens a file for writing with a single greater than symbol, it will happily erase an existing file with the same name. We’ll see if we can do something about that in the next section.
And, of course, add this to the help subroutine:
print "\t--export <$validFields>: export to files named after the specified field\n";
You probably don’t want to play around too much making export files. It will be very easy to create hundreds of files in your current directory. We’ll fix this next.
- Creating folders
- It’s easy enough to change directory when exporting files in order to ensure that the new files go into a specific folder, but if you’re using this as part of a cron job it will be easier if you can tell the script which folder you want the export files to go to.
- Replacing text
- If you play around with export now, you’ll find that some exports don’t work. Go ahead and try:
- Try to break it
- One of the most important skills to learn when you’re programming is how to break your scripts. You’ll want to do lots of tests with lots of different kinds of data, but tests can only find the errors that you test for. You will also need to ask “where will this break?” and fix those errors before they happen. We’ve done a little of this already, without calling it that. This is why we put “open(…)” in an “if” statement and match it…
- Creating files: Timestamps
- Some data is time-sensitive. The file came in at a specific time, and you want the exported files to keep that timestamp. Under Unix, you can see a file’s last modified time using “ls -l”. If you look at songs.txt you’ll probably see that it was last modified on April 25, 2005. If you look at the export files you’ve created, their last modified time is today, or the day you exported them.
- The current script
- This script is getting pretty big, but we’re almost done with it. Here is how it stands so far.