Last Modified 1.0 | T.David's Script Shop

Not only is this script freeware, but we show you how to build it from scratch!

A Directory Walker that adds the "Last Modified" Date to the bottom of each page
by TDavid @ http://www.tdscripts.com/

THE MISSION: You'd like to add the date and time a page was last modified to every html page on your site so that surfers will know when your page was last updated. Here are 3 possible solutions:

Solution #1: Use JavaScript to insert the date on each page
Pros: It is dynamic in the sense that every surfer who loads your page gets the current last modified date.
Cons: Not all browsers can run JavaScript, and the surfer can disable JavaScript, thus disabling your code. Also, you'll need to add the same code to every page where you want the date displayed. Somewhat intrusive to your existing page design.

Solution #2: Use Server Side Includes (SSI) tags to insert the date dynamically on the page.
Pros: It's one line of code at the bottom of the page and very easy to do.
Cons: It requires you to parse the entire webpage for SSI which may require you to parse ALL your html pages (not a very wise thing to do if you don't need to do it) -- or worse, change the page name and need to redirect your traffic from .html extension pages to .shtml pages. Not all hosts allow SSI. Most intrusive to your existing pages.

Solution #3: Use a server side language like Perl or PHP to scan the pages and "find and replace" the info on each page.
Pros: You can walk the directory and change many pages with one execution of the script from the server side and it will work across all browsers and all platforms. Least intrusive to your existing pages because the script can walk around your directory and change the files for you.
Cons: You have to either run the script manually or set a cron job to execute it. You also need to set a list of pages to "not change", otherwise the script will blindly append the last modified date to ALL html pages. You have to give the script permission to alter each of these files (chmod 666). If you run the script multiple times and a page hasn't changed in between, the page will still show a new date, because the script itself rewrote the file on the previous run.

The server side solution

As you can probably guess by the title of this article, I'm going to show you how to create solution #3 and I'm going to use Perl to do this. It can be done using PHP as well, but setting up cron to run a PHP script is a little trickier than with Perl. How to do that would likely be a good article for another day.

STEP 1. The first thing we need to do is add the line that gives the path to Perl. On most servers this will be:

#!/usr/bin/perl

STEP 2. Set the absolute path to the directory we are going to walk through. The absolute path is the one that starts at the root and works its way forward to where your public directory is. Do not use the trailing slash / at the end of the path. I'll show you why in a little bit. The directory permissions need to be set to 777 so the script has permission to read the directory structure in this directory.

$datadir = "/usr/home/public_html/tdscripts/testing_it";

STEP 3. Set the URL to the directory above so that you can view the changed pages when the script is done

$urlpath = 'http://www.tdscripts.com/testing_it';

STEP 4. Next we need to create an array of html pages NOT to change inside the directory defined above. This list will be short or long depending on your individual directory needs. Any page with the .html, .shtml, .phtml, or .htm extension that isn't listed here WILL be changed by the script.

@dontchange = ('testing.html', 'testhtm.htm', 'testp.phtml', 'tests.shtml');

STEP 5. Set the MIME type so we can print to the browser

print "Content-type: text/html\n\n";

STEP 6. Now we need to open the data directory and fill an array with every filename in this directory. Note that if we cannot open the directory we call a subroutine named file_error. We will add this subroutine at the very end of the script. We pass it an argument explaining what we couldn't open, which is useful for debugging should something go wrong.

opendir(THEDIR, "$datadir") || &file_error("Could not open the datadir: <b>$datadir</b>");
@filenames = readdir(THEDIR);
closedir(THEDIR);
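
As a rough sketch of the same step in a stricter style, the snippet below reads a directory with a lexical handle and keeps only plain files. The helper name list_files is hypothetical and not part of the original script, and it dies on failure instead of calling file_error.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return the names of
# the plain files in $dir, sorted. Subdirectories, including "." and "..",
# are filtered out by the -f test.
sub list_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Could not open the datadir: $dir ($!)\n";
    my @names = sort grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return @names;
}
```

Under these assumptions, @filenames could be filled with something like list_files($datadir).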

STEP 7. Now we need to walk through the @filenames array and match against the @dontchange criteria. If there is no match we call the &file_change subroutine to actually change the file. This is probably the most complex section of code so we'll take it a few lines at a time.

The first line sets $sizeof, the bound for the inner loop over the @dontchange array. The second line is a foreach statement that begins the iteration through the @filenames array. This is the outside loop. The third line splits the filename into pieces at each period, so $extension[1] holds the text after the first period. If you have more than one period in a filename, then you'll need to either add a routine which reverses the order of the array or rename the files so the only period is the one before the extension. Lines four and five reset $setflag and the $i index for each filename.

The inner loop starts on line six and steps $i through the @dontchange array. On line seven we check that the extension is html, htm, shtml or phtml so we don't try to open non-html files. Feel free to add additional page extensions if they apply to you, separated by the pipe symbol | (located on the keyboard above the \), which means OR inside the pattern. Lines 8-15: when the extension matches, $setflag is set to 1 (true), which means the file change will take place for this filename as long as the flag stays set. However, if the name is found in the @dontchange array then $setflag is set back to zero and the last statement breaks out of the while loop, skipping the remaining iterations. Lines 16-19 call &file_change when the flag is still set.

$sizeof = @dontchange;   # number of entries ($#dontchange is the last index and would skip the final entry)
foreach $name (@filenames) {
   @extension = split(/\./, $name);
   $setflag = 0;
   $i=0;
    while ($i < $sizeof) {
        if ($extension[1] =~ /^(html|htm|shtml|phtml)$/) {   # anchored so "htmlbak" does not match
             $setflag = 1;
             if($name eq $dontchange[$i]) {
                 $setflag = 0;
                 last;
             }
         }
         $i++;
     }
    if ($setflag == 1) {
       &file_change($name);
       $setflag = 0;
    }
}
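
The same decision can also be sketched with a hash lookup instead of an inner loop. This is only an illustration under stated assumptions: the helper name should_change and the %skip hash are hypothetical, and the pattern looks at the final extension only, so filenames containing extra periods are handled without reversing any array.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $name ends in a known page extension
# and is not listed in the skip table passed in as a hash reference.
sub should_change {
    my ($name, $skip) = @_;
    return 0 unless $name =~ /\.(?:html|htm|shtml|phtml)$/;
    return $skip->{$name} ? 0 : 1;
}

# Build the lookup table once from the same list used in STEP 4.
my %skip = map { $_ => 1 } ('testing.html', 'testhtm.htm', 'testp.phtml', 'tests.shtml');

for my $name ('index.html', 'tests.shtml', 'logo.gif') {
    print "$name\n" if should_change($name, \%skip);
}
```

A side effect of this design is that an empty skip list still lets every html page through, since the check no longer depends on looping over the list.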

STEP 8. Now let's create the file_change subroutine. We start by printing to the browser the name of the file we are modifying. Then we open the actual page and use the perl stat function, which is similar to the C function of the same name, to get the time the page was last modified as a timestamp in seconds since the epoch. We store the html page in the array named @contents and convert the timestamp with localtime.

sub file_change {
print "Modifying ... <a href=\"$urlpath/$_[0]\" target=\"new\"><b>$_[0]</b> [VIEW]</a><br>";
# go get the page and prepare to change it
open(FILEHANDLE, "<$datadir/$_[0]") || &file_error("Could not open the file: <b>$datadir/$_[0]</b>");
@contents = <FILEHANDLE>;
@info = stat FILEHANDLE;
@date = localtime($info[9]);
close(FILEHANDLE);

STEP 9. We take the @date array and pick out the pieces we need so we can format the last modified time into something that looks nice on the page. In this case the format is: 2-19-2001 18:54. Lastly, we start the $newhtml replacement string with the <!--swapstart --> comment. We need this marker so that when we RE-update the page we can find and delete the old time. Without it, we'd just keep writing the date repeatedly at the bottom of the page.

$year = sprintf("%d-%d-%d",$date[4]+1,$date[3],$date[5]+1900);   # month-day-year
$time = sprintf("%02d:%02d",$date[2],$date[1]);                  # zero-padded hours:minutes
$lastmodified = "$year $time";
$newhtml = '<!--swapstart -->';
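
For readers who want to try the date handling on its own, here is a minimal sketch. The helper names format_stamp and last_modified are hypothetical; the list indexes follow Perl's documented localtime layout (minutes, hours, day, month counted from 0, year minus 1900), and hours and minutes are zero-padded.

```perl
use strict;
use warnings;

# Hypothetical helper: format a localtime()-style list as "M-D-YYYY HH:MM".
# Index 4 is the month counted from 0 and index 5 is the year minus 1900.
sub format_stamp {
    my @t = @_;
    return sprintf("%d-%d-%d %02d:%02d",
                   $t[4] + 1, $t[3], $t[5] + 1900, $t[2], $t[1]);
}

# Hypothetical helper: formatted last-modified time of the file at $path.
sub last_modified {
    my ($path) = @_;
    my @info = stat($path) or die "Could not stat $path: $!\n";
    return format_stamp(localtime($info[9]));   # field 9 is the mtime
}
```

Calling last_modified with a path under $datadir should give a string in the same shape as the 2-19-2001 18:54 example.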

STEP 10. This is where the user can customize what goes at the very bottom just before the </body> tag of each page. It is very important to leave the </body> on a line by itself directly after a carriage return so the script properly replaces it. If you want a banner or other HTML code down here you can have whatever you want, just make sure you leave the </body> at the very end directly after a carriage return. Where you want the date the page was last modified you use the $lastmodified variable. Remember to put a backslash in front of any double quotes or the script will bark at you :)

## CUSTOMIZATION REQUIRED! ##
# CHANGE the format below to what you want to swap out -- be sure to backslash any double quotes

$newhtml .= "<font face=\"Arial\"><small>Copyright 1999-2001 All Rights Reserved <em>Last Modified $lastmodified</em></small></font>
</body>";

## (end) CUSTOMIZATION ##

STEP 11. It's time for the regular expression that essentially finds and replaces the old last modified date/time format -- or if it is a new page it creates the code for the first time.

$sflag = 0;
foreach $html (@contents) {
   if($html =~ /<\!--swapstart -->/i) {
      $html =~ s/(<\!--swapstart -->.+)//i;
      $sflag = 1;
    }
    if($html =~ /<\/body>/i) {
       $html =~ s/<\/body>/$newhtml/i;
       $sflag = 0;
    }
    if($sflag) { $html = ''; }
}
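
As an alternative illustration of the find-and-replace idea, the sketch below treats the page as one string instead of looping over lines. It assumes the whole page fits comfortably in memory and that the footer ends with </body>; the helper name stamp_html is hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return $html with $footer (which must end in
# "</body>") swapped in. If an earlier run left a <!--swapstart --> marker,
# everything from the marker through </body> is replaced; otherwise the
# first </body> is replaced.
sub stamp_html {
    my ($html, $footer) = @_;
    my $new = '<!--swapstart -->' . $footer;
    # /s lets "." cross newlines, /i ignores case, and ".*?" stops at
    # the first </body> after the marker.
    $html =~ s/<!--swapstart -->.*?<\/body>/$new/is
        or $html =~ s/<\/body>/$new/i;
    return $html;
}
```

Because the marker is matched first, running the helper repeatedly on its own output should leave a single footer rather than stacking new ones.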

STEP 12. This writes the changed HTML page with the updated $lastmodified date at the bottom.

open(FILEHANDLE, ">$datadir/$_[0]") || &file_error("Could not write the file: <b>$datadir/$_[0]</b>");
print FILEHANDLE @contents;
close(FILEHANDLE);
}
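
Because rewriting a page also resets its modification time, a second run reports the time of the previous run. One possible workaround, sketched here and not part of the original script, is to put the old timestamps back after writing with Perl's built-in utime. The helper name rewrite_keep_mtime is hypothetical, and the sketch assumes the script runs as the owner of the files (for example from cron under your own account), since other users are generally not allowed to set arbitrary times.

```perl
use strict;
use warnings;

# Hypothetical helper: overwrite $path with $text, then restore the
# original access and modification times so the rewrite itself does not
# count as an update.
sub rewrite_keep_mtime {
    my ($path, $text) = @_;
    my @info = stat($path) or die "Could not stat $path: $!\n";
    open(my $fh, '>', $path) or die "Could not write $path: $!\n";
    print {$fh} $text;
    close($fh) or die "Could not close $path: $!\n";
    # stat fields 8 and 9 are the access time and the modification time.
    utime($info[8], $info[9], $path)
        or die "Could not reset times on $path: $!\n";
    return 1;
}
```

With this approach the stamp on an untouched page should stay the same from run to run, at the cost of needing ownership of the files.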

STEP 13. Lastly, let's create the file_error subroutine. This will output any errors or problems so that you know which file could not be changed. The MIME line was already printed in STEP 5, so the subroutine doesn't need to print it again. Line two simply echoes the argument that was passed to the subroutine, adds a reminder about permissions, and ends with a <p> tag to separate it from any other file errors, if applicable.

sub file_error {
print "$_[0] check that the permissions are 666 for files and 777 for the directory<p>";
}

STEP 14. Save the file as last_modified.cgi and upload in ASCII mode to your cgi-bin. If you want to check it against the source code at TD Scripts you can do so by clicking the links below. If you don't want to type it out, just click the download button and save it to your local file system!

SOURCE CODE
DEMO last_modified.cgi
DOWNLOAD

NOTE: Something to keep in mind: if you run this TWICE then every page will show the date you last ran the script, not necessarily when you last updated something on the page. This is because the script itself rewrites the page, which resets the file's modification time. This isn't an error; it's actually pretty logical if you think about it. So the idea with this script is that you run it, say, once per week via cron. If a page hasn't actually changed during the week it will still show that it has, because the script rewrote the page during its last execution.

If you use this script please put up a reciprocal link to TD Scripts by using a graphic or text link at http://www.tdscripts.com/contact.html

TDavid is co-owner, programmer and webmaster for several sites devoted to programming, including his own, http://www.tdscripts.com/. He has done custom programming in various programming languages for companies all over the world. Every Friday at 2pm PST you can catch his weekly radio show dedicated to the technical side of webmastering and programming at http://www.scriptschool.com/radio
