Using Perl and Regular Expressions to Process Html Files - Part 2

Author: John Dixon | Posted: 17-03-2008

In this article we will discuss how to change the contents of an HTML file by running a Perl script on it.

The file we are going to process is called file1.htm:

Note: To ensure that the code is displayed correctly, the HTML shown in this article uses square brackets '[..]' in tags instead of angle brackets '<..>'.

[html]
[head][title]Sample HTML File[/title]
[link rel="stylesheet" type="text/css" href="style.css"]
[/head]
[body]
[h1]Introduction[/h1]
[p]Welcome to the world of Perl and regular expressions[/p]
[h2]Programming Languages[/h2]
[table border="1" width="400"]
[tr][th colspan="2"]Programming Languages[/th][/tr]
[tr][td]Language[/td][td]Typical use[/td][/tr]
[tr][td]JavaScript[/td][td]Client-side scripts[/td][/tr]
[tr][td]Perl[/td][td]Processing HTML files[/td][/tr]
[tr][td]PHP[/td][td]Server-side scripts[/td][/tr]
[/table]
[h1]Summary[/h1]
[p]JavaScript, Perl, and PHP are all interpreted programming languages.[/p]
[/body]
[/html]

Imagine that we need to change both occurrences of [h1]heading[/h1] to [h1 class="big"]heading[/h1]. This is not a big change, and it could easily be done manually or with a simple search and replace, but we're just getting started here.

To do this, we could use the following Perl script (script1.pl):

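1 open (IN, "file1.htm");
2 open (OUT, ">new_file1.htm");
3 while ($line = <IN>) {
4     $line =~ s/<h1>/<h1 class="big">/;
5     print OUT $line;
6 }
7 close (IN);
8 close (OUT);

Note: You don't need to enter the line numbers. I've included them simply so that I can reference individual lines in the script. Unlike the HTML listing above, the script is shown with real angle brackets, because Perl needs them exactly as typed: <IN> on line 3 reads a line from the IN filehandle, and the tags on line 4 must match the angle brackets in the actual HTML file.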

Let's look at each line of the script.

Line 1
In this line file1.htm is opened so that it can be processed by the script. In order to process the file, Perl uses something called a filehandle, which provides a kind of link between the script and the operating system, containing information about the file that is being processed. I've called this "opening" filehandle 'IN', but I could have used anything within reason. Filehandles are normally in capitals.

Line 2
This line creates a new file called 'new_file1.htm', which is written to by using another filehandle, OUT. The '>' just before the filename indicates that the file is opened for writing. If a file with that name already exists, its contents are overwritten.
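
Lines 1 and 2 do not check whether the files could actually be opened. As a rough sketch of a more defensive style (not part of the original script), the two opens could be written with a three-argument open, lexical filehandle variables, and an error check. The helper name open_pair and the variable names $in and $out are my own inventions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: open one file for reading and one for writing,
# stopping with an error message if either open fails.
sub open_pair {
    my ($in_name, $out_name) = @_;
    open(my $in,  '<', $in_name)  or die "Cannot read $in_name: $!";
    open(my $out, '>', $out_name) or die "Cannot write $out_name: $!";
    return ($in, $out);
}

# Usage, assuming file1.htm exists in the current folder:
# my ($in, $out) = open_pair('file1.htm', 'new_file1.htm');
```

With this style, a missing file1.htm stops the script with a message instead of silently producing an empty output file.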

Line 3
This line sets up a loop in which each line in file1.htm is examined individually. Each pass through the loop reads the next line of the file into the $line variable.

Line 4
This is the regular expression. It searches for the first occurrence of [h1] on each line of file1.htm and, if it finds it, changes it to [h1 class="big"].

Looking at Line 4 in more detail:

  • $line - This is a variable that contains a line of text. It gets modified if the substitution is successful.
  • =~ is called the binding operator. It tells Perl to apply the substitution to $line.
  • s is the substitution operator.
  • [h1] is what needs to be substituted (replaced).
  • [h1 class="big"] is what [h1] has to be changed to.
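
To see the substitution on its own, here is a small sketch that applies it to two sample strings rather than to a file. It uses real angle brackets, since that is what Perl has to match in an actual HTML file. The helper name add_big_class is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the substitution to one line and report
# whether anything was replaced.
sub add_big_class {
    my ($line) = @_;
    # s/// returns a true value when it replaces something, false otherwise.
    my $replaced = ($line =~ s/<h1>/<h1 class="big">/) ? 1 : 0;
    return ($line, $replaced);
}

my ($changed, $hit)  = add_big_class("<h1>Summary</h1>\n");
my ($same,    $miss) = add_big_class("<p>No heading here</p>\n");

print $changed;   # the h1 tag now carries class="big"
print $same;      # unchanged, because the line contains no h1 tag
```

Adding the g modifier after the final slash would replace every occurrence on the line rather than only the first.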

Line 5
This line takes the contents of the $line variable and, via the OUT filehandle, writes the line to new_file1.htm.

Line 6
This line closes the 'while' loop. The loop is repeated until all the lines in file1.htm have been examined.

Lines 7 and 8
These two lines close the two filehandles that have been used in the script. If you left out these two lines the script would still work, because Perl closes any open filehandles when the script ends, but it's good programming practice to close them explicitly, thus freeing up the filehandle names so they can be used, for example, with another file.
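
If you want to try the whole read, substitute, write, and close cycle without creating any files, the following sketch runs the same loop against a string held in memory. This is an illustration under my own assumptions rather than the script described above: the sample text, the variable names, and the use of in-memory filehandles (opening a filehandle on a reference to a scalar) are all choices made for the example.

```perl
use strict;
use warnings;

# Sample input held in a string so the sketch runs without file1.htm.
my $html   = "<h1>Introduction</h1>\n<p>Welcome</p>\n<h1>Summary</h1>\n";
my $result = '';

# Opening on a scalar reference gives an in-memory filehandle.
open(my $in,  '<', \$html)   or die "Cannot open input: $!";
open(my $out, '>', \$result) or die "Cannot open output: $!";

# Same loop as lines 3 to 6 of the script.
while (my $line = <$in>) {
    $line =~ s/<h1>/<h1 class="big">/;
    print $out $line;
}

# Same job as lines 7 and 8, with a check that each close succeeded.
close($in)  or die "Cannot close input: $!";
close($out) or die "Cannot close output: $!";

print $result;
```

Both h1 lines in the sample come out with class="big" added, and the paragraph line passes through untouched.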

Running the Script

As the purpose of this article is to explain how to use regular expressions to process HTML files, and not necessarily how to use Perl, I don't want to spend too long describing how to run Perl scripts. Suffice it to say that you can run them in various ways: from within a text editor such as TextPad, by double-clicking the Perl script (script1.pl), or by running the script from an MS-DOS window.

(The location of the Perl interpreter will need to be in your PATH statement so that you can run Perl scripts from any location on your computer and not just from within the directory where the interpreter (perl.exe) itself is installed.)

So, to run our script we could open an MS-DOS window and navigate to the location where the script and the HTML file are located. To keep life simple I've assumed that these two files are in the same folder (or directory). The command to run the script is:

C:\>perl script1.pl

If the script works (and hopefully it will), a new file (new_file1.htm) is created in the same folder as file1.htm. If you open the file you'll see that the two lines that contained [h1] tags have been modified so that they now read [h1 class="big"].

In Part 3 we'll look at how to handle multiple files.


Article Source: http://www.articlesbase.com/programming-articles/using-perl-and-regular-expressions-to-process-html-files-part-2-362029.html

About the Author:

John is a web developer working for My Health Questions Matter, a company dedicated to helping patients to get the most out of their interaction with health care professionals such as doctors, midwives, and consultants by generating a set of health questions a patient can ask at an appointment.
