Perl Reading XML


Today I had to do a Perl program to poke inside an XML file and read out some values. I had a quick go with the CPAN module XML::Simple but found it to be horrible. Yuck. I ended up using XML::MyXML instead and I must say I’m much more impressed with it. I also wrote some test code just to be sure everything was ok and guarantee it doesn’t break if someone changes the XML. Here is a quick roundup of what I did.

This was the XML I had:

Param1
domain
Filetype1
machine1

machine1
machine2

/export/home/user/file1.tmp

Thursday, 29 January 2009 08:45:05 GMT

FLOWING

Only the attributes “Destinations” and “File_Names” are more complicated because they can contain multiple entries. Everything else is pretty straight forward, even using XML::Simple.

===== The Main Program =====

Here is the main file I did. It consists basically of opening the XML file, reading it and calling one of the functions that reads one of the attributes of the XML, in this case, get_filetype.


#!/usr/bin/perl -w

use strict;
use XML::MyXML qw(tidy_xml xml_to_object);
use XML::MyXML qw(:all);

require "readXML.pl";

sub main() {
my $file = "test.xml";

# this is needed because we read the whole file at once
my $holdTerminator = $/;
undef $/;

open(FILE, "<$file") || die "Could not open file: $1";
my $xml = ;
#print tidy_xml($xml); # uncomment to get a nice view of the XML
my $obj = xml_to_object($xml);
print get_filetype($obj) . "\n";
}

main();

===== The XML Reading =====

The next file is here I’ve abstracted all the functions that are reading values from the XML object we created earlier. It’s basically the XML reading part. I like the way XML::MyXML allows the reading of values by path. That’s cool. The first two sub-routines return arrays and the rest returns a string.


#!/usr/bin/perl -w

use strict;
use XML::MyXML qw(tidy_xml xml_to_object);
use XML::MyXML qw(:all);

# returns an array with the destinations
sub get_destinations {
my $obj = shift;
my @destinations;

foreach my $d ($obj->path('Mandatory_Info/Destinations/Destination')) {
push @destinations, $d->value;
}

return @destinations;
}

# returns an array with the filenames
sub get_filenames {
my $obj = shift;
my @filenames;

foreach my $d ($obj->path('Mandatory_Info/File_Names/File')) {
push @filenames, $d->value;
}

return @filenames;
}
sub get_filetype {
my $obj = shift;
return $obj->path('Mandatory_Info/File_Type')->value;
}

sub get_monitor {
my $obj = shift;
return $obj->path('Mandatory_Info/Monitor')->value;
}

sub get_domain {
my $obj = shift;
return $obj->path('Mandatory_Info/Domain')->value;
}

sub get_source {
my $obj = shift;
return $obj->path('Mandatory_Info/Source')->value;
}

sub get_submissiontime {
my $obj = shift;
return $obj->path('Mandatory_Info/Submission_Time')->value;
}

sub get_filestatus {
my $obj = shift;
return $obj->path('Optional_Info/File_Status')->value;
}

# needed to use a require when including this file
return 1;

===== Test Code and Data =====

And following is the test code I wrote. Not a big deal but it makes easy to check if the functions are working correctly. I choosed to embed a test XML in this file instead of reading it from a file.


#!/usr/bin/perl -w

use strict;
use XML::MyXML qw(tidy_xml xml_to_object);
use XML::MyXML qw(:all);

require "readXML.pl";

my $xml = "

Param1
domain
tmp
machine1

machine1
machine2

/export/home/user/file1.tmp

Thursday, 29 January 2009 08:45:05 GMT

FLOWING

";

my $obj = xml_to_object($xml);
run_tests($obj);

# Test sub-routine to assert that we are reading the correct
# values from the XML. Use only with $test_xml.
sub run_tests {
my $obj = shift;
my @destinations = get_destinations($obj);

assert ($destinations[0] , "machine1", "get_destinations");
assert ($destinations[1] , "machine2", "get_destinations");

my @file_names = get_filenames($obj);

assert ($file_names[0] , "/export/home/user/file1.tmp",
"get_filenames");

my $filetype = get_filetype($obj);
assert ($filetype , "tmp", "get_filetype");

my $monitor = get_monitor($obj);
assert ($monitor , "Param1", "get_monitor");

my $domain = get_domain($obj);
assert ($domain , "domain", "get_domain");

my $source = get_source($obj);
assert ($source , "machine1", "get_source");

my $submission_time = get_submissiontime($obj);
assert ($submission_time , "Thursday, 29 January 2009 08:45:05 GMT",
"get_submissiontime");

my $file_status = get_filestatus($obj);
assert ($file_status , "FLOWING", "get_filestatus");
}

# Test if the first two arguments are the same string
# Used in run_tests from testing
sub assert {
my $arg1 = shift;
my $arg2 = shift;
my $descr = shift;

if ($arg1 eq $arg2) {
print "PASSED\n";
return;
}

print $descr . "\tFAILED!!!\n";
return;
}

Hope it helps someone. It will surely help me the next time I try to do it. My Perl skills were all rusty.