Today I had to do a Perl program to poke inside an XML file and read out some values. I had a quick go with the CPAN module XML::Simple but found it to be horrible. Yuck. I ended up using XML::MyXML instead and I must say I’m much more impressed with it. I also wrote some test code just to be sure everything was ok and guarantee it doesn’t break if someone changes the XML. Here is a quick roundup of what I did.
This was the XML I had:
Param1
domain
Filetype1
machine1
machine1
machine2
/export/home/user/file1.tmp
Thursday, 29 January 2009 08:45:05 GMT
FLOWING
Only the attributes “Destinations” and “File_Names” are more complicated because they can contain multiple entries. Everything else is pretty straight forward, even using XML::Simple.
===== The Main Program =====
Here is the main file I did. It consists basically of opening the XML file, reading it and calling one of the functions that reads one of the attributes of the XML, in this case, get_filetype.
#!/usr/bin/perl -w
use strict;
use XML::MyXML qw(tidy_xml xml_to_object);
use XML::MyXML qw(:all);
require "readXML.pl";
sub main() {
my $file = "test.xml";
# this is needed because we read the whole file at once
my $holdTerminator = $/;
undef $/;
open(FILE, "<$file") || die "Could not open file: $1";
my $xml = ;
#print tidy_xml($xml); # uncomment to get a nice view of the XML
my $obj = xml_to_object($xml);
print get_filetype($obj) . "\n";
}
main();
===== The XML Reading =====
The next file is here I’ve abstracted all the functions that are reading values from the XML object we created earlier. It’s basically the XML reading part. I like the way XML::MyXML allows the reading of values by path. That’s cool. The first two sub-routines return arrays and the rest returns a string.
#!/usr/bin/perl -w
use strict;
use XML::MyXML qw(tidy_xml xml_to_object);
use XML::MyXML qw(:all);
# returns an array with the destinations
sub get_destinations {
my $obj = shift;
my @destinations;
foreach my $d ($obj->path('Mandatory_Info/Destinations/Destination')) {
push @destinations, $d->value;
}
return @destinations;
}
# returns an array with the filenames
sub get_filenames {
my $obj = shift;
my @filenames;
foreach my $d ($obj->path('Mandatory_Info/File_Names/File')) {
push @filenames, $d->value;
}
return @filenames;
}
sub get_filetype {
my $obj = shift;
return $obj->path('Mandatory_Info/File_Type')->value;
}
sub get_monitor {
my $obj = shift;
return $obj->path('Mandatory_Info/Monitor')->value;
}
sub get_domain {
my $obj = shift;
return $obj->path('Mandatory_Info/Domain')->value;
}
sub get_source {
my $obj = shift;
return $obj->path('Mandatory_Info/Source')->value;
}
sub get_submissiontime {
my $obj = shift;
return $obj->path('Mandatory_Info/Submission_Time')->value;
}
sub get_filestatus {
my $obj = shift;
return $obj->path('Optional_Info/File_Status')->value;
}
# needed to use a require when including this file
return 1;
===== Test Code and Data =====
And following is the test code I wrote. Not a big deal but it makes easy to check if the functions are working correctly. I choosed to embed a test XML in this file instead of reading it from a file.
#!/usr/bin/perl -w
use strict;
use XML::MyXML qw(tidy_xml xml_to_object);
use XML::MyXML qw(:all);
require "readXML.pl";
my $xml = "
Param1
domain
tmp
machine1
machine1
machine2
/export/home/user/file1.tmp
Thursday, 29 January 2009 08:45:05 GMT
FLOWING
";
my $obj = xml_to_object($xml);
run_tests($obj);
# Test sub-routine to assert that we are reading the correct
# values from the XML. Use only with $test_xml.
sub run_tests {
my $obj = shift;
my @destinations = get_destinations($obj);
assert ($destinations[0] , "machine1", "get_destinations");
assert ($destinations[1] , "machine2", "get_destinations");
my @file_names = get_filenames($obj);
assert ($file_names[0] , "/export/home/user/file1.tmp",
"get_filenames");
my $filetype = get_filetype($obj);
assert ($filetype , "tmp", "get_filetype");
my $monitor = get_monitor($obj);
assert ($monitor , "Param1", "get_monitor");
my $domain = get_domain($obj);
assert ($domain , "domain", "get_domain");
my $source = get_source($obj);
assert ($source , "machine1", "get_source");
my $submission_time = get_submissiontime($obj);
assert ($submission_time , "Thursday, 29 January 2009 08:45:05 GMT",
"get_submissiontime");
my $file_status = get_filestatus($obj);
assert ($file_status , "FLOWING", "get_filestatus");
}
# Test if the first two arguments are the same string
# Used in run_tests from testing
sub assert {
my $arg1 = shift;
my $arg2 = shift;
my $descr = shift;
if ($arg1 eq $arg2) {
print "PASSED\n";
return;
}
print $descr . "\tFAILED!!!\n";
return;
}
Hope it helps someone. It will surely help me the next time I try to do it. My Perl skills were all rusty.