Perl Reading XML

Today I had to do a Perl program to poke inside an XML file and read out some values. I had a quick go with the CPAN module XML::Simple but found it to be horrible. Yuck. I ended up using XML::MyXML instead and I must say I’m much more impressed with it. I also wrote some test code just to be sure everything was ok and guarantee it doesn’t break if someone changes the XML. Here is a quick roundup of what I did.

This was the XML I had:




    Param1
    domain
    Filetype1
    machine1

      machine1
      machine2


      /export/home/user/file1.tmp

    Thursday, 29 January 2009 08:45:05 GMT


  FLOWING


Only the attributes “Destinations” and “File_Names” are more complicated because they can contain multiple entries. Everything else is pretty straight forward, even using XML::Simple.

The Main Program

Here is the main file I did. It consists basically of opening the XML file, reading it and calling one of the functions that reads one of the attributes of the XML, in this case, get_filetype.

#!/usr/bin/perl -w

use strict;
use XML::MyXML qw(tidy_xml xml_to_object);
use XML::MyXML qw(:all);

require "readXML.pl";

sub main() {
    my $file = "test.xml";

    # this is needed because we read the whole file at once
    my $holdTerminator = $/;
    undef $/;

    open(FILE, "<$file") || die "Could not open file: $1";
    my $xml = ;
    #print tidy_xml($xml); # uncomment to get a nice view of the XML
    my $obj = xml_to_object($xml);
    print get_filetype($obj) . "\n";
}

main();

The XML Reading

The next file is here I’ve abstracted all the functions that are reading values from the XML object we created earlier. It’s basically the XML reading part. I like the way XML::MyXML allows the reading of values by path. That’s cool. The first two sub-routines return arrays and the rest returns a string.

#!/usr/bin/perl -w

use strict;
use XML::MyXML qw(tidy_xml xml_to_object);
use XML::MyXML qw(:all);

# returns an array with the destinations
sub get_destinations {
    my $obj = shift;
    my @destinations;

    foreach my $d ($obj->path('Mandatory_Info/Destinations/Destination')) {
        push @destinations, $d->value;
    }

    return @destinations;
}

# returns an array with the filenames
sub get_filenames {
    my $obj = shift;
    my @filenames;

    foreach my $d ($obj->path('Mandatory_Info/File_Names/File')) {
        push @filenames, $d->value;
    }

    return @filenames;
}
sub get_filetype {
    my $obj = shift;
    return $obj->path('Mandatory_Info/File_Type')->value;
}

sub get_monitor {
    my $obj = shift;
    return $obj->path('Mandatory_Info/Monitor')->value;
}

sub get_domain {
    my $obj = shift;
    return $obj->path('Mandatory_Info/Domain')->value;
}

sub get_source {
    my $obj = shift;
    return $obj->path('Mandatory_Info/Source')->value;
}

sub get_submissiontime {
    my $obj = shift;
    return $obj->path('Mandatory_Info/Submission_Time')->value;
}

sub get_filestatus {
    my $obj = shift;
    return $obj->path('Optional_Info/File_Status')->value;
}

# needed to use a require when including this file
return 1;

Test Code and Data

And following is the test code I wrote. Not a big deal but it makes easy to check if the functions are working correctly. I choosed to embed a test XML in this file instead of reading it from a file.

#!/usr/bin/perl -w

use strict;
use XML::MyXML qw(tidy_xml xml_to_object);
use XML::MyXML qw(:all);

require "readXML.pl";

my $xml = "



    Param1
    domain
    tmp
    machine1

      machine1
      machine2


      /export/home/user/file1.tmp

    Thursday, 29 January 2009 08:45:05 GMT


  FLOWING


";

my $obj = xml_to_object($xml);
run_tests($obj);

# Test sub-routine to assert that we are reading the correct
# values from the XML. Use only with $test_xml.
sub run_tests {
    my $obj = shift;
    my @destinations = get_destinations($obj);

    assert ($destinations[0] ,  "machine1", "get_destinations");
    assert ($destinations[1] ,  "machine2", "get_destinations");

    my @file_names = get_filenames($obj);

    assert ($file_names[0] ,  "/export/home/user/file1.tmp",
            "get_filenames");

    my $filetype = get_filetype($obj);
    assert ($filetype ,  "tmp", "get_filetype");

    my $monitor = get_monitor($obj);
    assert ($monitor ,  "Param1", "get_monitor");

    my $domain = get_domain($obj);
    assert ($domain ,  "domain", "get_domain");

    my $source = get_source($obj);
    assert ($source ,  "machine1", "get_source");

    my $submission_time = get_submissiontime($obj);
    assert ($submission_time ,  "Thursday, 29 January 2009 08:45:05 GMT",
            "get_submissiontime");

    my $file_status = get_filestatus($obj);
    assert ($file_status ,  "FLOWING", "get_filestatus");
}

# Test if the first two arguments are the same string
# Used in run_tests from testing
sub assert {
    my $arg1 = shift;
    my $arg2 = shift;
    my $descr = shift;

    if ($arg1 eq $arg2) {
        print "PASSED\n";
        return;
    }

    print $descr . "\tFAILED!!!\n";
    return;
}

Hope it helps someone. It will surely help me the next time I try to do it. My Perl skills were all rusty.

Comments are closed.