Code Comments

XMLGAWK help required
To help me get my head around XMLGAWK, can someone solve the following?
I have an XMLTV data file from which I want to extract certain data and
write it to a tab-delimited flat file.

The XMLTV data is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<tv>
  <programme start="20041218204000 +1000" stop="20041218225000 +1000" channel="Network TEN Brisbane">
    <title>The Frighteners</title>
    <sub-title/>
    <desc>A psychic private detective, who consorts with deceased souls, becomes engaged in a mystery as members of the town community begin dying mysteriously.</desc>
    <rating system="ABA"><value>M</value></rating>
    <length units="minutes">130</length>
    <category>Horror</category>
  </programme>
  <programme start="20041218080000 +1000" stop="20041218083000 +1000" channel="Network TEN Brisbane">
    <title>Worst Best Friends</title>
    <sub-title>Better Than Glen</sub-title>
    <desc>Life's like that for Roger Thesaurus - two of his best friends are also his worst enemies!</desc>
    <rating system="ABA"><value>C</value></rating>
    <length units="minutes">30</length>
    <category>Children</category>
  </programme>
</tv>

The flat file needs to be as follows:

channel<tab>programme start<tab>length<tab>title<tab>description<tab>rating value

So the first record would read:

Network TEN Brisbane<tab>2004-12-18 hh:mm<tab>130<tab>The
Frighteners<tab>A psychic private detective, who consorts with
deceased souls, becomes engaged in a mystery as members of the town
community begin dying mysteriously.<tab>M

The start time, which I've just shown as hh:mm, is obviously derived
from the start record but the +1000 does not need to be taken into
consideration.

Thanks for any and all help.

RipBurn
12-20-04 01:55 PM


Re: XMLGAWK help required
Jürgen Kahrs wrote:

> William James wrote:
> ... 
>
> The pattern/action is the basis of course.
> But I sometimes wonder how difficult it is
> for users of XMLgawk that they *also* have
> to keep track of the sequence of XMLSTARTELEM,
> XMLCHARDATA and XMLENDELEM.
>
> For example, when writing the doc, I wonder
> if I have to keep explaining this fragment
> each time it occurs:
>
> XMLCHARDATA                { data  = $0    }
> XMLENDELEM  == "desc"      { desc  = data  }

This recurrent code fragment can be simplified by using the 'xmllib.awk'
library (included in XMLgawk). It becomes:

EE == "desc" { ... CDATA ...  }

(use CDATA instead of 'desc')

--
To reply by e-mail, please remove the extra dot
in the given address:  m.collado -> mcollado


Manuel Collado
12-20-04 01:55 PM


Re: XMLGAWK help required
Manuel Collado wrote:

> Jürgen Kahrs wrote:
> ... 
>
> This recurrent code fragment can be simplified by using the 'xmllib.awk'
> library (included in XMLgawk). It becomes:
>
> EE == "desc" { ... CDATA ...  }
>
> (use CDATA instead of 'desc')

Sorry, that is better explained as follows. It becomes:

EE == "desc"    { desc  = CDATA }

or use CDATA directly instead of 'desc'.
--
To reply by e-mail, please remove the extra dot
in the given address:  m.collado -> mcollado


Manuel Collado
12-20-04 08:55 PM


Re: XMLGAWK help required
William James <w_a_x_man@yahoo.com> wrote: 
>
> I hope  you're right. Since it seemed that a mere shell could do it so
> easily, I hesitated to post an awk solution.

Hee, hee... You can read up on Expat binding for Bash at
http://home.eol.ca/~parkw/park-january.html
near the bottom.  It's part of tutorials I wrote.  The other articles
are at
http://linuxgazette.net/108/park.html
http://linuxgazette.net/109/park.html

--
William Park <opengeometry@yahoo.ca>
Open Geometry Consulting, Toronto, Canada
Linux solution for data processing.

William Park
12-21-04 01:56 AM


Re: XMLGAWK help required
Hello,

> To help me get my head around XMLGAWK, can someone solve the following?
> I have an XMLTV data file from which I want to extract certain data and
> write it to a tab-delimited flat file.

I think this one will do:

BEGIN { XMLMODE=1 }

XMLSTARTELEM == "programme" {
    channel = XMLATTR["channel"]
    start   = XMLATTR["start"]
}

XMLCHARDATA                { data   = $0   }   # remember the latest character data
XMLENDELEM  == "desc"      { desc   = data }
XMLENDELEM  == "length"    { leng   = data }
XMLENDELEM  == "title"     { title  = data }
XMLENDELEM  == "value"     { rating = data }   # rating value, e.g. M or C
XMLENDELEM  == "programme" {
    # start looks like "20041218204000 +1000"; the offset is ignored here
    date = substr(start,1,4) "-" substr(start,5,2) "-" substr(start,7,2)
    time = substr(start,9,2) ":" substr(start,11,2)
    print channel "\t" date " " time "\t" leng "\t" title "\t" desc "\t" rating
}

> So the first record would read:
>
> Network TEN Brisbane<tab>2004-12-18 hh:mm<tab>130<tab>The
> Frighteners<tab>A psychic private detective, who consorts with
> deceased souls, becomes engaged in a mystery as members of the town
> community begin dying mysteriously.<tab>M

I have tested the script above and the output
is as expected.
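
The substr() slicing of the start attribute can be checked on its own,
without XMLgawk. What follows is only a minimal sketch in plain awk,
wrapped in a shell function; the name fmt_start is made up for
illustration and is not part of XMLgawk or XMLTV. It assumes the stamp
always begins with 14 digits (YYYYMMDDhhmmss) and it ignores the offset.

```shell
# Hypothetical helper: format an XMLTV start stamp as "YYYY-MM-DD hh:mm".
# The trailing offset (e.g. +1000) is ignored.
fmt_start() {
    printf '%s\n' "$1" | awk '{
        s = $1    # first field: the 14-digit local timestamp
        printf "%s-%s-%s %s:%s\n", substr(s, 1, 4), substr(s, 5, 2), substr(s, 7, 2), substr(s, 9, 2), substr(s, 11, 2)
    }'
}

fmt_start "20041218204000 +1000"   # prints 2004-12-18 20:40
```

The character positions are the same ones used in the print statement
of the script, so a wrong index shows up here immediately.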

> The start time, which I've just shown as hh:mm, is obviously derived
> from the start record but the +1000 does not need to be taken into
> consideration.

The script currently ignores it, but it could also be
taken into account.
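
One possible way to take the offset into account is to convert the
local start time to UTC. The following is only a sketch under stated
assumptions, not part of the script: the names start_to_utc, jdn and
civil are hypothetical, the offset is assumed to always have the form
+hhmm or -hhmm, and the calendar arithmetic goes through Julian day
numbers because plain awk has no date functions. With gawk, mktime()
and strftime() would be an alternative.

```shell
# Hypothetical helper: convert "YYYYMMDDhhmmss +hhmm" to UTC,
# printed as "YYYY-MM-DD hh:mm".
start_to_utc() {
    printf '%s\n' "$1" | awk '
    # Julian day number of a Gregorian calendar date (integer arithmetic).
    function jdn(y, m, d,    a) {
        a = int((14 - m) / 12)
        y = y + 4800 - a
        m = m + 12 * a - 3
        return d + int((153 * m + 2) / 5) + 365 * y + int(y / 4) - int(y / 100) + int(y / 400) - 32045
    }
    # Inverse of jdn(): Julian day number back to "YYYY-MM-DD".
    function civil(j,    a, b, c, d, e, m) {
        a = j + 32044
        b = int((4 * a + 3) / 146097)
        c = a - int(146097 * b / 4)
        d = int((4 * c + 3) / 1461)
        e = c - int(1461 * d / 4)
        m = int((5 * e + 2) / 153)
        return sprintf("%04d-%02d-%02d", 100 * b + d - 4800 + int(m / 10), m + 3 - 12 * int(m / 10), e - int((153 * m + 2) / 5) + 1)
    }
    {
        stamp = $1    # e.g. 20041218080000
        off   = $2    # e.g. +1000
        sign  = (substr(off, 1, 1) == "-") ? -1 : 1
        offmin = sign * (substr(off, 2, 2) * 60 + substr(off, 4, 2))
        # Minutes since Julian day 0 in local time, shifted to UTC.
        t = jdn(substr(stamp, 1, 4) + 0, substr(stamp, 5, 2) + 0, substr(stamp, 7, 2) + 0) * 1440
        t = t + substr(stamp, 9, 2) * 60 + substr(stamp, 11, 2) - offmin
        days = int(t / 1440)
        mins = t - days * 1440
        printf "%s %02d:%02d\n", civil(days), int(mins / 60), mins % 60
    }'
}
```

For the second sample programme, start_to_utc "20041218080000 +1000"
prints 2004-12-17 22:00, so the date can move back a day once the
offset is applied.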

William Park was faster in supplying a solution,
but I think the AWK solution is more readable.
Anyway, William is a tough competitor.

Jürgen Kahrs
12-22-04 01:55 PM


Re: XMLGAWK help required
William James wrote:

> I hope  you're right. Since it seemed that a mere shell could do it so
> easily, I hesitated to post an awk solution.

Thanks for posting your solution in pure GNU Awk.
It is interesting to see the amount of overhead
needed to process XML. When comparing our solutions,
do you think that the simplification which XMLgawk
introduces justifies the effort of extending GNU Awk
once more? Does XMLgawk's approach of signaling the
occurrence of a tag with special variables make sense
to you?

> (How do you keep this #!@% google from removing indentation?)

Is it really Google that removes the indentation?
When I post a script (not via Google), something
replaces two leading blanks with one leading blank.
This is not the same as what happened to your script.

Jürgen Kahrs
12-22-04 01:55 PM


Re: XMLGAWK help required
Jürgen Kahrs wrote:

> Thanks for posting your solution in pure GNU Awk.

In Kernighan's "One True Awk".

> It is interesting to see the amount of overhead
> needed to process XML.

I already had and frequently used the functions Match()
and _match(), so the only bothersome thing was writing
bookends().

> When comparing our solutions,
> do you think that the simplification which XMLgawk
> introduces justifies the effort of extending GNU Awk
> once more?

I don't know; I'm not very familiar with gawk.

> Does XMLgawk's approach of signaling the
> occurrence of a tag with special variables make sense
> to you?

Yes.  It works well with the  <test> { <actions> }  pairs
of awk.
 
>
> Is it really Google that removes the indentation?

Before Google recently changed its newsgroup handling,
the leading blanks were kept.


William James
12-22-04 01:55 PM

