Home > Archive > AWK > March 2007 > xml and awk
| Mag Gam 2007-03-03, 6:57 pm |
| Hi All,
I have noticed a lot of people using awk for XML extracts. I am able
to get my extracts by using '/ /' in an awk statement; however, is it
possible to use a 'push' and 'pop' method, similar to a stack data
structure?
For example, input looks like this:
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>
I am trying to get
Artist: Bob Dylan
Company: Columbia
TIA!
| |
| Mag Gam 2007-03-03, 6:57 pm |
| On Mar 3, 11:04 am, "Mag Gam" <magaw...@gmail.com> wrote:
> Hi All,
>
> I have noticed a lot of people using awk for XML extracts. I am able
> to get my extracts by using '/ /' in an awk statement; however, is it
> possible to use a 'push' and 'pop' method, similar to a stack data
> structure?
>
> For example, input looks like this:
>
> <CATALOG>
> <CD>
> <TITLE>Empire Burlesque</TITLE>
> <ARTIST>Bob Dylan</ARTIST>
> <COUNTRY>USA</COUNTRY>
> <COMPANY>Columbia</COMPANY>
> <PRICE>10.90</PRICE>
> <YEAR>1985</YEAR>
> </CD>
> </CATALOG>
>
> I am trying to get
> Artist: Bob Dylan
> Company: Columbia
>
> TIA!
Correction:
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
<TAPE>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>6.99</PRICE>
<YEAR>1985</YEAR>
</TAPE>
</CATALOG>
I am trying to get
Artist: Bob Dylan
Company: Columbia
CD Price: 10.90
Tape Price: 6.99
Sorry for the confusion. I am able to get the first part fairly easily;
however, the second part is a lot harder. That's the reason why I am
asking about the 'stack' and 'pop' option.
TIA!
| |
|
|
| Mag Gam 2007-03-03, 6:57 pm |
| On Mar 3, 11:23 am, Klaus Alexander Seistrup <k...@seistrup.dk> wrote:
> Mag Gam wrote:
>
> You could try XMLgawk from
> http://home.vrweb.de/~juergen.kahrs/gawk/XML/
>
> Cheers,
>
> --
> Klaus Alexander Seistrup
> Tv-fri medielicensbetaler
> http://klaus.seistrup.dk/
Klaus:
Thanks! Do I need to have gawk installed? I only have awk and nawk
installed on my system.
| |
| Mag Gam 2007-03-03, 6:57 pm |
| On Mar 3, 11:29 am, "Mag Gam" <magaw...@gmail.com> wrote:
> On Mar 3, 11:23 am, Klaus Alexander Seistrup <k...@seistrup.dk> wrote:
>
> Klaus:
> Thanks! Do I need to have gawk installed? I only have awk and nawk
> installed on my system.
Actually, I have now installed gawk for Windows, and it looks like this
won't work on Windows since the -l option is not available. Also, I prefer
going with 'stock' as much as possible. Any other suggestions?
| |
| Jürgen Kahrs 2007-03-03, 6:57 pm |
| Mag Gam wrote:
> Actually, I have now installed gawk for Windows, and it looks like this
> won't work on Windows since the -l option is not available. Also, I prefer
> going with 'stock' as much as possible. Any other suggestions?
'Stock' won't do. At the moment, the XML extension
that comes with the xgawk distribution only runs
with xgawk. A back-port of the XML extension to
'stock' gawk is possible but hasn't been done yet.
Try xgawk. We are working towards a convergence
of source code for gawk and xgawk.
| |
| Ted Davis 2007-03-03, 6:57 pm |
| On 3 Mar 2007 08:12:54 -0800, "Mag Gam" <magawake@gmail.com> wrote:
>On Mar 3, 11:04 am, "Mag Gam" <magaw...@gmail.com> wrote:
>
>Correction:
> <CATALOG>
> <CD>
> <TITLE>Empire Burlesque</TITLE>
> <ARTIST>Bob Dylan</ARTIST>
> <COUNTRY>USA</COUNTRY>
> <COMPANY>Columbia</COMPANY>
> <PRICE>10.90</PRICE>
> <YEAR>1985</YEAR>
> </CD>
><TAPE>
> <TITLE>Empire Burlesque</TITLE>
> <ARTIST>Bob Dylan</ARTIST>
> <COUNTRY>USA</COUNTRY>
> <COMPANY>Columbia</COMPANY>
> <PRICE>6.99</PRICE>
> <YEAR>1985</YEAR>
> </TAPE>
> </CATALOG>
>
> I am trying to get
> Artist: Bob Dylan
> Company: Columbia
> CD Price: 10.90
> Tape Price: 6.99
>
>Sorry for the confusion. I am able to get the first part fairly easily;
>however, the second part is a lot harder. That's the reason why I am
>asking about the 'stack' and 'pop' option.
>
>TIA!
That would be a very un-awk way of doing it. Consider this instead:
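BEGIN{
    FS = "[><]"     # split each line on the angle brackets
    Flag = ""       # name of the enclosing element (CD, TAPE, ...)
}
{
    # A line holding only a tag, e.g. <CD>, splits into 3 fields.
    if( NF == 3 ) Flag = $2
    # Key each value by container name + element name, e.g. CDPRICE.
    Table[ Flag $2 ] = $3
}
END{
    print "Artist: " Table[ "CDARTIST" ]
    print "Company: " Table[ "CDCOMPANY" ]
    print "CD Price: " Table[ "CDPRICE" ]
    print "Tape Price: " Table[ "TAPEPRICE" ]
}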
Screen dump (XP):
E:\MyFiles>awk -fawktest.awk test.txt
Artist: Bob Dylan
Company: Columbia
CD Price: 10.90
Tape Price: 6.99
It would produce a neater output if the END block were done this way
(and a fixed width font is used):
END{
    printf( "%-14s%s\n", "Artist:", Table[ "CDARTIST" ] )
    printf( "%-14s%s\n", "Company:", Table[ "CDCOMPANY" ] )
    printf( "%-14s$%5s\n", "CD Price:", Table[ "CDPRICE" ] )
    printf( "%-14s$%5s\n", "Tape Price:", Table[ "TAPEPRICE" ] )
}
Screen dump:
E:\MyFiles>awk -fawktest.awk test.txt
Artist:       Bob Dylan
Company:      Columbia
CD Price:     $10.90
Tape Price:   $ 6.99
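For what it's worth, the push/pop idea from the question can also be sketched in stock awk. This is only a rough sketch, not a real XML parser: it assumes well-formed input with no attributes, comments, CDATA or self-closing tags, and the function name xml_stack_extract is made up for illustration. It splits the input on "<" so that every record starts with a tag name, pushes opening tags onto an array, pops on closing tags, and keys each value by its parent tag plus its own tag.

```shell
#!/bin/sh
# Sketch only: a tag stack in stock awk (hypothetical helper name).
# Assumes well-formed XML without attributes, comments or CDATA.
xml_stack_extract() {
    awk '
    BEGIN { RS = "<"; FS = ">" }    # each record looks like "TAG>text"
    NF < 2 { next }                 # skip anything before the first tag
    {
        tag = $1; text = $2
        if (substr(tag, 1, 1) == "/") {     # closing tag: pop
            if (depth > 0) depth--
            next
        }
        stack[++depth] = tag                # opening tag: push
        gsub(/^[ \t\r\n]+|[ \t\r\n]+$/, "", text)
        if (text != "")                     # leaf element holding a value
            value[stack[depth - 1] "/" tag] = text
    }
    END {
        print "Artist: "     value["CD/ARTIST"]
        print "Company: "    value["CD/COMPANY"]
        print "CD Price: "   value["CD/PRICE"]
        print "Tape Price: " value["TAPE/PRICE"]
    }'
}

# Demo on a well-formed version of the sample from the thread.
xml_stack_extract <<'EOF'
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
  </CD>
  <TAPE>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COMPANY>Columbia</COMPANY>
    <PRICE>6.99</PRICE>
  </TAPE>
</CATALOG>
EOF
```

Because the stack depends on every opening tag having a matching closing tag, this variant is less forgiving of malformed input than the Flag approach above.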
--
T.E.D. (tdavis@gearbox.maem.umr.edu) Remove "gearbox.maem" to get real address - that one is dead
| |
| Klaus Alexander Seistrup 2007-03-03, 6:57 pm |
| Mag Gam wrote:
>
> Do I need to have gawk installed?
You should install xgawk from the link I gave you and use that one.
Xgawk, in XML mode, uses XML elements rather than whitespace separated
text fields (see the numerous examples on the website).
Cheers,
--
Klaus Alexander Seistrup
Tv-fri medielicensbetaler
http://klaus.seistrup.dk/
| |
| Klaus Alexander Seistrup 2007-03-03, 6:57 pm |
| I wrote:
> Xgawk, in XML mode, uses XML elements rather than whitespace
> separated text fields (see the numerous examples on the website).
You could use a script along these lines:
#v+
#!/usr/local/bin/xgawk -f
@load xml
XMLCHARDATA { data = $0; }
XMLENDELEM == "ARTIST" {
    what["ARTIST"] = data
}
XMLENDELEM == "COMPANY" {
    what["COMPANY"] = data
}
END {
    if (what["ARTIST"])
        print " Artist: " what["ARTIST"]
    if (what["COMPANY"])
        print "Company: " what["COMPANY"]
}
# eof
#v-
Running it on your sample XML file yields:
 Artist: Bob Dylan
Company: Columbia
Cheers,
--
Klaus Alexander Seistrup
http://klaus.seistrup.dk/
| |
| William Park 2007-03-09, 6:57 pm |
| Mag Gam <magawake@gmail.com> wrote:
> Correction:
> <CATALOG>
> <CD>
> <TITLE>Empire Burlesque</TITLE>
> <ARTIST>Bob Dylan</ARTIST>
> <COUNTRY>USA</COUNTRY>
> <COMPANY>Columbia</COMPANY>
> <PRICE>10.90</PRICE>
> <YEAR>1985</YEAR>
> </CD>
> <TAPE>
> <TITLE>Empire Burlesque</TITLE>
> <ARTIST>Bob Dylan</ARTIST>
> <COUNTRY>USA</COUNTRY>
> <COMPANY>Columbia</COMPANY>
> <PRICE>6.99</PRICE>
> <YEAR>1985</YEAR>
> </TAPE>
> </CATALOG>
>
> I am trying to get
> Artist: Bob Dylan
> Company: Columbia
> CD Price: 10.90
> Tape Price: 6.99
>
> Sorry for the confusion. I am able to get the first part fairly easily;
> however, the second part is a lot harder. That's the reason why I am
> asking about the 'stack' and 'pop' option.
Since you asked, a shell approach would go something like this :-)
data ()
{
    case ${XML_TAG_STACK[0]}.${XML_TAG_STACK[1]}.${XML_TAG_STACK[2]} in
        ARTIST.CD.CATALOG) echo "Artist: $1" ;;
        COMPANY.CD.CATALOG) echo "Company: $1" ;;
        PRICE.CD.CATALOG) echo "CD Price: $1" ;;
        PRICE.TAPE.CATALOG) echo "Tape Price: $1" ;;
    esac
}
expat -d data < file.xml
Ref:
http://home.eol.ca/~parkw/index.html#expat
--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
| |
| Jürgen Kahrs 2007-03-10, 6:57 pm |
| Hello William,
> data ()
> {
> case ${XML_TAG_STACK[0]}.${XML_TAG_STACK[1]}.${XML_TAG_STACK[2]} in
> ARTIST.CD.CATALOG) echo "Artist: $1" ;;
> COMPANY.CD.CATALOG) echo "Company: $1" ;;
> PRICE.CD.CATALOG) echo "CD Price: $1" ;;
> PRICE.TAPE.CATALOG) echo "Tape Price: $1" ;;
> esac
> }
> expat -d data < file.xml
A short solution. But the nesting of XML tags
is "hardwired" into the code. This improves
readability, but may become inconvenient if
the nesting is ever changed.
Anyway, you found a simple bash-conformant solution
for passing attributes of tags.
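One way to loosen the hardwired nesting, sketched here in stock awk rather than with the expat builtin: keep the tag stack, but move the paths out of the code into a rule list and match each rule against the tail of the current path only. This is just a sketch under assumptions: the function name xml_rules and the "SUFFIX=Label" rule syntax are invented for illustration, and the input is assumed to be well-formed XML without attributes, comments or CDATA.

```shell
#!/bin/sh
# Sketch only: rule-driven extraction (hypothetical name and rule syntax).
# $1 holds rules of the form "PATH_SUFFIX=Label", separated by ";".
xml_rules() {
    awk -v rules="$1" '
    BEGIN {
        RS = "<"; FS = ">"              # each record looks like "TAG>text"
        n = split(rules, r, ";")
        for (i = 1; i <= n; i++) {
            eq = index(r[i], "=")
            key[i]   = "/" substr(r[i], 1, eq - 1)
            label[i] = substr(r[i], eq + 1)
        }
    }
    NF < 2 { next }                     # skip anything before the first tag
    {
        tag = $1; text = $2
        if (substr(tag, 1, 1) == "/") { # closing tag: pop
            if (depth > 0) depth--
            next
        }
        stack[++depth] = tag            # opening tag: push
        gsub(/^[ \t\r\n]+|[ \t\r\n]+$/, "", text)
        if (text == "") next
        path = ""
        for (i = 1; i <= depth; i++) path = path "/" stack[i]
        for (i = 1; i <= n; i++) {      # compare only the tail of the path
            len = length(key[i])
            if (length(path) >= len &&
                substr(path, length(path) - len + 1) == key[i])
                print label[i] ": " text
        }
    }'
}

# Demo: the catalog is wrapped in an extra STORE element, yet the same
# rules still apply because only the last two path components are compared.
xml_rules "CD/ARTIST=Artist;CD/COMPANY=Company;CD/PRICE=CD Price;TAPE/PRICE=Tape Price" <<'EOF'
<STORE>
  <CATALOG>
    <CD>
      <ARTIST>Bob Dylan</ARTIST>
      <COMPANY>Columbia</COMPANY>
      <PRICE>10.90</PRICE>
    </CD>
    <TAPE>
      <ARTIST>Bob Dylan</ARTIST>
      <COMPANY>Columbia</COMPANY>
      <PRICE>6.99</PRICE>
    </TAPE>
  </CATALOG>
</STORE>
EOF
```

The trade-off is that a short suffix such as ARTIST would match under both CD and TAPE, so the rules need just enough of the path to be unambiguous.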