Code Comments
Hi all,
I have an XML document like this:

<?xml version="1.0"?>
<!DOCTYPE ElencoNomi>
<ElencoNomi>
<!-- commento -->
<!-- -->
<!-- commento -->

<ch:names>
<ch:name>one</ch:name>
<ch:name>two</ch:name>
<ch:name>freel</ch:name>
<ch:name>four</ch:name>
<ch:name>five</ch:name>
<ch:name>six</ch:name>
<ch:name>seven</ch:name>
</ch:names>
</ElencoNomi>

I would like to get the word between <ch:name> and </ch:name>.
How can I do it in awk?
Thanks
On Apr 2, 4:00 pm, Patrick <patrick.mor...@gmail.com> wrote:
> I would like to get the word between <ch:name> and </ch:name>.
> How can I do it in awk?
> Thanks
One dirty quick hack would be: awk -F'[<>]' '/<ch:name>/{print $3}'
This does not work if the "word" contains a ">" character.
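To see what the one-liner does, here is a quick check that feeds it a few lines with printf instead of reading a file (the input lines, including the `a>b` text, are made up for illustration). Splitting on both < and > puts the element's text in field 3, and a > inside the text ends that field early:

```shell
# With -F'[<>]', "<ch:name>one</ch:name>" splits into
# $1="" $2="ch:name" $3="one" $4="/ch:name" $5="", so $3 is the text.
# "<ch:names>" and "</ch:names>" do not match /<ch:name>/ and are skipped.
printf '%s\n' '<ch:names>' '<ch:name>one</ch:name>' '<ch:name>two</ch:name>' '</ch:names>' |
  awk -F'[<>]' '/<ch:name>/{print $3}'
# prints one and two, one per line

# Made-up text containing ">": field 3 stops at that character.
printf '%s\n' '<ch:name>a>b</ch:name>' |
  awk -F'[<>]' '/<ch:name>/{print $3}'
# prints: a
```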
Hermann
Patrick wrote:
> I would like to get the word between <ch:name> and </ch:name>.
> How can I do it in awk?
> Thanks
Note that XML documents are mostly free-form, and an awk program that
correctly processes the document in the format you show will fail if
presented with the same XML document in a different format.
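A small illustration of that point: below, the same kind of content is written with every element on a single line (a layout any XML parser accepts; the input is made up for illustration) and fed to the field-splitting one-liner. Field 3 now falls between the `<ch:names>` and `<ch:name>` tags, so the command prints an empty line instead of the names:

```shell
# Same elements, different layout: everything on one line.
printf '%s\n' '<ch:names><ch:name>one</ch:name><ch:name>two</ch:name></ch:names>' |
  awk -F '[<>]' '/<ch:name>/ {print $3}'
# Here $2="ch:names" and $3="" (the empty text between "<ch:names>" and
# "<ch:name>"), so this prints a single empty line, not "one" and "two".
```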
That said:
- if you are sure your XML input will *always* have the *exact same* format
you show above, then
awk -F '[<>]' '/<ch:name>/ {print $3}' yourfile.xml
is a way to extract the names;
- otherwise, you probably need to look for more XML-oriented processing
tools, like xgawk or XMLStarlet (google for more).
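If you do stay with plain awk, one step up from splitting on < and > is to search each line for complete `<ch:name>...</ch:name>` pairs with the POSIX `match()` function, which copes with several elements on one line. This is only a sketch, assuming the element text contains no nested markup, comments or CDATA and that no element is split across lines; it is still not an XML parser.

```shell
printf '%s\n' '<ch:names><ch:name>one</ch:name><ch:name>two</ch:name></ch:names>' '<ch:name>three</ch:name>' |
awk '{
  s = $0
  # match() sets RSTART and RLENGTH for the leftmost pair; [^<]* stops at the next tag.
  while (match(s, /<ch:name>[^<]*<\/ch:name>/)) {
    # "<ch:name>" is 9 characters and "</ch:name>" is 10, so the text starts
    # 9 characters after RSTART and is RLENGTH - 19 characters long.
    print substr(s, RSTART + 9, RLENGTH - 19)
    # Continue searching after this element.
    s = substr(s, RSTART + RLENGTH)
  }
}'
# prints one, two and three, one per line
```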
--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.
Hermann Peifer wrote:
>
> One dirty quick hack would be: awk -F'[<>]' '/<ch:name>/{print $3}'
>
> This does not work if the "word" contains a ">" character.
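In XML it normally won't, afaik: a > in text is usually written as &gt;. (Strictly, XML only requires < and & to be escaped in text, plus > when it appears in the sequence ]]>, so a literal > is legal.)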
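When the text does use entity references such as &gt;, the field-splitting one-liner returns them as-is. A minimal sketch, assuming only the three entities &lt;, &gt; and &amp; need handling (numeric character references are not covered), decodes them after extraction. &amp; is decoded last so that an escaped entity such as &amp;gt; comes out as &gt; rather than >; the input line is made up for illustration.

```shell
printf '%s\n' '<ch:name>x &lt; y &amp;&amp; y &gt; z</ch:name>' |
awk -F'[<>]' '/<ch:name>/ {
  s = $3
  gsub(/&lt;/, "<", s)
  gsub(/&gt;/, ">", s)
  # "\\&" in the string is \& to gsub, i.e. a literal "&" (a bare "&" would insert the match).
  gsub(/&amp;/, "\\&", s)
  print s
}'
# prints: x < y && y > z
```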
On 4/2/2008 9:00 AM, Patrick wrote:
> I would like to get the word between <ch:name> and </ch:name>.
> How can I do it in awk?

awk 'gsub(/[[:space:]]*<[/]?ch:name>/,"")' file

Ed.
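This works because gsub() returns the number of substitutions it made, so used as a pattern it is true only on lines that contained the tags, and the default action prints the modified $0. A quick check on some made-up lines, including an indented one and a comment line, is below. It writes the optional slash as `\/?` instead of `[/]?`; both mean the same thing, but some awk implementations may not accept an unescaped / inside brackets in a regex constant.

```shell
printf '%s\n' '<!-- commento -->' '<ch:names>' '  <ch:name>one</ch:name>' '<ch:name>two</ch:name>' '</ch:names>' |
  awk 'gsub(/[[:space:]]*<\/?ch:name>/,"")'
# prints one and two, one per line: the blanks before "<ch:name>" are removed
# too, and lines without the tags (gsub returns 0) are not printed.
```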
On Apr 2, 5:28 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
> awk 'gsub(/[[:space:]]*<[/]?ch:name>/,"")' file
>
> Ed.

This is indeed smarter than my dirty quick hack. I will remember it.
It would, however, also eat any [[:space:]] at the end of the "word"
(if there is any, and I would be interested in keeping it).

Hermann
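The trailing-whitespace problem is easy to see by printing the result between brackets. The input line, with two blanks before the closing tag, is made up for illustration, and `\/?` stands in for `[/]?` as a portability assumption:

```shell
printf '%s\n' '<ch:name>one  </ch:name>' |
  awk 'gsub(/[[:space:]]*<\/?ch:name>/,"") { print "[" $0 "]" }'
# prints: [one]
# The [[:space:]]* in front of "</ch:name>" also matched the two blanks.
```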
On 4/2/2008 10:55 AM, Hermann Peifer wrote:
> This is indeed smarter than my dirty quick hack. I will remember it.
> It would, however, also eat any [[:space:]] at the end of the "word"
> (if there is any, and I would be interested in keeping it).

Just tweak the RE if you want to preserve any trailing white space:

awk 'gsub(/([[:space:]]*<|<[/])ch:name>/,"")' file

Ed.
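With the tweaked RE, leading blanks are only stripped as part of `[[:space:]]*<ch:name>`, while the closing tag is matched on its own as `</ch:name>`, so blanks at the end of the text survive. A check on a made-up line with blanks on both sides (again writing `<\/` for `<[/]`, as a portability assumption):

```shell
printf '%s\n' '  <ch:name>one  </ch:name>' |
  awk 'gsub(/([[:space:]]*<|<\/)ch:name>/,"") { print "[" $0 "]" }'
# prints: [one  ]
# The blanks before "<ch:name>" are removed; the two before "</ch:name>" are kept.
```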