Code Comments

get data in xml document
Hi all,
I have an XML document like this:
<?xml version="1.0"?>
<!DOCTYPE ElencoNomi>
<ElencoNomi>
<!-- commento -->
<!-- -->
<!-- commento  -->

<ch:names>
<ch:name>one</ch:name>
<ch:name>two</ch:name>
<ch:name>freel</ch:name>
<ch:name>four</ch:name>
<ch:name>five</ch:name>
<ch:name>six</ch:name>
<ch:name>seven</ch:name>
</ch:names>
</ElencoNomi>

I would like to get the words between <ch:name> and </ch:name>.
How can I do that in awk?
Thanks

Patrick
04-03-08 12:12 AM


Re: get data in xml document
On Apr 2, 4:00 pm, Patrick <patrick.mor...@gmail.com> wrote:
> hi to all,
> i've a xml document as:
> <?xml version="1.0"?>
> <!DOCTYPE ElencoNomi>
> <ElencoNomi>
>  <!-- commento -->
>  <!-- -->
>  <!-- commento  -->
>
> <ch:names>
>         <ch:name>one</ch:name>
>         <ch:name>two</ch:name>
>         <ch:name>freel</ch:name>
>         <ch:name>four</ch:name>
>         <ch:name>five</ch:name>
>         <ch:name>six</ch:name>
>         <ch:name>seven</ch:name>
> </ch:names>
> </ElencoNomi>
>
> i would get the word between <ch:name> and </ch:name>.
> how can i do the get it in awk?
> thanks

One dirty quick hack would be: awk -F'[<>]' '/<ch:name>/{print $3}'

This does not work if the "word" contains a ">" character.
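
To see why the hack prints $3, it can help to dump the fields awk produces when it splits a line on "<" and ">". A minimal sketch; the function name show_fields is made up for illustration:

```shell
# Print each field of a line split on "<" or ">", one per output line.
# show_fields is a hypothetical helper, not part of awk.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields '<ch:name>one</ch:name>'
# $1=[]
# $2=[ch:name]
# $3=[one]
# $4=[/ch:name]
# $5=[]
```

With indentation in front of the tag, $1 holds the leading whitespace and $3 is still the text. If the text itself contains a literal ">", it is cut at that character, which is the limitation mentioned above.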

Hermann

Hermann Peifer
04-03-08 12:12 AM


Re: get data in xml document
Patrick wrote:

> hi to all,
> i've a xml document as:
> <?xml version="1.0"?>
> <!DOCTYPE ElencoNomi>
> <ElencoNomi>
>  <!-- commento -->
>  <!-- -->
>  <!-- commento  -->
>
> <ch:names>
>         <ch:name>one</ch:name>
>         <ch:name>two</ch:name>
>         <ch:name>freel</ch:name>
>         <ch:name>four</ch:name>
>         <ch:name>five</ch:name>
>         <ch:name>six</ch:name>
>         <ch:name>seven</ch:name>
> </ch:names>
> </ElencoNomi>
>
> i would get the word between <ch:name> and </ch:name>.
> how can i do the get it in awk?
> thanks

Note that XML documents are mostly free-form, and an awk program that
correctly processes the document in the format you show will fail if
presented with the same XML document in a different format.
That said:

- if you are sure your XML input will *always* have the *exact same* format
you show above, then

awk -F '[<>]' '/<ch:name>/ {print $3}' yourfile.xml

is a way to extract the names;

- otherwise, you probably need to look for more XML-oriented processing
tools, like xgawk or XMLStarlet (google for more).
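
Short of a real XML tool, the layout dependence can be reduced a little within awk itself. The sketch below loops with match() so that several elements on one line are all found. It is still not an XML parser: it assumes an element does not span lines and that the tags carry no attributes. The function name extract_names is hypothetical.

```shell
# Print the text of every <ch:name>...</ch:name> element, even when several
# share one line. Reads the files given as arguments, or stdin if none.
extract_names() {
    awk '{
        line = $0
        while (match(line, /<ch:name>[^<]*<\/ch:name>/)) {
            # 9 = length("<ch:name>"), 19 = 9 + length("</ch:name>")
            print substr(line, RSTART + 9, RLENGTH - 19)
            # Continue searching after the element just printed.
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}

printf '%s\n' '<ch:names><ch:name>one</ch:name><ch:name>two</ch:name></ch:names>' |
    extract_names
# one
# two
```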

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.

pk
04-03-08 12:12 AM


Re: get data in xml document
Hermann Peifer wrote:
 
>
> One dirty quick hack would be: awk -F'[<>]' '/<ch:name>/{print $3}'
>
> This does not work if the "word" contains a ">" character.

In XML it rarely will, afaik: a literal > is legal in text content, but it is
usually written &gt; (and it must be escaped in the sequence ]]>).
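
If the text does carry escaped characters such as &gt;, the extracted value still contains the entity. A minimal sketch that decodes three of the predefined entities after extraction; the function name decode_entities is hypothetical, and &apos;, &quot; and numeric character references are deliberately left alone:

```shell
# Decode &lt;, &gt; and &amp; in each input line.
decode_entities() {
    awk '{
        gsub(/&lt;/, "<")
        gsub(/&gt;/, ">")
        # Decode &amp; last so "&amp;lt;" becomes "&lt;" rather than "<".
        # "\\&" yields a literal ampersand in the gsub replacement.
        gsub(/&amp;/, "\\&")
        print
    }'
}

printf '%s\n' '<ch:name>a &gt; b</ch:name>' |
    awk -F'[<>]' '/<ch:name>/ {print $3}' |
    decode_entities
# a > b
```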


pk
04-03-08 12:12 AM


Re: get data in xml document

On 4/2/2008 9:00 AM, Patrick wrote:
> hi to all,
> i've a xml document as:
> <?xml version="1.0"?>
> <!DOCTYPE ElencoNomi>
> <ElencoNomi>
>  <!-- commento -->
>  <!-- -->
>  <!-- commento  -->
>
> <ch:names>
>         <ch:name>one</ch:name>
>         <ch:name>two</ch:name>
>         <ch:name>freel</ch:name>
>         <ch:name>four</ch:name>
>         <ch:name>five</ch:name>
>         <ch:name>six</ch:name>
>         <ch:name>seven</ch:name>
> </ch:names>
> </ElencoNomi>
>
> i would get the word between <ch:name> and </ch:name>.
> how can i do the get it in awk?
> thanks

awk 'gsub(/[[:space:]]*<[/]?ch:name>/,"")' file
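
The one-liner relies on gsub() returning the number of substitutions it made: used as a pattern, a non-zero count is true, so only lines that contained a ch:name tag are printed, with the tags already stripped. A sketch of the same logic spelled out, assuming an awk that understands POSIX character classes; the function name strip_name_tags is hypothetical, and the regex is given as a string rather than a /.../ literal:

```shell
# Remove <ch:name> and </ch:name> (plus any whitespace before them) and
# print only the lines where at least one tag was removed.
strip_name_tags() {
    awk '{
        n = gsub("[[:space:]]*<[/]?ch:name>", "")
        if (n > 0) print
    }' "$@"
}

printf '%s\n' '<ch:names>' '        <ch:name>one</ch:name>' '</ch:names>' |
    strip_name_tags
# one
```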

Ed.


Ed Morton
04-03-08 12:12 AM


Re: get data in xml document
On Apr 2, 5:28 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
> On 4/2/2008 9:00 AM, Patrick wrote:
>
> awk 'gsub(/[[:space:]]*<[/]?ch:name>/,"")' file
>
>         Ed.

This is indeed smarter than my dirty quick hack. I will remember it.
It would, however, eat any [[:space:]] at the end of the "word" (if there
were any and I were interested in keeping it).

Hermann

Hermann Peifer
04-03-08 12:12 AM


Re: get data in xml document
On 4/2/2008 10:55 AM, Hermann Peifer wrote:
> On Apr 2, 5:28 pm, Ed Morton <mor...@lsupcaemnt.com> wrote:
>
> This is indeed smarter than my dirty quick hack. I will remember it.
> It would however eat [[:space:]] at the end of the "word" (if any and
> I would be interested in keeping it).
>
> Hermann

Just tweak the RE if you want to preserve any trailing white space:

awk 'gsub(/([[:space:]]*<|<[/])ch:name>/,"")' file
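
The difference between the two regular expressions shows up on input with white space before the closing tag. A small comparison sketch, assuming an awk with POSIX character classes; strip_with is a hypothetical helper that takes the RE as a string, and the sed call only adds brackets so the white space is visible:

```shell
# strip_with RE: delete every match of RE and print the lines that changed.
strip_with() {
    awk -v re="$1" 'gsub(re, "")'
}

line='    <ch:name>one  </ch:name>'

# Original RE: white space before either tag is removed.
printf '%s\n' "$line" | strip_with '[[:space:]]*<[/]?ch:name>' | sed 's/.*/[&]/'
# [one]

# Tweaked RE: white space is only removed before the opening tag.
printf '%s\n' "$line" | strip_with '([[:space:]]*<|<[/])ch:name>' | sed 's/.*/[&]/'
# [one  ]
```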

Ed.



Ed Morton
04-03-08 12:12 AM

