| Author |
How can I ensure that I always have a list?
|
|
| comp.lang.tcl 2006-11-24, 7:02 pm |
| [TCL]
set contentsList [string trim $contentsList]
if {[string length $contentsList] == 0} { lappend contentsList {?} }
return [lrange $contentsList [expr {[lsearch -exact $contentsList "?"] + 1}] end]
[/TCL]
I have a case where $contentsList might be empty or somehow, just plain
not a list. How can I enforce $contentsList to always be a list no
matter what I have as a value?
Here is the error:
list element in braces followed by "]" instead of space while executing
"lsearch -exact $contentsList "?"" (procedure
"XML_GET_ALL_ELEMENT_ATTRS" line 24) invoked from within
"XML_GET_ALL_ELEMENT_ATTRS "$userPath/xml/event" row" (procedure
"getEvents" line 18) invoked from within "getEvents" (procedure
"displayEvents" line 3) invoked from within "displayEvents" invoked
from within "append html "
I could really use some help with this one, it's affecting live data
right now
Thanx
Phil
| |
| George Petasis 2006-11-24, 7:02 pm |
| What you should do, actually depends on the format of your input data.
Since you want to treat a string directly as a list, this means that you
probably want to convert your string into a list by breaking at white
space. A better approach will be to use split:
set contentsList [split $contentsList]
This will ensure that contentsList will always contain a valid list.
But perhaps it is not the kind of list you want. But for this, you must
present more details about the expected data...
George
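A minimal sketch of George's point (the sample string is made up, not Phil's data): a string with a stray `]` right after a closing brace is not a valid Tcl list, so `lsearch` rejects it with the same kind of error Phil reported. `split` always succeeds because it only cuts at whitespace, but the braces then stop grouping anything, which is presumably what George means by "the kind of list you want".

```tcl
# Made-up sample: it looks like a list, but the "]" directly after the
# closing brace makes it unparseable as a Tcl list.
set contents {id {1} answer {A lot}]}

# lsearch has to parse its argument as a list, so this raises an error.
if {[catch {lsearch -exact $contents "?"} err]} {
    puts "lsearch failed: $err"
}

# split never fails: it breaks the string at whitespace and returns a
# valid list, but the braces are now just ordinary characters.
set contentsList [split $contents]
puts [llength $contentsList]   ;# 5
puts [lindex $contentsList 1]  ;# {1}  (the braces are part of the text)
```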
| |
| comp.lang.tcl 2006-11-24, 7:02 pm |
|
George Petasis wrote:
> What you should do, actually depends on the format of your input data.
> Since you want to treat a string directly as a list, this means that you
> probably want to convert your string into a list by breaking at white
> space. A better approach will be to use split:
>
> set contentsList [split $contentsList]
>
> This will ensure that contentsList will always contain a valid list.
> But perhaps it is not the kind of list you want. But for this, you must
> present more details about the expected data...
>
> George
Thanx but I'm . I just want a list; what do you mean by "kind
of list"? There is just a list in TCL or it's not a list.
I'll keep that command in mind, thanx.. however, it appears that it
adds junk characters like "]" to the end of the "list" causing it to be
a mal-formed list.
Phil
| |
| Bryan Oakley 2006-11-24, 7:02 pm |
| comp.lang.tcl wrote:
> [TCL]
> set contentsList [string trim $contentsList]
> if {[string length $contentsList] == 0} { lappend contentsList {?} }
> return [lrange $contentsList [expr {[lsearch -exact $contentsList
> "?"] + 1}] end]
> [/TCL]
>
> I have a case where $contentsList might be empty or somehow, just plain
> not a list. How can I enforce $contentsList to always be a list no
> matter what I have as a value?
>
You enforce $contentsList to always be a list by first being certain it
is created as a list, then never perform string transformations on the
string representation of the list.
> Here is the error:
>
> list element in braces followed by "]" instead of space while executing
> "lsearch -exact $contentsList "?"" ...
That to me means that $contentsList wasn't a list to begin with. If the
data starts out as a string there is no way to guarantee it "always be a
list" since it was never a list to begin with.
Most likely, the first step is to explicitly convert your data from its
string representation into a bona fide list (generally, with the "split"
command). Once you've done that, as long as you never transform the list
with string commands it will always remain a list and can contain any
characters in any sequence that you want.
Never assume data from an external source is a valid tcl list -- always
do an explicit conversion from the original data into a valid tcl list.
Once it is a bona fide list it will remain as such as long as you never
transform the data with string commands.
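Bryan's advice can be sketched as follows, assuming the values have already been pulled out of the external source (the variable names and values are illustrative). The list is built only with list commands, so Tcl quotes each element however it needs to and arbitrary characters survive; the last lines show how a string edit on a list's text form, like the regsubs in Phil's code, can leave it unparseable.

```tcl
# Illustrative values standing in for data pulled out of a file.
set question {Who wrote "Trilogy of Knowledge"?}
set note "a]b {c"   ;# a stray bracket and an unbalanced brace

# Build the list only with list commands; Tcl quotes each element as needed.
set record {}
lappend record id 1 question $question note $note

puts [llength $record]                  ;# 6
puts [lsearch -exact $record question]  ;# 2
puts [lindex $record 3]                 ;# Who wrote "Trilogy of Knowledge"?

# By contrast, a string edit on a list's text form can break it:
# stripping the final "}" from {A lot} leaves an unmatched brace.
set answer [list answer {A lot}]
regsub {\}$} $answer {} damaged
puts [catch {llength $damaged}]         ;# 1 (llength raised an error)
```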
| |
| Bryan Oakley 2006-11-24, 7:02 pm |
| comp.lang.tcl wrote:
> George Petasis wrote:
>
>
>
> Thanx but I'm . I just want a list; what do you mean by "kind
> of list"? There is just a list in TCL or it's not a list.
You've hit the nail on the head. It's either a list or it's not a list.
If it's a list, you'll never get the errors you say you're getting.
Thus, your data isn't a list. The code snippet you gave showed no
commands that would transform a list to a string.
>
> I'll keep that command in mind, thanx.. however, it appears that it
> adds junk characters like "]" to the end of the "list" causing it to be
> a mal-formed list.
Split will never add "junk characters" or any other type of character.
That is a certainty.
Note, however, that if you print out a list, what you *see* may contain
extra characters such as backslashes and curly braces. These characters
are not part of the list, but are what Tcl adds to the string form of
the list to guarantee the string can be converted back to the original list.
Bottom line: it appears your data is not a list so there's no way to
avoid errors like you're getting, unless you take the first step of
making sure your data is a valid tcl list to begin with. And note that
"valid tcl list" isn't synonymous with "words separated by spaces". It's
a bit more complicated than that.
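To illustrate that a valid list is more than "words separated by spaces", here is a hypothetical helper that simply asks Tcl whether a string parses as a list (Tcl 8.5 and later also provide `string is list` for this). The last two cases are the kinds of strings that produce errors like the one in Phil's first post.

```tcl
# Hypothetical helper: 1 if Tcl can parse the string as a list, else 0.
proc isValidList {s} {
    expr {![catch {llength $s}]}
}

puts [isValidList {a b c}]      ;# 1: three elements
puts [isValidList {a {b c} d}]  ;# 1: braces group "b c" into one element
puts [isValidList {a "b c" d}]  ;# 1: double quotes group as well
puts [isValidList "a {b c"]     ;# 0: unmatched open brace
puts [isValidList {a {b c}]}]   ;# 0: "]" directly after a closing brace
```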
| |
| comp.lang.tcl 2006-11-24, 7:02 pm |
|
Bryan Oakley wrote:
> comp.lang.tcl wrote:
>
> You enforce $contentsList to always be a list by first being certain it
> is created as a list, then never perform string transformations on the
> string representation of the list.
>
>
> That to me means that $contentsList wasn't a list to begin with. If the
> data starts out as a string there is no way to guarantee it "always be a
> list" since it was never a list to begin with.
>
> Most likely, the first step is to explicitly convert your data from its
> string representation into a bona fide list (generally, with the "split"
> command). Once you've done that, as long as you never transform the list
> with string commands it will always remain a list and can contain any
> characters in any sequence that you want.
>
> Never assume data from an external source is a valid tcl list -- always
> do an explicit conversion from the original data into a valid tcl list.
> Once it is a bona fide list it will remain as such as long as you never
> transform the data with string commands.
Ok I used [split] but it mangled $contentsList into this format:
? id {{1}} event_id {{354}} event_name \{No Those Who Mourn study
tonight\} event_img_path
{{http://www.livejournal.com/userpic/52983975/11318118}} event_img_alt
\{Those Who Mourn\}
When it should be
? id {{1}} event_id {{354}} event_name {{No Those Who Mourn study
tonight}} event_img_path
{{http://www.livejournal.com/userpic/52983975/11318118}} event_img_alt
{{Those Who Mourn}}
By doing this:
[TCL]
proc XML_GET_ALL_ELEMENT_ATTRS {fileName parseString {switch {}}} {
    set cannotOpenFile [catch {
        set fileID [open ${fileName}.xml r]
        fconfigure $fileID -buffering full -buffersize 32768
    } cannotOpenFileErrMsg]
    if {$cannotOpenFile} {
        return {}
    } else {
        set contents [read $fileID [file size ${fileName}.xml]]; close $fileID
        regsub -all "[format %c 123]" $contents {\&lbr;} contents
        regsub -all "[format %c 125]" $contents {\&rbr;} contents
        if {[string equal $switch -body]} {
            regsub -all {(> )[^<]+(< )} $contents "\\1\\2" contents
        }
        regsub -all {([a-zA-Z0-9]+)="([^">]+)"} $contents "\\1 {\\2}" contentsList
        regsub -all {<[a-zA-Z0-9\?\n/]+} $contentsList {} contentsList
        regsub -all -nocase -- {<!\-\-.+\-\->} $contentsList {} contentsList
        regsub -all {<|>} $contentsList {} contentsList
        if {[string equal $switch -body]} {
            regsub -all {( )[^= ]+=[^= ]+( )} $contentsList "\\1" contentsList
        }
        set contentsList [string trim $contentsList]
        if {[string length $contentsList] == 0} { lappend contentsList {?} }
        regsub {[\]]+$} $contentsList {} contentsList; # 11/24/2006 FOR SOME REASON IT ADDS "]" TO THE END OF $contentsList - NEED FURTHER INVESTIGATION
        set contentsList [split $contentsList]; # PER comp.lang.tcl CONVERSATION THIS WILL ENSURE THAT $contentsList IS A LIST
        puts $contentsList
        return [lrange $contentsList [expr {[lsearch -exact $contentsList "?"] + 1}] end]
    }
}
[/TCL]
This proc will convert an entire XML file into a TCL list by grabbing
all of its attributes and doing the following:
attr {{value}} attr {{value}}
Phil
| |
| Bryan Oakley 2006-11-24, 7:02 pm |
| comp.lang.tcl wrote:
> Ok I used [split] but it mangled $contentsList into this format:
>
> ? id {{1}} event_id {{354}} event_name \{No Those Who Mourn study
> tonight\} event_img_path
> {{http://www.livejournal.com/userpic/52983975/11318118}} event_img_alt
> \{Those Who Mourn\}
>
> When it should be
>
> ? id {{1}} event_id {{354}} event_name {{No Those Who Mourn study
> tonight}} event_img_path
> {{http://www.livejournal.com/userpic/52983975/11318118}} event_img_alt
> {{Those Who Mourn}}
>
Like I said, what you *see* will be different than what is actually in
the list. Do [lindex $contentsList 2], for example, and you will see
one level of curly braces magically disappearing.
If you want a way to visualize the data without the extra curly braces,
try doing "puts [join $contentsList { }]". You'll get something like
this, with all those extra curly braces and backslashes gone:
? id {1} event_id {354} event_name {No Those Who Mourn study tonight}
event_img_path {http://www.livejournal.com/userpic/52983975/11318118}
event_img_alt {Those Who Mourn}
Same data, just a different way to look at it.
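Bryan's description can be checked on a shortened excerpt of the output Phil posted (the sample string below is that excerpt, trimmed for illustration). After `split`, the printed form shows doubled braces and backslashes, yet the individual elements are plain text, and `join` gives back the original string here because the sample uses single spaces.

```tcl
# Shortened, illustrative version of the string Phil's regsubs produced.
set s {? id {1} event_name {No Those Who Mourn}}
set l [split $s]

puts $l              ;# ? id {{1}} event_name \{No Those Who Mourn\}
puts [llength $l]    ;# 8
puts [lindex $l 2]   ;# {1}  (one level of braces "disappears")
puts [join $l { }]   ;# ? id {1} event_name {No Those Who Mourn}
```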
> This proc will convert an entire XML file into a TCL list by grabbing
> all of its attributes and doing the following:
>
> attr {{value}} attr {{value}}
Wow. Is there any reason you're not using a real XML parser to do the
job? Given that you're parsing structured XML, perhaps even the "HTML
parser in 10 lines of code" [1] technique might make the job easier.
[1] http://wiki.tcl.tk/14517
| |
| comp.lang.tcl 2006-11-24, 7:02 pm |
|
Bryan Oakley wrote:
> Wow. Is there any reason you're not using a real XML parser to do the
> job? Given that you're parsing structured XML, perhaps even the "HTML
> parser in 10 lines of code" [1] technique might make the job easier.
>
> [1] http://wiki.tcl.tk/14517
Simple. I don't understand the XML parsers for TCL. They're way over
my head. I only know how to do it in PHP and only using simple
PHP-based XML parsers to do the work for me, otherwise, it's over my
head like a satellite.
I went to the site and understand none of it, sorry. Please make it
simple as if I were 9 years old.
Thanx
Phil
| |
| Darren New 2006-11-24, 7:02 pm |
| Bryan Oakley wrote:
> Wow. Is there any reason you're not using a real XML parser to do the
> job? Given that you're parsing structured XML, perhaps even the "HTML
> parser in 10 lines of code" [1] technique might make the job easier.
Stop me if I'm wrong, but none of those "tiny" XML parsers seem to
handle things like escaped characters (&quot; or &#xxx;) or CDATA or
other XML things that a real third-party XML-producing source might
produce, yes? I'm looking for something to replace TclXML for some
simple small-document parsing, but I have to work with the full XML
spec. (TclXML is OK, but the version numbers seem all messed up,
leaving me with things like the sgml package providing 3.0 and requiring
3.1 both in the same file, and breaking differently on different
platforms.) If I could get something small enough to include right in
the code, I could make my own libraries even more portable.
--
Darren New / San Diego, CA, USA (PST)
Scruffitarianism - Where T-shirt, jeans,
and a three-day beard are "Sunday Best."
| |
| Bryan Oakley 2006-11-24, 7:02 pm |
| comp.lang.tcl wrote:
> Bryan Oakley wrote:
>
>
> Simple. I don't understand the XML parsers for TCL. They're way over
> my head. I only know how to do it in PHP and only using simple
> PHP-based XML parsers to do the work for me, otherwise, it's over my
> head like a satellite.
>
> I went to the site and understand none of it, sorry. Please make it
> simple as if I were 9 years old.
Hmmm. Ok, well, I'm not sure what other advice I can give if regsub is
your tool of choice. You asked how to guarantee a list stays a list yet
you clearly show an example where your data was never a list to begin with.
A couple of us have suggested you use split to convert your string to a
list, and that appears to work for some value of "work". I guess the
next step is for you to tell us how or why that doesn't do what you want
it to do.
| |
| Bryan Oakley 2006-11-24, 7:02 pm |
| Darren New wrote:
> Bryan Oakley wrote:
>
>
>
> Stop me if I'm wrong, but none of those "tiny" XML parsers seem to
> handle things like escaped characters (&quot; or &#xxx;) or CDATA or
> other XML things that a real third-party XML-producing source might
> produce, yes?
That is correct. That's why they are tiny and real XML parsers are not.
Creating a full-featured XML parser is not for the faint of heart.
| |
| comp.lang.tcl 2006-11-24, 7:02 pm |
|
Bryan Oakley wrote:
> comp.lang.tcl wrote:
>
>
> Hmmm. Ok, well, I'm not sure what other advice I can give if regsub is
> your tool of choice. You asked how to guarantee a list stays a list yet
> you clearly show an example where your data was never a list to begin with.
>
> A couple of us have suggested you use split to convert your string to a
> list, and that appears to work for some value of "work". I guess the
> next step is for you to tell us how or why that doesn't do what you want
> it to do.
[ADD rant]
Sorry, but I am not sure how to more clearly convey my thoughts than
this:
I want to take an XML file with attributes and no < /> tags
And convert the attributes into the following TCL list:
id 1 trivia_id 255 question {How much wood would a woodchuck chuck?}
answer_id 1 answer {A lot} expDate {116494926}
From the row
<trivia id="1" trivia_id="255" question="How much wood would a
woodchuck chuck?" answer_id="1" answer="A lot"
expDate="116494926"></trivia>
I hope that makes it a bit more clear in light of my
XML_GET_ALL_ELEMENT_ATTRS proc I wrote (posted earlier). I am very
simply only trying to convert an XML row into a TCL list. I am not
always able to do so in light of the fact that sometimes I don't get
what TCL considers to be a list, thus the errors.
Please tell me if that's clear to you, I simply can't explain it any
better.
Phil
[/ADD rant]
| |
| Bryan Oakley 2006-11-24, 7:02 pm |
| comp.lang.tcl wrote:
> [ADD rant]
> Sorry, but I am not sure how to more clearly convey my thoughts than
> this:
>
I'm not sure why you are ranting. Your original question was simply how
to guarantee a list stays as a list. We've tried to help with that by
pointing out your data was never a list to begin with.
When you pointed out your real problem was to parse XML I simply
suggested you use a real XML parser rather than use a pile of regsubs.
> I want to take an XML file with attributes and no < /> tags
>
> And convert the attributes into the following TCL list:
>
> id 1 trivia_id 255 question {How much wood would a woodchuck chuck?}
> answer_id 1 answer {A lot} expDate {116494926}
>
>
> <trivia id="1" trivia_id="255" question="How much wood would a
> woodchuck chuck?" answer_id="1" answer="A lot"
> expDate="116494926"></trivia>
>
> I hope that makes it a bit more clear in light of my
> XML_GET_ALL_ELEMENT_ATTRS proc I wrote (posted earlier). I am very
> simply only trying to convert an XML row into a TCL list. I am not
> always able to do so in light of the fact that sometimes I don't get
> what TCL considers to be a list, thus the errors.
>
> Please tell me if that's clear to you, I simply can't explain it any
> better.
It's clear. The proper solution is to use an XML parser. I'm sorry if
you don't like that advice.
However, perhaps this will work for you, assuming the above data is
*exactly* the format you expect (no attributes that are missing values,
everything uses double quotes, etc)
proc getAttrs {string} {
set matches [regexp -inline -all \
{([a-zA-Z_]+)=\"([^\"]*)\"} $string]
set result {}
foreach {match name value} $matches {
lappend result $name $value
}
return $result
}
puts [getAttrs {
<trivia id="1" trivia_id="255" question="How much wood would a
woodchuck chuck?" answer_id="1" answer="A lot"
expDate="116494926"></trivia>
}]
This is what I get when I run the above code with the data you give:
id 1 trivia_id 255 question {How much wood would a
woodchuck chuck?} answer_id 1 answer {A lot} expDate 116494926
This solution will work as long as all attribute/value pairs match the
pattern [a-zA-Z_]+="[^"]*". It's possible to improve upon that (for
example, to allow other characters for quoting, allow for spaces around
the equals, etc) but I'm not sure you are concerned with that.
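If the data might not be quite that regular, the improvements Bryan mentions could look roughly like this. `getAttrsLoose` is a hypothetical variant, not code from the thread: it accepts single or double quotes and whitespace around `=`, and decodes the five predefined XML entities. It is still a regexp sketch rather than a parser, so CDATA, comments and numeric character references are not handled.

```tcl
# Hypothetical variant of getAttrs: single or double quotes, optional
# whitespace around "=", and the five predefined XML entities decoded.
proc getAttrsLoose {string} {
    # Group 1: attribute name; group 2: double-quoted value;
    # group 3: single-quoted value (the unused alternative comes back empty).
    set re {([A-Za-z_][A-Za-z0-9_.:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')}
    set entities [list "&lt;" "<" "&gt;" ">" "&quot;" "\"" "&apos;" "'" "&amp;" "&"]
    set result {}
    foreach {match name dq sq} [regexp -inline -all $re $string] {
        if {[string index $match end] eq "'"} {
            set value $sq
        } else {
            set value $dq
        }
        # string map does not rescan its output, so "&amp;lt;" becomes "&lt;".
        lappend result $name [string map $entities $value]
    }
    return $result
}

puts [getAttrsLoose {<trivia id='1' answer = "A lot" note="x &amp; y"></trivia>}]
# prints: id 1 answer {A lot} note {x & y}
```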
| |
| comp.lang.tcl 2006-11-24, 7:02 pm |
|
Bryan Oakley wrote:
> comp.lang.tcl wrote:
>
> I'm not sure why you are ranting. Your original question was simply how
> to guarantee a list stays as a list. We've tried to help with that by
> pointing out your data was never a list to begin with.
>
Dude, I have ADD. Attention Deficit Disorder. Please understand that
in light of what you're saying.
> When you pointed out your real problem was to parse XML I simply
> suggested you use a real XML parser rather than use a pile of regsubs.
Right. But I do not understand XML parsers in TCL, I don't understand
them. They don't make sense to me. It's like as if I wanted you to
build a car in one hour.
>
>
> It's clear. The proper solution is to use an XML parser. I'm sorry if
> you don't like that advice.
I never said I didn't like it, I said I didn't understand it. I read
the information on XML parsers and can't fathom what they are saying;
it might as well be written in Korean, because it's just that hard for
me to understand.
I don't know how else to explain this to you than that, I'm sorry, I
simply can't.
>
> However, perhaps this will work for you, assuming the above data is
> *exactly* the format you expect (no attributes that are missing values,
> everything uses double quotes, etc)
>
> proc getAttrs {string} {
> set matches [regexp -inline -all \
> {([a-zA-Z_]+)=\"([^\"]*)\"} $string]
> set result {}
> foreach {match name value} $matches {
> lappend result $name $value
> }
>
> return $result
> }
>
> puts [getAttrs {
> <trivia id="1" trivia_id="255" question="How much wood would a
> woodchuck chuck?" answer_id="1" answer="A lot"
> expDate="116494926"></trivia>
> }]
>
> This is what I get when I run the above code with the data you give:
>
> id 1 trivia_id 255 question {How much wood would a
> woodchuck chuck?} answer_id 1 answer {A lot} expDate 116494926
>
>
> This solution will work as long as all attribute/value pairs match the
> pattern [a-zA-Z_]+="[^"]*". It's possible to improve upon that (for
> example, to allow other characters for quoting, allow for spaces around
> the equals, etc) but I'm not sure you are concerned with that.
I will try that as much as I can and let you know, thanx
Phil
| |
| Gerald W. Lester 2006-11-24, 7:02 pm |
| comp.lang.tcl wrote:
> This proc will convert an entire XML file into a TCL list by grabbing
> all of its attributes and doing the following:
>
> attr {{value}} attr {{value}}
It appears you are dealing with XML -- STOP, I repeat STOP.
Do not attempt to parse the XML yourself; you will make mistakes (as
evidenced by your post).
Instead use either tDOM or TclDOM extensions to parse and extract values
from the XML.
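For reference, a minimal tDOM sketch of what Gerald suggests, assuming the tDOM extension is installed. `xmlRowsAsLists` is a hypothetical helper, and the sample document is illustrative, modeled on the trivia data in this thread with the inner quotes written as `&quot;`, as well-formed XML requires. Each matching element comes back as the flat `attr value attr value` list Phil asked for, with entities already decoded by the parser.

```tcl
package require tdom

# Hypothetical helper: parse the XML with tDOM and return one flat
# "attr value attr value ..." list per element named $tag.
proc xmlRowsAsLists {xml tag} {
    set doc [dom parse $xml]
    set root [$doc documentElement]
    set rows {}
    foreach node [$root selectNodes //$tag] {
        # asList returns a three-element list: tag name, attribute list,
        # children. Element 1 is the attribute name/value list.
        lappend rows [lindex [$node asList] 1]
    }
    $doc delete   ;# free the DOM tree
    return $rows
}

# Illustrative data modeled on the trivia file in this thread.
set xml {<?xml version="1.0"?>
<trivia>
  <entry id="1101" question="Who wrote &quot;Trilogy of Knowledge&quot;?" answer="Believer"/>
  <entry id="1102" question="Who wrote &quot;Trilogy of Knowledge&quot;?" answer="Saviour Machine"/>
</trivia>}

foreach row [xmlRowsAsLists $xml entry] {
    puts $row
}
```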
--
+--------------------------------+---------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+
| |
| comp.lang.tcl 2006-11-24, 7:02 pm |
|
Gerald W. Lester wrote:
> comp.lang.tcl wrote:
>
> It appears you are dealing with XML -- STOP, I repeat STOP.
>
> Do not attempt to parse the XML yourself; you will make mistakes (as
> evidenced by your post).
>
> Instead use either tDOM or TclDOM extensions to parse and extract values
> from the XML.
>
>
I installed TclXML, or at least the directory is there, but the
instructions from there onward are beyond my understanding, nor do the
Tcl samples make an ounce of sense. I sincerely appreciate the help so
far, however, I don't understand what to do at this point.
Phil
| |
| Gerald W. Lester 2006-11-24, 7:02 pm |
| comp.lang.tcl wrote:
>...
>
> I installed TclXML, or at least the directory is there, but the
> instructions from there onward are beyond my understanding, nor do the
> Tcl samples make an ounce of sense. I sincerely appreciate the help so
> far, however, I don't understand what to do at this point.
What exactly are you attempting to do?
| |
| Mark Smithfield 2006-11-25, 4:06 am |
| >
> I installed TclXML, or at least the directory is there, but the
> instructions from there onward are beyond my understanding, nor do the
> Tcl samples make an ounce of sense. I sincerely appreciate the help so
> far, however, I don't understand what to do at this point.
>
> Phil
Phil,
I am not a professional programmer like the others that are helping
you. What that means is that I completely feel your pain with regard to
the docs for the tcl xml parser. They rebuffed my advances several
times. But recently I completed a lovely little project and feel rather
confident with them.
The difference? Documentation supplemented with a very clear example
from the wiki. And so I share with you. http://wiki.tcl.tk/3884
This made the difference for me. It still took a little further poking,
but it clued me into the logic of the whole package. The pros are
probably right about making the extra effort to use this. Unless you
have complete control of the data, you are going to get bad data and by
the time you patch all the holes... you know where the story goes.
Good luck
Mark.
(ps.. searching the wiki.tcl.tk for 'xml'. Anything that says 'a little
xml...' is going to help a lot)
| |
| comp.lang.tcl 2006-11-25, 4:06 am |
|
Mark Smithfield wrote:
>
> Phil,
>
> I am not a professional programmer like the others that are helping
> you. What that means is that I completely feel your pain with regard to
> the docs for the tcl xml parser. They rebuffed my advances several
> times. But recently I completed a lovely little project and feel rather
> confident with them.
>
> The difference? Documentation supplemented with a very clear example
> from the wiki. And so I share with you. http://wiki.tcl.tk/3884
Thank you for the example, but I'm sorry, I don't understand XML
parsing enough to know what this is, I'm sorry, I simply don't
understand any of this page either.
Phil
>
> This made the difference for me. It still took a little further poking,
> but it clued me into the logic of the whole package. The pros are
> probably right about making the extra effort to use this. Unless you
> have complete control of the data, you are going to get bad data and by
> the time you patch all the holes... you know where the story goes.
>
In my case, in abject failure.
Phil
> Good luck
>
> Mark.
>
> (ps.. searching the wiki.tcl.tk for 'xml'. Anything that says 'a little
> xml...' is going to help a lot)
| |
| comp.lang.tcl 2006-11-25, 4:06 am |
|
Gerald W. Lester wrote:
> comp.lang.tcl wrote:
>
> What exactly are you attempting to do?
It's so easy. All I want to do is convert an XML file into a TCL list,
that's it, just a TCL list:
attr1 {val1} attr2 {val2}
Phil
| |
| Cameron Laird 2006-11-25, 8:02 am |
| In article <1164440504.571107.104710@h54g2000cwb.googlegroups.com>,
comp.lang.tcl <phillip.s.powell@gmail.com> wrote:
| |
| comp.lang.tcl 2006-11-25, 7:02 pm |
|
Cameron Laird wrote:
> In article <1164440504.571107.104710@h54g2000cwb.googlegroups.com>,
> comp.lang.tcl <phillip.s.powell@gmail.com> wrote:
> .
> .
> .
> .
> .
> .
> Should we recommend a SAX or XQuery approach for this?
> A SAX accumulator used to be canonical for something
> this simple; I haven't worked with Rolf's XPath, but I
> suspect it affords a one-liner that satisfies the
> requirements.
Per http://www.stylusstudio.com/xmldev/...post40240.html# I read on
SAX and.. it makes absolutely no sense to me
Per http://www.w3.org/TR/xquery/ ... second verse, same as the first
Dude, this makes no sense whatsoever, way too hard for me to figure
out! I've never, ever understood XML parsing because I never had to:
This is what I know, in PHP only:
$parser = @xml_parser_create();
@xml_parser_set_options($parser, XML_OPTION_SKIP_WHITE, true);
@xml_parse_into_struct($parser, $xml, $xmlArray, $tags);
@xml_parser_free($parser);
That's it. That's all I've ever had to know how to do to parse an XML
file. Which is why I got so desperate I wrote a PHP function to
convert the XML file into a TCL list, however, I'm having no love
getting TCL, PHP and XML to talk to one another and return the TCL
list.
Phil
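Cameron's SAX suggestion is arguably the closest Tcl counterpart to the PHP calls above: the parser walks the document and hands each start tag's attributes to a callback, already as a `name value name value` list. A minimal sketch with tDOM's `expat` command, assuming tDOM is installed; the `attrRows` namespace and its procs are hypothetical names.

```tcl
package require tdom

namespace eval attrRows {
    variable rows {}
}

# Start-tag callback. expat appends two arguments to the command prefix:
# the element name and its attributes as a name/value list.
proc attrRows::onStart {tag name attrs} {
    variable rows
    if {$name eq $tag} {
        lappend rows $attrs
    }
}

# Hypothetical helper: stream through the XML (no DOM tree is built) and
# collect the attribute list of every element called $tag.
proc attrRows::collect {xml tag} {
    variable rows
    set rows {}
    set parser [expat -elementstartcommand [list [namespace current]::onStart $tag]]
    set failed [catch {$parser parse $xml} err]
    $parser free
    if {$failed} {
        error $err
    }
    return $rows
}

foreach row [attrRows::collect {<trivia><entry id="1" answer="A lot"/></trivia>} entry] {
    puts $row
}
```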
| |
| Bryan Oakley 2006-11-25, 7:02 pm |
| comp.lang.tcl wrote:
> Gerald W. Lester wrote:
>
>
>
> It's so easy. All I want to do is convert an XML file into a TCL list,
> that's it, just a TCL list:
>
> attr1 {val1} attr2 {val2}
It is conceptually easy, but actually rather difficult to implement, as
you are now learning. This is why we keep recommending real XML parsers
over hand-hacked solutions using regexp.
Let's try again. You say "all I want to do is convert an XML file into a
TCL list". A little digging on the Tcler's wiki gives a proc that does
just that.
See if this tool will do the job for you: http://wiki.tcl.tk/3919
Unfortunately in all your posts you've yet to show us a complete example
of the actual data you are trying to parse before you've done a lot of
text processing, so I can't be sure if it will work for your particular
dataset. As long as your XML isn't too fancy, maybe the above solution
will work.
Don't read the page; just copy the "xml2list" proc from that page into
your script. Then, assuming that the XML is in a variable, call it like
this:
set list [xml2list $xml]
When I test it with this xml:
<trivia id="1" trivia_id="255" question="How much wood would a woodchuck
chuck?" answer_id="1" answer="A lot" expDate="116494926"></trivia>
I get this output:
trivia {id 1 trivia_id 255 question {How much wood would a woodchuck
chuck?} answer_id 1 answer {A lot} expDate 116494926} {}
Does that solve your problem?
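Assuming xml2list returns the shape shown above (element name, a flat attribute name/value list, then a list of children), the result can be pulled apart with ordinary list commands. A minimal sketch that pastes that output in as a literal:

[TCL]
# Output of xml2list as shown above, pasted in as a literal list
set parsed {trivia {id 1 trivia_id 255 question {How much wood would a woodchuck chuck?} answer_id 1 answer {A lot} expDate 116494926} {}}

set tag      [lindex $parsed 0]   ;# element name
set attrList [lindex $parsed 1]   ;# flat name/value pairs
set children [lindex $parsed 2]   ;# nested elements (empty here)

# A flat name/value list loads straight into an array
array set attr $attrList
puts "$tag: $attr(question) -> $attr(answer)"
[/TCL]

This prints "trivia: How much wood would a woodchuck chuck? -> A lot".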
| |
| comp.lang.tcl 2006-11-25, 7:02 pm |
|
Bryan Oakley wrote:
> comp.lang.tcl wrote:
>
> It is conceptually easy, but actually rather difficult to implement, as
> you are now learning. This is why we keep recommending real XML parsers
> over hand-hacked solutions using regexp.
>
> Let's try again. You say "all I want to do is convert an XML file into a
> TCL list". A little digging on the Tcler's wiki gives a proc that does
> just that.
>
> See if this tool will do the job for you: http://wiki.tcl.tk/3919
Upon implementing xml2list proc as-is I get the following error:
att's not paired: version 1.0 encoding utf-8 ? while executing "error
"att's not paired: $rest"" (procedure "xml2list" line 30)
It is failing for *all* of my XML files that I have on my site!
>
> Unfortunately in all your posts you've yet to show us a complete example
> of the actual data you are trying to parse before you've done a lot of
> text processing, so I can't be sure if it will work for your particular
> dataset. As long as your XML isn't too fancy, maybe the above solution
> will work.
I will try to duplicate it here:
<?xml version="1.0" encoding="utf-8" ?><trivia><entry id="1101"
triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?"
answerID="1" correctAnswerID="1" answer="Believer"
expDate="1139634000"></entry><entry id="1102" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="2"
correctAnswerID="1" answer="Saviour Machine"
expDate="1139634000"></entry><entry id="1103" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="3"
correctAnswerID="1" answer="Seventh Avenue"
expDate="1139634000"></entry><entry id="1104" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="4"
correctAnswerID="1" answer="Inevitable End"
expDate="1139634000"></entry><entry id="1105" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="5"
correctAnswerID="1" answer="No such song existed"
expDate="1139634000"></entry></trivia>
>
> Don't read the page; just copy the "xml2list" proc from that page into
> your script. Then, assuming that the XML is in a variable, call it like
> this:
>
> set list [xml2list $xml]
>
> When I test it with this xml:
>
> <trivia id="1" trivia_id="255" question="How much wood would a woodchuck
> chuck?" answer_id="1" answer="A lot" expDate="116494926"></trivia>
>
> I get this output:
>
> trivia {id 1 trivia_id 255 question {How much wood would a woodchuck
> chuck?} answer_id 1 answer {A lot} expDate 116494926} {}
>
> Does that solve your problem?
| |
| Bryan Oakley 2006-11-25, 7:02 pm |
| comp.lang.tcl wrote:
> Bryan Oakley wrote:
>
>
> Upon implementing xml2list proc as-is I get the following error:
>
> att's not paired: version 1.0 encoding utf-8 ? while executing "error
> "att's not paired: $rest"" (procedure "xml2list" line 30)
>
> It is failing for *all* of my XML files that I have on my site!
>
See another post I made for a working example. xml2list is likely
choking on the leading <?xml ...?> data. Strip that out and see if you
get any further.
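Stripping the declaration takes one regsub before the data reaches xml2list. A minimal sketch; `stripXmlDecl` is a hypothetical helper name, and the pattern assumes the declaration itself contains no `?` character:

[TCL]
# Hypothetical helper: remove a leading <?xml ...?> declaration
# (plus surrounding whitespace) and return the rest unchanged
proc stripXmlDecl {xml} {
    regsub {^\s*<\?xml[^?]*\?>\s*} $xml {} xml
    return $xml
}

set raw {<?xml version="1.0" encoding="utf-8" ?><trivia><entry id="1101"></entry></trivia>}
puts [stripXmlDecl $raw]
[/TCL]

With xml2list loaded, the call then becomes set list [xml2list [stripXmlDecl $xml]].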
This is precisely why we recommend using XML parsers. Parsing XML is not
trivial, and doing it with text processing is fraught with peril. There
are many hacks and workarounds that can *usually* parse *some* forms of
XML, but there will always be valid forms of XML which defy simple
parsing strategies.
(I understand you claim an inability to learn how to parse XML, so talk
about real XML parsers is only serving to frustrate you at this point. I
apologize; the above paragraph is more to drive the point home to other
less experienced programmers who may stumble on this thread in the future)
Fortunately, your XML data looks pretty simple, and I'm even wondering if
you really need to "parse the xml" or if all you really need is to
pattern match on 'foo="bar"', ignoring all XML tags (which I think I did
in an earlier post).
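A minimal sketch of that regexp-only approach, assuming every value is double-quoted and contains no literal `"` character. Entities such as `&quot;` are left undecoded, and the `version` and `encoding` pseudo-attributes of an `<?xml ...?>` declaration would be picked up too. `attrPairs` is a hypothetical name:

[TCL]
# Hypothetical helper: collect every name="value" pair in the text,
# ignoring tag structure entirely
proc attrPairs {xml} {
    set pairs {}
    # -all -inline yields: whole match, group 1, group 2, whole match, ...
    foreach {-> name value} [regexp -all -inline {(\w+)="([^"]*)"} $xml] {
        lappend pairs $name $value
    }
    return $pairs
}

puts [attrPairs {<entry id="1102" answer="Saviour Machine"></entry>}]
[/TCL]

This prints id 1102 answer {Saviour Machine}, which is a ready-made name/value list for array set.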
| |
| comp.lang.tcl 2006-11-25, 7:02 pm |
|
Bryan Oakley wrote:
> comp.lang.tcl wrote:
>
> See another post I made for a working example. xml2list is likely
> choking on the leading <?xml ...?> data. Strip that out and see if you
> get any further.
>
No, I'm not getting any further, even with the well-formed XML I showed
you, it is producing errors now on xml2list itself:
unmatched open quote in list while executing "lindex $item 0"
("default" arm line 2) invoked from within "switch -regexp -- $item {
^# {append res "{[lrange $item 0 end]} " ; #text item} ^/ { regexp
{/(.+)} $item -> ..." (procedure "xml2list" line 9)
> This is precisely why we recommend using XML parsers. Parsing XML is not
> trivial, and doing it with text processing is fraught with peril. There
> are many hacks and workarounds that can *usually* parse *some* forms of
> XML, but there will always be valid forms of XML which defy simple
> parsing strategies.
>
> (I understand you claim an inability to learn how to parse XML, so talk
> about real XML parsers is only serving to frustrate you at this point. I
> apologize; the above paragraph is more to drive the point home to other
> less experienced programmers who may stumble on this thread in the future)
>
> Fortunately, your XML data looks pretty simple, and I'm even wondering if
> you really need to "parse the xml" or if all you really need is to
> pattern match on 'foo="bar"', ignoring all XML tags (which I think I did
> in an earlier post).
Do you think at this point I will ever figure this out?
| |
| Earl Greida 2006-11-25, 7:02 pm |
|
"Darren New" <dnew@san.rr.com> wrote in message
news:KE%9h.47070$si3.23846@tornado.socal.rr.com...
> comp.lang.tcl wrote:
>
> Been there, done that. Try this?
>
What are you attaching? As much as I "trust" the Tcl newsgroup I certainly
am not going to download and unzip an attachment.
| |
| Darren New 2006-11-25, 7:02 pm |
| Earl Greida wrote:
> What are you attaching? As much as I "trust" the Tcl newsgroup I certainly
> am not going to download and unzip an attachment.
It's a zip file with Tcl source code and test cases, with embedded
documentation, that parses XML (using TclXML) and which takes the result
and gives you a nested list, which you can then pull apart with various
functions. Basically "Xtremely Simple Xml Parser".
It's text files in a zip file. Unzip it.
--
Darren New / San Diego, CA, USA (PST)
Scruffitarianism - Where T-shirt, jeans,
and a three-day beard are "Sunday Best."
| |
| comp.lang.tcl 2006-11-25, 7:02 pm |
|
Darren New wrote:
> Earl Greida wrote:
>
> It's a zip file with Tcl source code and test cases, with embedded
> documentation, that parses XML (using TclXML) and which takes the result
> and gives you a nested list, which you can then pull apart with various
> functions. Basically "Xtremely Simple Xml Parser".
There is no zip file, sorry; I see nothing.
>
> It's text files in a zip file. Unzip it.
>
> --
> Darren New / San Diego, CA, USA (PST)
> Scruffitarianism - Where T-shirt, jeans,
> and a three-day beard are "Sunday Best."
| |
| Mark Janssen 2006-11-25, 7:02 pm |
|
On Nov 25, 8:29 pm, "comp.lang.tcl" <phillip.s.pow...@gmail.com> wrote:
> Bryan Oakley wrote:
>
>
>
Does this help:
package require tdom
# proc to get a list of attributes - values from a node
proc get_attr_list {node} {
    set attr_list {}
    foreach attr [$node attributes] {lappend attr_list $attr [$node getAttribute $attr]}
    return $attr_list
}
set xml {<?xml version="1.0" encoding="utf-8" ?><trivia><entry
id="1101"
triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?"
answerID="1" correctAnswerID="1" answer="Believer"
expDate="1139634000"></entry><entry id="1102" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="2"
correctAnswerID="1" answer="Saviour Machine"
expDate="1139634000"></entry><entry id="1103" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="3"
correctAnswerID="1" answer="Seventh Avenue"
expDate="1139634000"></entry><entry id="1104" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="4"
correctAnswerID="1" answer="Inevitable End"
expDate="1139634000"></entry><entry id="1105" triviaID="233"
question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="5"
correctAnswerID="1" answer="No such song existed"
expDate="1139634000"></entry></trivia> }
set doc [dom parse $xml doc]
foreach node [$doc getElementsByTagName entry] {
puts [get_attr_list $node]
}
And the output:
id 1101 triviaID 233 question {Who wrote "Trilogy of Knowledge"?}
answerID 1 correctAnswerID 1 answer Believer expDate 1139634000
id 1102 triviaID 233 question {Who wrote "Trilogy of Knowledge"?}
answerID 2 correctAnswerID 1 answer {Saviour Machine} expDate
1139634000
id 1103 triviaID 233 question {Who wrote "Trilogy of Knowledge"?}
answerID 3 correctAnswerID 1 answer {Seventh Avenue} expDate 1139634000
id 1104 triviaID 233 question {Who wrote "Trilogy of Knowledge"?}
answerID 4 correctAnswerID 1 answer {Inevitable End} expDate 1139634000
id 1105 triviaID 233 question {Who wrote "Trilogy of Knowledge"?}
answerID 5 correctAnswerID 1 answer {No such song existed} expDate
1139634000
Mark
| |
| comp.lang.tcl 2006-11-25, 7:02 pm |
|
Mark Janssen wrote:
> On Nov 25, 8:29 pm, "comp.lang.tcl" <phillip.s.pow...@gmail.com> wrote:
>
>
> Does this help:
>
> package require tdom
>
Produced the following error:
can't find package tdom while executing "package require tdom " (file
"xml_procs.tcl" line 1)
I don't know how to find that package or what to do with one, sorry, help!
Phil
> # proc to get a list of attributes - values from a node
> proc get_attr_list {node} {
>     set attr_list {}
>     foreach attr [$node attributes] {lappend attr_list $attr [$node getAttribute $attr]}
>     return $attr_list
> }
>
> set xml {<?xml version="1.0" encoding="utf-8" ?><trivia><entry
> id="1101"
> triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?"
> answerID="1" correctAnswerID="1" answer="Believer"
> expDate="1139634000"></entry><entry id="1102" triviaID="233"
> question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="2"
> correctAnswerID="1" answer="Saviour Machine"
> expDate="1139634000"></entry><entry id="1103" triviaID="233"
> question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="3"
> correctAnswerID="1" answer="Seventh Avenue"
> expDate="1139634000"></entry><entry id="1104" triviaID="233"
> question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="4"
> correctAnswerID="1" answer="Inevitable End"
> expDate="1139634000"></entry><entry id="1105" triviaID="233"
> question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="5"
> correctAnswerID="1" answer="No such song existed"
> expDate="1139634000"></entry></trivia> }
>
> set doc [dom parse $xml doc]
>
> foreach node [$doc getElementsByTagName entry] {
> puts [get_attr_list $node]
> }
>
> And the output:
>
> id 1101 triviaID 233 question {Who wrote "Trilogy of Knowledge"?}
> answerID 1 correctAnswerID 1 answer Believer expDate 1139634000
> id 1102 triviaID 233 question {Who wrote "Trilogy of Knowledge"?}
> answerID 2 correctAnswerID 1 answer {Saviour Machine} expDate
> 1139634000
> id 1103 triviaID 233 question {Who wrote "Trilogy of Knowledge"?}
> answerID 3 correctAnswerID 1 answer {Seventh Avenue} expDate 1139634000
> id 1104 triviaID 233 question {Who wrote "Trilogy of Knowledge"?}
> answerID 4 correctAnswerID 1 answer {Inevitable End} expDate 1139634000
> id 1105 triviaID 233 question {Who wrote "Trilogy of Knowledge"?}
> answerID 5 correctAnswerID 1 answer {No such song existed} expDate
> 1139634000
>
>
> Mark
| |
| Gerald W. Lester 2006-11-25, 7:02 pm |
| The following converts your XML to a list, I also posted it on another
thread. It took 5 minutes to write:
##
## A node will be the following list of name value pairs:
## NAME nodeName
## TEXT text
## ATTRIBUTES attributeNameValueList
## CHILDREN childNodeList
##
package require tdom
set xml {<?xml version="1.0" encoding="utf-8" ?>
<trivia>
<entry id="1101" triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="1" correctAnswerID="1" answer="Believer" expDate="1139634000"></entry>
<entry id="1102" triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="2" correctAnswerID="1" answer="Saviour Machine" expDate="1139634000"> </entry>
<entry id="1103" triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="3" correctAnswerID="1" answer="Seventh Avenue" expDate="1139634000"></entry>
<entry id="1104" triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="4" correctAnswerID="1" answer="Inevitable End" expDate="1139634000"></entry>
<entry id="1105" triviaID="233" question="Who wrote &quot;Trilogy of Knowledge&quot;?" answerID="5" correctAnswerID="1" answer="No such song existed" expDate="1139634000"></entry>
</trivia>
}
##
## Convert a node to a list
##
proc NodeToList {node} {
    ##
    ## Get the name and text value of the node
    ##
    set name [$node nodeName]
    set text [$node text]
    ##
    ## Get the attributes of the node
    ##
    set attrList {}
    foreach attribute [$node attributes] {
        lappend attrList $attribute [$node getAttribute $attribute]
    }
    ##
    ## Get the children of the node
    ##
    set childrenList {}
    foreach child [$node childNodes] {
        if {![string equal [$child nodeType] TEXT_NODE]} then {
            lappend childrenList [NodeToList $child]
        }
    }
    ##
    ## All done so return the list representing this subtree
    ##
    return [list NAME $name TEXT $text ATTRIBUTES $attrList CHILDREN $childrenList]
}
##
## Convert the XML to a DOM tree
##
dom parse $xml doc
##
## Now get the root element
##
$doc documentElement root
##
## Convert the tree to a list
##
set results [NodeToList $root]
$doc delete
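Because every level of that structure is a flat name/value list, it can be walked with dict commands once it has been built, without touching tdom again. A sketch assuming Tcl 8.5 or later (for dict); the literal below is a hand-written, abbreviated example of the NAME/TEXT/ATTRIBUTES/CHILDREN shape, not actual output, and `entryAttributes` is a hypothetical name:

[TCL]
package require Tcl 8.5-

# Abbreviated, hand-written example of the structure NodeToList builds
set results {NAME trivia TEXT {} ATTRIBUTES {} CHILDREN {
    {NAME entry TEXT {} ATTRIBUTES {id 1101 answer Believer} CHILDREN {}}
    {NAME entry TEXT {} ATTRIBUTES {id 1102 answer {Saviour Machine}} CHILDREN {}}
}}

# Hypothetical helper: return the attribute lists of all direct
# children of $tree whose element name is $tag
proc entryAttributes {tree tag} {
    set out {}
    foreach child [dict get $tree CHILDREN] {
        if {[dict get $child NAME] eq $tag} {
            lappend out [dict get $child ATTRIBUTES]
        }
    }
    return $out
}

foreach attrs [entryAttributes $results entry] {
    puts [dict get $attrs answer]
}
[/TCL]

This prints "Believer" and then "Saviour Machine". On Tcl 8.4, array set on each ATTRIBUTES list would do the same job.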
--
+--------------------------------+---------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+
| |
| Gerald W. Lester 2006-11-25, 7:02 pm |
| comp.lang.tcl wrote:
> Mark Janssen wrote:
>
> Produced the following error:
> can't find package tdom while executing "package require tdom " (file
> "xml_procs.tcl" line 1)
>
> I don't know how to find that package or what to do with one, sorry, help!
> ...
What OS are you on?
Who installed Tcl for you?
--
+--------------------------------+---------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+
| |
| comp.lang.tcl 2006-11-25, 7:02 pm |
|
Gerald W. Lester wrote:
> The following converts your XML to a list, I also posted it on another
> thread. It took 5 minutes to write:
>
> ##
> ## A node will be the following list of name value pairs:
> ## NAME nodeName
> ## TEXT text
> ## ATTRIBUTES attributeNameValueList
> ## CHILDREN childNodeList
> ##
> package require tdom
>
You already lost me. "package require tdom" = HUH?
Phil
| |
| Mark Janssen 2006-11-25, 7:02 pm |
|
On Nov 25, 9:26 pm, "comp.lang.tcl" <phillip.s.pow...@gmail.com> wrote:
> Mark Janssen wrote:
>
>
>
>
>
> can't find package tdom while executing "package require tdom " (file
> "xml_procs.tcl" line 1)
>
> I don't know how to find that package or what to do with one, sorry, help!
>
> Phil
>
This means your Tcl installation doesn't include tdom. You can get tdom
in several ways.
The easiest solution would be to install ActiveTcl
[http://www.activestate.com/Products/ActiveTcl/?tn=1] which includes
tdom TclXML and a lot of other stuff you might need. The other solution
would be to install tdom from www.tdom.org
http://www.tdom.org/#SECTid80ac508
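Once tdom is installed, a script can check for it before relying on it, instead of stopping at the first package require. A minimal sketch; `haveTdom` is a hypothetical helper, and it only reports whether package require succeeds in the current interpreter:

[TCL]
# Hypothetical helper: returns 1 if tdom can be loaded, 0 otherwise
proc haveTdom {} {
    # package require raises an error when the package cannot be found
    return [expr {![catch {package require tdom}]}]
}

if {[haveTdom]} {
    puts "tdom [package present tdom] is available"
} else {
    # auto_path lists the directories Tcl searches for packages
    puts "tdom not found; searched: $auto_path"
}
[/TCL]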
Mark
| |
| Darren New 2006-11-25, 7:02 pm |
| comp.lang.tcl wrote:
> There is no zip file sorry I see nothing
Then your ISP is blocking attachments or something.
--
Darren New / San Diego, CA, USA (PST)
Scruffitarianism - Where T-shirt, jeans,
and a three-day beard are "Sunday Best."
| |
| Michael A. Cleverly 2006-11-25, 10:05 pm |
| On Sat, 25 Nov 2006, comp.lang.tcl wrote:
> Gerald W. Lester wrote:
>
> You already lost me. "package require tdom" = HUH?
tdom is an XML parsing extension for Tcl. Its home is at
http://www.tdom.org and there are lots of pages on the Tcl'ers wiki that
use it when dealing with XML data.
Compared to TclXML it is much simpler to build and install (in my
experience). It is definitely a worthy investment time-wise to learn how
to use it.
However, I understand from other messages in this thread that you have ADD
and just want to get the job done as quickly as possible. I gather that
suggestions that involve using code outside of vanilla Tcl do not qualify
as sufficiently quick due to the time it would take to download and
install and understand these packages.
Here is some plain-vanilla Tcl code that should meet your parsing needs.
I've tried to comment it heavily, including the regular expressions it
uses (expanded regular expression syntax is our friend in this regard).
That said, I echo the advice that so many others in this thread have given
you: for dealing with XML data it behooves you to use a real XML parser.
But if this works to solve your immediate problems then great! Once you
have this off your plate perhaps you can come back and we can help you
understand XML parsing itself with more leisure...
#!/bin/sh
#\
exec tclsh "$0" ${1+"$@"}
# Tcl 8.0 and earlier did not support expanded regexp syntax which we
# use in the code below
package require Tcl 8.1
proc get-all-xml-attributes {xml} {
set all_attributes [list]
# Match one tag. This will ignore tags that are commented out or
# literal text that looks like a tag within a <![CDATA[ ... ]]> section
set RE(tag) {<([^<>]*)>}
set RE(name-attribs) {(?x)^ # This is an expanded regexp w/comments
(\S+) # non-whitespace chars (i.e., the tag name)
\s* # maybe followed by some white-space
(\S.*)? # everything else (i.e., the attributes)
$}
set RE(next-attrib) {(?x)^ # This is another expanded regexp w/comments
(\S+) # attribute name
\s*=\s* # equals
(["'"].+) # everything else (attr val + other attr(s))
$}
# One version for each of the two possible quoting conventions--single
# quotes (which could contain double quotes), or double quotes (which
# could contain single quotes). In both cases the second set of
# capturing parenthesis will get the rest of the remaining attribute
# data (if any remains)
set RE(single-quote) {^'([^'']*)'(.*)$}
set RE(double-quote) {^"([^""]*)"(.*)$}
# Iterate over each tag in the XML provided
foreach {whole_tag contents} [regexp -inline -all -- $RE(tag) $xml] {
# Start with an empty list of attributes for this tag
set attributes_this_tag [list]
# Trim off any extraneous whitespace to make life easier
set contents [string trim $contents]
# Ignore a completely empty tag (which would be invalid xml
# to begin with)
if {[string length $contents] == 0} then continue
# Ignore closing tags; they aren't supposed to have attributes
if {[string index $contents 0] == "/"} then continue
# Ignore processing instructions; they don't have attributes
if {[string index $contents 0] == "?"} then continue
# Ignore comments and CDATA tags; they don't have attributes
if {[string index $contents 0] == "!"} then continue
# Separate out the tag name and the data of the attribute(s)
regexp -- $RE(name-attribs) $contents => tag_name data
# If the string length of data is zero then there were no attributes
if {[string length $data] == 0} then continue
# Now we will get the name of an attribute (key), see what
# type of quoting is used (single or double), then get the value
# of the attribute (val), and save the rest of the data (additional
# key/value pair(s)) for further processing the next time we go
# through the while loop.
#
# Processing ends when we run out of key/value pairs (data is
# exhausted and our regexp fails to match any more) or when
# we encounter an attribute that is improperly quoted (i.e.,
# no closing single or double quote) which is definitely invalid xml.
while {[regexp -- $RE(next-attrib) $data => key data]} {
# Which type of quoting was used, single or double?
if {[string match '* $data]} then {
set quote_type single-quote
} else {
set quote_type double-quote
}
# There should be a corresponding close $quote_type; between
# the opening & closing quote will be the value of this attrib
# if there is no closing quote of the appropriate type then
# this is invalid xml and we ignore any further processing
# of attributes for this tag
if {![regexp -- $RE($quote_type) $data => val data]} then break
# We now know a key/val attribute pair; add it to the list
# we are accumulating for this tag
lappend attributes_this_tag $key $val
# Trim off leading whitespace that separated this key/val
# attribute pair from any that follow it
set data [string trimleft $data]
}
# Did we find any attributes for this tag?
if {[llength $attributes_this_tag]} then {
# If so, append to the overall list-of-lists we're accumulating
# for the entire XML document
lappend all_attributes $attributes_this_tag
}
}
# Return the list-of-lists of key/val attribute pairs that were found
return $all_attributes
}
# The sample XML included in an earlier post in this thread
set xml { <?xml version="1.0" encoding="utf-8" ?><trivia><entry
id="1101"
triviaID="233" question="Who wrote "Trilogy of Knowledge"?"
answerID="1" correctAnswerID="1" answer="Believer"
expDate="1139634000"></entry><entry id="1102" triviaID="233"
question="Who wrote "Trilogy of Knowledge"?" answerID="2"
correctAnswerID="1" answer="Saviour Machine"
expDate="1139634000"></entry><entry id="1103" triviaID="233"
question="Who wrote "Trilogy of Knowledge"?" answerID="3"
correctAnswerID="1" answer="Seventh Avenue"
expDate="1139634000"></entry><entry id="1104" triviaID="233"
question="Who wrote "Trilogy of Knowledge"?" answerID="4"
correctAnswerID="1" answer="Inevitable End"
expDate="1139634000"></entry><entry id="1105" triviaID="233"
question="Who wrote "Trilogy of Knowledge"?" answerID="5"
correctAnswerID="1" answer="No such song existed"
expDate="1139634000"></entry></trivia> }
# Find the attributes and print them out
foreach set_of_attributes [get-all-xml-attributes $xml] {
puts $set_of_attributes
foreach {key val} $set_of_attributes {
puts " $key = $val"
}
}
Michael
| |
| comp.lang.tcl 2006-11-25, 10:05 pm |
|
Michael A. Cleverly wrote:
> On Sat, 25 Nov 2006, comp.lang.tcl wrote:
>
>
> tdom is an XML parsing extension for Tcl. Its home is at
> http://www.tdom.org and there are lots of pages on the Tcl'ers wiki that
> use it when dealing with XML data.
>
> Compared to TclXML it is much simpler to build and install (in my
> experience). It is definitely a worthy investment time-wise to learn how
> to use it.
>
> However, I understand from other messages in this thread that you have ADD
> and just want to get the job done as quickly as possible. I gather that
> suggestions that involve using code outside of vanilla Tcl do not qualify
> as sufficiently quick due to the time it would take to download and
> install and understand these packages.
>
> Here is some plain-vanilla Tcl code that should meet your parsing needs.
> I've tried to comment it heavily, including the regular expressions it
> uses (expanded regular expression syntax is our friend in this regard).
> That said, I echo the advice that so many others in this thread have given
> you: for dealing with XML data it behooves you to use a real XML parser.
> But if this works to solve your immediate problems then great! Once you
> have this off your plate perhaps you can come back and we can help you
> understand XML parsing itself with more leisure...
>
> #!/bin/sh
> #\
> exec tclsh "$0" ${1+"$@"}
I got an error message right at this line:
can't read "0": no such variable while executing "exec tclsh "$0"
${1+"$@"}" (file "xml_procs.tcl" line 1)
Still studying the rest of the proc, it uses TCL code I've never seen
before!
Phil
> # Tcl 8.0 and earlier did not support expanded regexp syntax which we
> # use in the code below
> package require Tcl 8.1
>
> proc get-all-xml-attributes {xml} {
> set all_attributes [list]
>
> # Match one tag. This will ignore tags that are commented out or
> # literal text that looks like a tag within a <![CDATA[ ... ]]> section
> set RE(tag) {<([^<>]*)>}
>
> set RE(name-attribs) {(?x)^ # This is an expanded regexp w/comments
> (\S+) # non-whitespace chars (i.e., the tag name)
> \s* # maybe followed by some white-space
> (\S.*)? # everything else (i.e., the attributes)
> $}
>
> set RE(next-attrib) {(?x)^ # This is another expanded regexp w/comments
> (\S+) # attribute name
> \s*=\s* # equals
> (["'"].+) # everything else (attr val + other attr(s))
> $}
>
> # One version for each of the two possible quoting conventions--single
> # quotes (which could contain double quotes), or double quotes (which
> # could contain single quotes). In both cases the second set of
> # capturing parenthesis will get the rest of the remaining attribute
> # data (if any remains)
> set RE(single-quote) {^'([^'']*)'(.*)$}
> set RE(double-quote) {^"([^""]*)"(.*)$}
>
>
> # Iterate over each tag in the XML provided
> foreach {whole_tag contents} [regexp -inline -all -- $RE(tag) $xml] {
> # Start with an empty list of attributes for this tag
> set attributes_this_tag [list]
>
> # Trim off any extraneous whitespace to make life easier
> set contents [string trim $contents]
>
> # Ignore a completely empty tag (which would be invalid xml
> # to begin with)
> if {[string length $contents] == 0} then continue
>
> # Ignore closing tags; they aren't supposed to have attributes
> if {[string index $contents 0] == "/"} then continue
>
> # Ignore processing instructions; they don't have attributes
> if {[string index $contents 0] == "?"} then continue
>
> # Ignore comments and CDATA tags; they don't have attributes
> if {[string index $contents 0] == "!"} then continue
>
> # Separate out the tag name and the data of the attribute(s)
> regexp -- $RE(name-attribs) $contents => tag_name data
>
> # If the string length of data is zero then there were no attributes
> if {[string length $data] == 0} then continue
>
> # Now we will get the name of an attribute (key), see what
> # type of quoting is used (single or double), then get the value
> # of the attribute (val), and save the rest of the data (additional
> # key/value pair(s)) for further processing the next time we go
> # through the while loop.
> #
> # Processing ends when we run out of key/value pairs (data is
> # exhausted and our regexp fails to match any more) or when
> # we encounter an attribute that is improperly quoted (i.e.,
> # no closing single or double quote) which is definitely invalid xml.
> while {[regexp -- $RE(next-attrib) $data => key data]} {
> # Which type of quoting was used, single or double?
> if {[string match '* $data]} then {
> set quote_type single-quote
> } else {
> set quote_type double-quote
> }
>
> # There should be a corresponding close $quote_type; between
> # the opening & closing quote will be the value of this attrib
> # if there is no closing quote of the appropriate type then
> # this is invalid xml and we ignore any further processing
> # of attributes for this tag
> if {![regexp -- $RE($quote_type) $data => val data]} then break
>
> # We now know a key/val attribute pair; add it to the list
> # we are accumulating for this tag
> lappend attributes_this_tag $key $val
>
> # Trim off leading whitespace that separated this key/val
> # attribute pair from any that follow it
> set data [string trimleft $data]
> }
>
> # Did we find any attributes for this tag?
> if {[llength $attributes_this_tag]} then {
> # If so, append to the overall list-of-lists we're accumulating
> # for the entire XML document
> lappend all_attributes $attributes_this_tag
> }
> }
>
> # Return the list-of-lists of key/val attribute pairs that were found
> return $all_attributes
> }
>
>
> # The sample XML included in an earlier post in this thread
> set xml { <?xml version="1.0" encoding="utf-8" ?><trivia><entry
> id="1101"
> triviaID="233" question="Who wrote "Trilogy of Knowledge"?"
> answerID="1" correctAnswerID="1" answer="Believer"
> expDate="1139634000"></entry><entry id="1102" triviaID="233"
> question="Who wrote "Trilogy of Knowledge"?" answerID="2"
> correctAnswerID="1" answer="Saviour Machine"
> expDate="1139634000"></entry><entry id="1103" triviaID="233"
> question="Who wrote "Trilogy of Knowledge"?" answerID="3"
> correctAnswerID="1" answer="Seventh Avenue"
> expDate="1139634000"></entry><entry id="1104" triviaID="233"
> question="Who wrote "Trilogy of Knowledge"?" answerID="4"
> correctAnswerID="1" answer="Inevitable End"
> expDate="1139634000"></entry><entry id="1105" triviaID="233"
> question="Who wrote "Trilogy of Knowledge"?" answerID="5"
> correctAnswerID="1" answer="No such song existed"
> expDate="1139634000"></entry></trivia> }
>
>
> # Find the attributes and print them out
> foreach set_of_attributes [get-all-xml-attributes $xml] {
> puts $set_of_attributes
> foreach {key val} $set_of_attributes {
> puts " $key = $val"
> }
> }
>
> Michael
| |
| Michael A. Cleverly 2006-11-25, 10:05 pm |
| On Sat, 25 Nov 2006, comp.lang.tcl wrote:
> Michael A. Cleverly wrote:
>
> I got an error message right at this line:
>
> can't read "0": no such variable while executing "exec tclsh "$0"
> ${1+"$@"}" (file "xml_procs.tcl" line 1)
For Tcl to see that third line means that there was somehow (either
through copy & pasting or my newsreader, or your newsreader, or saving
from Windows and uploading to Unix--due to different line endings)
whitespace added after the trailing backslash. Remove it (or those three
lines entirely) and the script should run.
The history of this idiom for avoiding the need to hardcode the path to
tclsh into a script on Unix can be found part way down on:
http://wiki.tcl.tk/812
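In Tcl, a backslash immediately before the newline continues the comment onto the exec line, so Tcl skips it while sh runs it. If whitespace sits between the backslash and the newline, the comment ends there, Tcl tries to run exec tclsh "$0" ... itself, and fails on $0. A minimal sketch of a cleanup helper (`normalizeScript` is a hypothetical name) that converts CRLF line endings and strips trailing blanks from every line:

[TCL]
# Hypothetical helper: normalize line endings and strip trailing blanks
proc normalizeScript {text} {
    # Convert Windows CRLF line endings to plain LF
    set text [string map [list "\r\n" "\n"] $text]
    # With -line, $ matches at the end of every line, not just the string
    regsub -all -line {[ \t]+$} $text {} text
    return $text
}

# A header damaged by a space after the backslash on line 2
set broken "#!/bin/sh\n#\\ \nexec tclsh \"\$0\" \${1+\"\$@\"}\n"
set fixed [normalizeScript $broken]

# 1 if the backslash is now directly followed by the newline
puts [expr {[string first "#\\\nexec" $fixed] >= 0}]
[/TCL]

To repair a saved script, read the file, pass its contents through the helper, and write the result back.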
> Still studying the rest of the proc, it uses TCL code I've never seen
> before!
Once you remove those lines (which merely serve to launch tclsh and run
the script) I'm quite curious to know if the code meets your needs.
> Phil
Michael
| |
| comp.lang.tcl 2006-11-25, 10:05 pm |
|
Michael A. Cleverly wrote:
> On Sat, 25 Nov 2006, comp.lang.tcl wrote:
>
>
> For Tcl to see that third line means that there was somehow (either
> through copy & pasting or my newsreader, or your newsreader, or saving
> from Windows and uploading to Unix--due to different line endings)
> whitespace added after the trailing backslash. Remove it (or those three
> lines entirely) and the script should run.
>
> The history of this idiom for avoiding the need to hardcode the path to
> tclsh into a script on Unix can be found part way down on:
> http://wiki.tcl.tk/812
>
>
> Once you remove those lines (which merely serve to launch tclsh and run
> the script) I'm quite curious to know if the code meets your needs.
>
Me too, but I can't figure out if it works because of this:
[TCL]
proc XML_GET_ALL_ELEMENT_ATTRS {fileName {parseString {}} {switch {}}} {
    # NEW 11/25/2006: WILL BE PHASING OUT $parseString IN FAVOR OF EXTERNAL XML PARSING METHODOLOGY WHICH DOES NOT NEED $parseString
    set cannotOpenFile [catch {
        set fileID [open ${fileName}.xml r]
        fconfigure $fileID -buffering full -buffersize 32768
    } cannotOpenFileErrMsg]
    if {$cannotOpenFile} { puts $cannotOpenFileErrMsg; return {} }; # CANNOT OPEN THE FILE THUS RETURN AN EMPTY LIST
    set contents [read $fileID [file size ${fileName}.xml]]; close $fileID
    set contents [string trim $contents]; # LOB OFF ANY EXTRANEOUS WHITESPACE
    regsub -all {[\n\r\s\t]+$} $contents {} contents; # LOB OFF ANY EXTRANEOUS SPACES, LINE FEEDS OR CARRIAGE RETURNS
    # USE get-all-xml-attributes PROC WITHIN THIS LIBRARY AS YOUR DEFAULT MEANS OF PARSING XML INTO TCL LIST
    if {![string equal $switch -body] && [string length [info procs {get-all-xml-attributes}]] > 0} {
        set cannotUseProc [catch {return [get-all-xml-attributes $contents]}}]
    }
    # USE xml2list PROC WITHIN THIS LIBRARY AS YOUR DEFAULT MEANS OF PARSING XML INTO TCL LIST
    if {![string equal $switch -body] && [info exists cannotUseProc] && [string length [info procs {xml2list}]] > 0} {
        regexp {<\?.*?\?>(.*$)} $contents "\\1" contents
        set cannotUseProc [catch {return [xml2list $contents]}]
    }
}
[/TCL]
Producing this:
can't read "switch": no such variable while executing "string equal
$switch -body" (file "xml_procs.tcl" line 1)
>
> Michael
| |
| Michael A. Cleverly 2006-11-25, 10:05 pm |
| On Sat, 25 Nov 2006, comp.lang.tcl wrote:
>
> Me too, but I can't figure out if it works because of this:
>
> [TCL]
[ ~2 dozen lines of other code elided ]
> Producing this:
>
> can't read "switch": no such variable while executing "string equal
> $switch -body" (file "xml_procs.tcl" line 1)
It looks like there is a problem in your code before you ever get to the
point where you'd call the proc I wrote for you. (My code doesn't contain
[string equal] or a variable named $switch.)
I do intend to study the code you posted (that I elided above) to see if I
can find the error.
However, it would be helpful to begin at the beginning and FIRST confirm
that you can run the script I provided by itself BEFORE we work on
integrating it with your code base.
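Separately from the `$switch` error, one pattern in the posted proc is worth a second look: `set cannotUseProc [catch {return [...]}]`. The `catch` intercepts the `return` (catch reports code 2, TCL_RETURN), so the proc does not actually return at that point. It carries on with cannotUseProc set to 2. The minimal sketch below shows the difference. `parse_stub`, `broken` and `fixed` are hypothetical names, with `parse_stub` standing in for the real parser proc.

```tcl
# Hypothetical stand-in for get-all-xml-attributes / xml2list
proc parse_stub {text} {
    return [list id 1 text $text]
}

# The pattern from the posted proc: catch swallows the return,
# so execution falls through to the next command
proc broken {contents} {
    set cannotUseProc [catch {return [parse_stub $contents]}]
    return "fell through (catch returned $cannotUseProc)"
}

# Capture the result in a variable instead, and return it only on success
proc fixed {contents} {
    if {[catch {parse_stub $contents} result] == 0} {
        return $result
    }
    # Parsing failed: report it and fall back to an empty list
    puts stderr "parse failed: $result"
    return {}
}

puts [broken abc]   ;# fell through (catch returned 2)
puts [fixed abc]    ;# id 1 text abc
```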
When I run the script I included in my first reply, I get the following
output--please compare with what you get on your system:
id 1101 triviaID 233 question {Who wrote "Trilogy of Knowledge"?} answerID 1 correctAnswerID 1 answer Believer expDate 1139634000
id = 1101
triviaID = 233
question = Who wrote "Trilogy of Knowledge"?
answerID = 1
correctAnswerID = 1
answer = Believer
expDate = 1139634000
id 1102 triviaID 233 question {Who wrote "Trilogy of Knowledge"?} answerID 2 correctAnswerID 1 answer {Saviour Machine} expDate 1139634000
id = 1102
triviaID = 233
question = Who wrote "Trilogy of Knowledge"?
answerID = 2
correctAnswerID = 1
answer = Saviour Machine
expDate = 1139634000
id 1103 triviaID 233 question {Who wrote "Trilogy of Knowledge"?} answerID 3 correctAnswerID 1 answer {Seventh Avenue} expDate 1139634000
id = 1103
triviaID = 233
question = Who wrote "Trilogy of Knowledge"?
answerID = 3
correctAnswerID = 1
answer = Seventh Avenue
expDate = 1139634000
id 1104 triviaID 233 question {Who wrote "Trilogy of Knowledge"?} answerID 4 correctAnswerID 1 answer {Inevitable End} expDate 1139634000
id = 1104
triviaID = 233
question = Who wrote "Trilogy of Knowledge"?
answerID = 4
correctAnswerID = 1
answer = Inevitable End
expDate = 1139634000
id 1105 triviaID 233 question {Who wrote "Trilogy of Knowledge"?} answerID 5 correctAnswerID 1 answer {No such song existed} expDate 1139634000
id = 1105
triviaID = 233
question = Who wrote "Trilogy of Knowledge"?
answerID = 5
correctAnswerID = 1
answer = No such song existed
expDate = 1139634000
Michael
| |
| comp.lang.tcl 2006-11-25, 10:05 pm |
|
Michael A. Cleverly wrote:
> On Sat, 25 Nov 2006, comp.lang.tcl wrote:
>
>
> [ ~2 dozen lines of other code elided ]
>
>
> It looks like there is a problem in your code before you ever get to the
> point where you'd call the proc I wrote for you. (My code doesn't contain
> [string equal] or a variable named $switch.)
>
> I do intend to study the code you posted (that I elided above) to see if I
> can find the error.
>
> However, it would be helpful to begin at the beginning and FIRST confirm
> that you can run the script I provided by itself BEFORE we work on
> integrating it with your code base.
>
> When I run the script I included in my first reply, I get the following
> output--please compare with what you get on your system:
>
> [ sample output elided ]
>
> Michael
Thanx; however, there is a problem:
puts [llength [get-all-xml-attributes $contents]]; # I GET 0 EVEN THOUGH I SEE A LIST IN FRONT OF ME
Phil
| |
| comp.lang.tcl 2006-11-25, 10:05 pm |
|
Michael A. Cleverly wrote:
> On Sat, 25 Nov 2006, comp.lang.tcl wrote:
>
>
> [ ~2 dozen lines of other code elided ]
>
>
> It looks like there is a problem in your code before you ever get to the
> point where you'd call the proc I wrote for you. (My code doesn't contain
> [string equal] or a variable named $switch.)
>
> I do intend to study the code you posted (that I elided above) to see if I
> can find the error.
>
> However, it would be helpful to begin at the beginning and FIRST confirm
> that you can run the script I provided by itself BEFORE we work on
> integrating it with your code base.
>
> When I run the script I included in my first reply, I get the following
> output--please compare with what you get on your system:
>
> [ sample output elided ]
>
> Michael
OK, an update:
[TCL]
# USE GET_ALL_XML_ATTRIBUTES PROC WITHIN THIS LIBRARY AS YOUR DEFAULT
# MEANS OF PARSING XML INTO TCL LIST
if {[string length [info procs {GET_ALL_XML_ATTRIBUTES}]] > 0 &&
        ![string equal $switch {-body}]} {
    set cannotUseProc [catch {return [GET_ALL_XML_ATTRIBUTES $contents]} contentsList]
    if {$cannotUseProc && [info exists contentsList] &&
            [IS_LIST $contentsList]} {
        for {set i 0} {$i < [llength $contentsList]} {incr i} {
            append newContentsList [lindex $contentsList $i]
        }
        if {[info exists newContentsList]} {
            return [split $newContentsList]
        } else {
            return $contentsList
        }
    }
}
[/TCL]
Apparently what happens is that GET_ALL_XML_ATTRIBUTES (your proc, which I
rewrote to fit the coding standards here) returns a list like this:
{id 1 author {Phil Powell}} {id 2 author {Joe Blow}}
When in fact it should be
id 1 author {Phil Powell} id 2 author {Joe Blow}
I tried to manhandle it into the correct format, to no avail; it
produces a badly mangled list that none of my other procs can use :(
But I'm closer than I was before! Thanx
Phil
| |
| Michael A. Cleverly 2006-11-26, 4:21 am |
| On Sat, 25 Nov 2006, comp.lang.tcl wrote:
> Apparently what happens is that GET_ALL_XML_ATTRIBUTES (your proc I
> rewrote it to fit coding standards here) returns a list like this:
>
> {id 1 author {Phil Powell}} {id 2 author {Joe Blow}}
>
> When in fact it should be
>
> id 1 author {Phil Powell} id 2 author {Joe Blow}
>
> And I tried to manhandle it to the correct format, to no avail, it
> produces a very maligned list that none of my other procs can use :(
>
> But I'm closer than I was before! Thanx
The proc I wrote returns a list of lists. I now see that the version you
want should return one long list of key value pairs instead of a list
whose elements are each lists of key value pairs.
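If the list-of-lists version were kept, the flattening could also be done on the caller's side. Below is a minimal sketch of a one-level flatten (the name flatten_one_level is hypothetical and not part of either poster's code). Each inner element is added with lappend, so a value like {Phil Powell} stays a single element. The update loop above goes wrong because append glues the inner lists together with no separator (giving ...Powell}id 2...), and split then breaks on every space, including the spaces inside braced values. On Tcl 8.5 or later, `concat {*}$nested` gives the same result for this input.

```tcl
# Flatten a list of lists by exactly one level, keeping each element intact
proc flatten_one_level {listOfLists} {
    set flat [list]
    foreach sublist $listOfLists {
        foreach item $sublist {
            lappend flat $item
        }
    }
    return $flat
}

set nested {{id 1 author {Phil Powell}} {id 2 author {Joe Blow}}}
puts [flatten_one_level $nested]   ;# id 1 author {Phil Powell} id 2 author {Joe Blow}
```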
Try this instead. (I've renamed the proc to GET_ALL_XML_ATTRIBUTES; does
your local coding standard require certain types of procs to be named in
ALL_CAPS?)
proc GET_ALL_XML_ATTRIBUTES {xml} {
    set results [list]

    # Match one tag: a '<', then any characters other than '<' or '>',
    # then a '>'. Comments and CDATA sections are skipped further down,
    # but note that a tag nested inside a comment or inside a
    # <![CDATA[ ... ]]> section is still matched by this pattern.
    set RE(tag) {<([^<>]*)>}

    set RE(name-attribs) {(?x)^   # This is an expanded regexp w/comments
        (\S+)      # non-whitespace chars (i.e., the tag name)
        \s*        # maybe followed by some white-space
        (\S.*)?    # everything else (i.e., the attributes)
    $}

    set RE(next-attrib) {(?x)^    # This is another expanded regexp w/comments
        (\S+)      # attribute name
        \s*=\s*    # equals
        (["'"].+)  # everything else (attr val + other attr(s))
    $}

    # One version for each of the two possible quoting conventions--single
    # quotes (which could contain double quotes), or double quotes (which
    # could contain single quotes). In both cases the second set of
    # capturing parentheses will get the rest of the remaining attribute
    # data (if any remains)
    set RE(single-quote) {^'([^'']*)'(.*)$}
    set RE(double-quote) {^"([^""]*)"(.*)$}

    # Iterate over each tag in the XML provided
    foreach {whole_tag contents} [regexp -inline -all -- $RE(tag) $xml] {
        # Trim off any extraneous whitespace to make life easier
        set contents [string trim $contents]

        # Ignore a completely empty tag (which would be invalid xml
        # to begin with)
        if {[string length $contents] == 0} then continue

        # Ignore closing tags; they aren't supposed to have attributes
        if {[string index $contents 0] == "/"} then continue

        # Ignore processing instructions; they don't have attributes
        if {[string index $contents 0] == "?"} then continue

        # Ignore comments and CDATA tags; they don't have attributes
        if {[string index $contents 0] == "!"} then continue

        # Separate out the tag name and the data of the attribute(s)
        regexp -- $RE(name-attribs) $contents => tag_name data

        # If the string length of data is zero then there were no attributes
        if {[string length $data] == 0} then continue

        # Now we will get the name of an attribute (key), see what
        # type of quoting is used (single or double), then get the value
        # of the attribute (val), and save the rest of the data (additional
        # key/value pair(s)) for further processing the next time we go
        # through the while loop.
        #
        # Processing ends when we run out of key/value pairs (data is
        # exhausted and our regexp fails to match any more) or when
        # we encounter an attribute that is improperly quoted (i.e.,
        # no closing single or double quote) which is definitely invalid xml.
        while {[regexp -- $RE(next-attrib) $data => key data]} {
            # Which type of quoting was used, single or double?
            if {[string match '* $data]} then {
                set quote_type single-quote
            } else {
                set quote_type double-quote
            }

            # There should be a corresponding close $quote_type; between
            # the opening & closing quote will be the value of this attrib.
            # If there is no closing quote of the appropriate type then
            # this is invalid xml and we ignore any further processing
            # of attributes for this tag
            if {![regexp -- $RE($quote_type) $data => val data]} then break

            # We now know a key/val attribute pair; add it to our results
            lappend results $key $val

            # Trim off the whitespace that separated this key/val
            # attribute pair from any that follow it
            set data [string trimleft $data]
        }
    }

    # Return the list of key/val attribute pairs that were found
    return $results
}
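Because the revised proc returns one flat list of alternating keys and values, the usual way to walk it is foreach with two loop variables, which is also how the "key = value" lines in the sample output above are shaped. Below is a minimal sketch using made-up data in the shape Phil described. One caution about the design: `array set` (or `dict create`) on this list keeps only the last value for a key that repeats, such as id here. Iterating the pairs keeps every record.

```tcl
# Made-up flat key/value list in the shape GET_ALL_XML_ATTRIBUTES returns
set attrs [list id 1 author {Phil Powell} id 2 author {Joe Blow}]

# Walk the list two elements at a time
set lines [list]
foreach {key val} $attrs {
    lappend lines "$key = $val"
}
puts [join $lines \n]

# Caution: array set keeps only the last value for a repeated key
array set byKey $attrs
puts $byKey(id)   ;# 2
```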
Michael
| |
| comp.lang.tcl 2006-11-26, 4:21 am |
|
Michael A. Cleverly wrote:
> On Sat, 25 Nov 2006, comp.lang.tcl wrote:
>
>
> The proc I wrote returns a list of lists. I now see that the version you
> want should return one long list of key value pairs instead of a list
> whose elements are each lists of key value pairs.
>
> Try this instead. (I've renamed the proc to be GET_ALL_XML_ATTRIBUTES; is
> your local coding standard that certain types of procs be named in
> ALL_CAPS?)
>
> [ revised GET_ALL_XML_ATTRIBUTES proc elided ]
>
> Michael
Wow, thank you so much, that was dead on! I'm puzzled by parts of your
code, though, especially ones like
regexp -- $RE($quote_type) $data => val data
Could you explain that a little, plus
if {[string length $contents] == 0} then continue
That syntax also throws me as it is within the foreach loop
But it works perfectly for me, at least until someday I figure out
tdom!
Thanx so much!
Phil
| |
| Michael A. Cleverly 2006-11-26, 7:04 pm |
| On Sun, 26 Nov 2006, comp.lang.tcl wrote:
> Wow, thank you so much that was dead on! I'm puzzled by your code
> though especially the parts like
>
> regexp -- $RE($quote_type) $data => val data
This would be a valid XML fragment:
<foo id="42" name='John "the unknown man" Doe' reference="The Tcl'ers Wiki" />
Each of the attributes is enclosed in either single quotes or double
quotes. Whichever quote type is used cannot appear within the attribute
(unless replaced with an XML entity, e.g., &quot; for a double quote).
But the other kind can.
This makes writing a single regexp that matches both types, while
allowing the other type to be embedded, difficult if not impossible.
So in the code I first match the first attribute up to and including the
equal sign and then store the rest of the string in the variable data.
Then, based on whether the remaining string begins with a single quote
(the [string match '* $data] test) or not, we know whether this particular
attribute is enclosed in single or double quotes. We have two different
regular expressions, one that matches each type.
So $RE($quote_type) will put the value contained within the particular
type of quotes in the variable val and then the rest of the string (i.e.,
the as-of-yet unparsed attributes--if any remain) back in data for the
subsequent iterations of the loop.
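The same step can be run on its own, using the attribute data from the example fragment after "name=" has been consumed. In regexp, the first variable after the input string receives the whole match. Here it is named => purely for readability and is never used. The following variables receive the parenthesized submatches in order. That is why data can be both the input and an output: its value has already been passed to regexp before the variable is overwritten. This is a minimal sketch; the two patterns are copied from the GET_ALL_XML_ATTRIBUTES proc.

```tcl
# The two quote-specific patterns from the proc
set RE(single-quote) {^'([^'']*)'(.*)$}
set RE(double-quote) {^"([^""]*)"(.*)$}

# Attribute data as it looks right after "name=" has been matched
set data {'John "the unknown man" Doe' reference="The Tcl'ers Wiki" />}

# Choose the pattern from the first character, as the proc does
if {[string match '* $data]} {
    set quote_type single-quote
} else {
    set quote_type double-quote
}

# "=>" gets the whole match (ignored); val gets the text between the
# quotes; data is overwritten with whatever follows the closing quote
regexp -- $RE($quote_type) $data => val data

puts "val  = $val"    ;# val  = John "the unknown man" Doe
puts "data = $data"   ;# data =  reference="The Tcl'ers Wiki" />
```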
> Could you explain that a little, plus
>
> if {[string length $contents] == 0} then continue
>
> That syntax also throws me as it is within the foreach loop
That means if the length of $contents is 0 (i.e., it is the empty string)
then continue the foreach loop (i.e., we're done with this tag, loop to
the next--if any remain).
continue and break can both be used within for, foreach and while loops,
just like in C. See:
http://www.tcl.tk/man/tcl8.4/TclCmd/continue.htm
http://www.tcl.tk/man/tcl8.4/TclCmd/break.htm
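Below is a small sketch of the same control flow outside the proc, with made-up tag contents. It also shows that "then" is an optional word in Tcl's if, so `if {...} continue` behaves identically. It contrasts break with continue: continue skips to the next tag, while break leaves the loop entirely.

```tcl
# Made-up tag contents, as the proc's foreach loop might see them
set tags [list "" "/row" "?xml version='1.0'?" "!-- a comment --" \
              "row id='1'" "row id='2'" "row id='3'"]

set kept [list]
foreach tag $tags {
    # "then" is optional; this could also be written: if {...} continue
    if {[string length $tag] == 0} then continue

    # Skip closing tags, processing instructions and comments/CDATA
    set first [string index $tag 0]
    if {$first eq "/" || $first eq "?" || $first eq "!"} continue

    lappend kept $tag

    # break leaves the loop; here we stop after two usable tags
    if {[llength $kept] == 2} break
}
puts $kept   ;# {row id='1'} {row id='2'}
```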
> But it works perfectly for me, at least until someday I figure out
> tdom!
>
> Thanx so much!
> Phil
You're welcome. We can (collectively) help you out with tdom too, I'm
sure.
Michael
| |
| comp.lang.tcl 2006-11-26, 7:04 pm |
|
Michael A. Cleverly wrote:
> On Sun, 26 Nov 2006, comp.lang.tcl wrote:
>
>
> This would be a valid XML fragment:
>
> <foo id="42" name='John "the unknown man" Doe' reference="The Tcl'ers Wiki" />
That is a new one. I didn't know that attribute values could be enclosed
in either single or double quotes; I thought XML allowed only double
quotes, so all of my XML attribute values are always enclosed in double
quotes.
>
> Each of the attributes is enclosed in either single quotes or double
> quotes. Whichever quote type is used cannot appear within the attribute
> (unless replaced with the XML entity, i.e., &quot; for a double quote).
> But the other kind can.
>
> This makes a single regexp to match both types, while allowing the other
> to be embedded, is difficult to impossible.
>
> So in the code I first match the first attribute up to and including the
> equal sign and then store the rest of the string in the variable data.
> Then, based on whether the string begins with '* or not we know that this
> particular attribute is enclosed in either single or double quotes. We
> have two different regular expres | | |