Author Re: join enhancements
Neil Madden

2005-06-07, 8:59 pm

Andreas Leitgeb wrote:
> Neil Madden <nem@cs.nott.ac.uk> wrote:

....
>
> I wanted to also extend split, but this opened a can of worms, which
> I found myself unable to handle yet:
> What if the value (in your example) contains an "="? (yes, I know, it
> could also contain a \n, breaking things completely, but that's not what
> I'm worrying about here)
> My question is: will multi::split create a three-element sublist then,
> or will it ignore any further occurrence of the "inner" split-char
> until it finds next "outer" split-char? and will it ignore any
> "outer" split-chars until it finds an "inner" one?


It will create a three-element sub-list. This is what should happen:
split (and multi::split) take a *string* as input -- i.e. they assume no
structure in what they are given, and just split on the given chars.
Thus, talking about "inner" and "outer" chars doesn't make sense in this
context. The caller is assumed to have taken care of quoting (or rather,
eliminating) any stray delimiter characters. A join/split pair *could*
be created that takes care of quoting/unquoting, something like:

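    proc quote-join {list delim} {
        # Quote each element, then join with the raw delimiter.
        join [map [quote $delim] $list] $delim
    }
    proc quote-split {string delim} {
        # A token is a run of unquoted characters or backslash-quoted pairs;
        # a backslash always consumes the character that follows it, so a
        # quoted backslash right before a delimiter still ends the token.
        set re [format {(?:[^%s\\]|\\.)+} $delim]
        map [unquote $delim] \
            [regexp -all -inline $re $string]
    }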

with some reasonable definitions for the other procs:

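    # Apply the command prefix $func to each item and collect the results.
    proc map {func list} {
        set ret [list]
        foreach item $list {
            lappend ret [uplevel 1 [linsert $func end $item]]
        }
        return $ret
    }
    # Return a command prefix that backslash-quotes backslashes and
    # every character in $delims.
    proc quote {delims} {
        set map [list \\ {\\}]
        foreach char [split $delims {}] {
            lappend map $char \\$char
        }
        return [list string map $map]
    }
    # Return a command prefix that undoes the quoting applied by quote.
    proc unquote {delims} {
        set map [list {\\} \\]
        foreach char [split $delims {}] {
            lappend map \\$char $char
        }
        return [list string map $map]
    }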

With these definitions:

quote-split [quote-join $str $delim] $delim

and

quote-join [quote-split $str $delim] $delim

should be the identity function for all inputs (well, except for the
list normalisation performed by regexp, and for empty elements, which
the + in the pattern drops). However, the quoting mechanism now alters
the structure of the original input beyond a simple split/join, so it is
quite a big change to the semantics and should be given a different name
to avoid confusion.
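
To make the round-trip claim concrete, here is a small self-contained
sketch. The names qquote, qunquote, qjoin and qsplit are hypothetical
stand-ins written for this example, not the procs above. It assumes a
single-character delimiter that is not special inside a regexp bracket
expression, and it uses a pattern in which a backslash always consumes
the character after it. Like any split built on a "+" pattern, it drops
empty elements.

```tcl
# Hypothetical helper: backslash-quote backslashes and the delimiter.
proc qquote {delim str} {
    string map [list \\ \\\\ $delim \\$delim] $str
}

# Hypothetical helper: undo qquote.
proc qunquote {delim str} {
    string map [list \\\\ \\ \\$delim $delim] $str
}

# Quote every element, then join with the raw delimiter.
proc qjoin {list delim} {
    set quoted {}
    foreach item $list {
        lappend quoted [qquote $delim $item]
    }
    join $quoted $delim
}

# Split on unquoted delimiters only, then unquote each token.
proc qsplit {str delim} {
    # Token: any char except delimiter or backslash, or backslash + any char.
    set re [format {(?:[^%s\\]|\\.)+} $delim]
    set result {}
    foreach token [regexp -all -inline $re $str] {
        lappend result [qunquote $delim $token]
    }
    return $result
}

# Elements containing the delimiter, a backslash, and a trailing backslash.
set items [list a=b {c,d} "e\\f" "g\\"]
set joined [qjoin $items ,]
puts $joined
puts [expr {[qsplit $joined ,] eq $items}]
```

The last element is the interesting case: its quoted form ends in two
backslashes, and the delimiter that follows must still be seen as a
real delimiter rather than a quoted one.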

>
> so, back to proposed non-nesting cycling split, should
> split {a=b=c,r,x=y} "=" ","
> return
> {a b=c r,x y}
> or
> {a b=c r {} x y}
> ?


The former, as that makes it the inverse of your cycling join:

% cycle-join {a b=c r,x y} = ,
a=b=c,r,x=y

Why would it produce the latter? With my multi::split it would produce:

a b {c r x} y

which seems more natural to me.
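
The two behaviours being compared can be sketched roughly as follows.
These procs are hypothetical re-implementations written for illustration
only: they are not the actual cycling join/split or multi::split from
this thread, whose source is not shown here. The sketch assumes at least
one non-empty delimiter is passed to the cycling procs, and that
nested-split (an invented name) takes exactly two delimiter sets.

```tcl
# Hypothetical cycling split: cut at the next occurrence of each
# delimiter in turn, wrapping around when the delimiter list runs out.
proc cycle-split {str args} {
    set result {}
    set n [llength $args]
    if {$n == 0} { return [list $str] }
    set i 0
    while {1} {
        set delim [lindex $args [expr {$i % $n}]]
        set pos [string first $delim $str]
        if {$pos < 0} break
        lappend result [string range $str 0 [expr {$pos - 1}]]
        set str [string range $str [expr {$pos + [string length $delim]}] end]
        incr i
    }
    # Whatever is left after the last cut is the final element.
    lappend result $str
    return $result
}

# Hypothetical cycling join: the inverse, inserting the delimiters in turn.
proc cycle-join {list args} {
    set out ""
    set n [llength $args]
    set i 0
    foreach item $list {
        if {$i > 0} {
            append out [lindex $args [expr {($i - 1) % $n}]]
        }
        append out $item
        incr i
    }
    return $out
}

# Hypothetical nested split: split on the outer chars, then split each
# piece on the inner chars, giving a list of sub-lists.
proc nested-split {str outer inner} {
    set result {}
    foreach part [split $str $outer] {
        lappend result [split $part $inner]
    }
    return $result
}

puts [cycle-split {a=b=c,r,x=y} = ,]        ;# a b=c r,x y
puts [cycle-join {a b=c r,x y} = ,]         ;# a=b=c,r,x=y
puts [nested-split {a=b=c,r,x=y} = ,]       ;# a b {c r x} y
```

The cycling version never looks at a delimiter out of turn, which is why
"b=c" and "r,x" survive intact; the nested version treats every
occurrence of every delimiter as significant.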
....
>
> Well, I think this functionality would really nicely fit into the
> one tcl join.
>


Perhaps. I'm not so sure that it's the right behaviour, and I don't see
a compelling case for the inclusion when it is so easy to do in a few
lines of Tcl. I think tcllib is a better place for these sorts of simple
functions, at least until the "right" behaviour can be agreed on.

-- Neil