"string match" and "glob" pattern rules
| MartinLemburg@UGS 2007-02-27, 8:14 am |
| Hello,
I would just like to know why there are slight differences between the
"string match" and the "glob" pattern rules.
In glob patterns I can use, e.g., "{a,b,e}" as a list of alternatives to
match:
% pwd
C:/Program Files/tcl/bin
% lrange [set files [glob "{t,w}*.exe"]] 0 2
tclsh.exe tclsh75.exe tclsh76.exe
In "string match" patterns, however, I cannot use this:
% string match "{t,w}*.exe" tclsh.exe
0
Why?
Best regards,
Martin Lemburg
UGS - a Siemens Company - Transforming the Process of Innovation
| |
| suchenwi 2007-02-27, 8:14 am |
| On 27 Feb., 11:26, "MartinLemburg@UGS" <martin.lemburg....@gmx.net>
wrote:
> In glob pattern I can use e.g. "{a,b,e}" as a list of characters to
> match:
>
> % pwd
> C:/Program Files/tcl/bin
> % lrange [set files [glob "{t,w}*.exe"]] 0 2
> tclsh.exe tclsh75.exe tclsh76.exe
>
> In the "string match" patterns I can not use this:
>
> % string match "{t,w}*.exe" tclsh.exe
> 0
>
> Why?
Because string match does not use the same language as glob, even
though it's called "glob-style" :^)
As documented in the man page, you can instead specify a set of
characters in brackets, somewhat like in re_syntax:
% string match {[tw]*.exe} tclsh.exe
1
The reason for this difference is probably historical...
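If you need glob's {t,w} alternation together with string match, one workaround is to expand the alternatives yourself and try each resulting pattern. This is a minimal sketch assuming the braces are neither nested nor escaped; expandBraces and braceMatch are hypothetical helpers, not part of Tcl.

```tcl
# Hypothetical helper (not part of Tcl): expand csh-style {a,b,c}
# alternatives into a list of plain [string match] patterns.
# Assumes non-nested, unescaped braces; this is only a sketch.
proc expandBraces {pattern} {
    set open [string first "\{" $pattern]
    if {$open < 0} {
        return [list $pattern]
    }
    set close [string first "\}" $pattern $open]
    if {$close < 0} {
        # Unbalanced brace: treat the whole pattern literally
        return [list $pattern]
    }
    set prefix [string range $pattern 0 [expr {$open - 1}]]
    set body   [string range $pattern [expr {$open + 1}] [expr {$close - 1}]]
    set suffix [string range $pattern [expr {$close + 1}] end]
    set result {}
    foreach alt [split $body ,] {
        # Recurse so that later brace groups in the suffix expand too
        foreach p [expandBraces $prefix$alt$suffix] {
            lappend result $p
        }
    }
    return $result
}

# Hypothetical helper: returns 1 if any expanded pattern matches
proc braceMatch {pattern string} {
    foreach p [expandBraces $pattern] {
        if {[string match $p $string]} {
            return 1
        }
    }
    return 0
}

puts [string match "{t,w}*.exe" tclsh.exe]  ;# 0, braces are literal here
puts [braceMatch "{t,w}*.exe" tclsh.exe]    ;# 1, tries t*.exe and w*.exe
```

This only adds the brace alternation on top of string match; the [tw] bracket form already works there directly.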
| |
| MartinLemburg@UGS 2007-02-27, 8:14 am |
| Hi Richard,
surely it's all described in the man pages, but... why are there
differences when both are called glob-style patterns?
I was never really deep into the UNIX world and never needed awk or other
tools, so I don't really know the definition of glob-style patterns.
But I have sometimes found it annoying that the two commands using
glob-style patterns, glob and "string match", cannot share their
patterns!
Wouldn't it be good for the Tcl core to have one glob-style pattern
"engine", to be used everywhere glob-style patterns are used
- like in "array names arrayName -glob pattern"?
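For what it's worth, the other -glob options in the core already share one rule set: the lsearch, switch and array names man pages say their -glob matching uses the same rules as string match, so the split is really glob versus everything else. A small check (the file names are made up for illustration):

```tcl
# The -glob options of lsearch, switch and array names follow the
# [string match] rules: [tw] works there, {t,w} is taken literally.
set names {tclsh.exe tclsh75.exe wish.exe tcl84.dll}

puts [lsearch -all -inline -glob $names {[tw]*.exe}]
# -> tclsh.exe tclsh75.exe wish.exe

puts [llength [lsearch -all -inline -glob $names "{t,w}*.exe"]]
# -> 0, no name starts with a literal "{t,w}"

switch -glob -- wish.exe {
    {[tw]*.exe} { puts "switch: matched the character class" }
    default     { puts "switch: no match" }
}

array set sizes {tclsh.exe 1 wish.exe 2 tcl84.dll 3}
puts [lsort [array names sizes -glob {[tw]*.exe}]]
# -> tclsh.exe wish.exe
```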
Best regards,
Martin Lemburg
UGS - a Siemens Company - Transforming the Process of Innovation
On Feb 27, 12:16 pm, "suchenwi" <richard.suchenwirth-
bauersa...@siemens.com> wrote:
> On 27 Feb., 11:26, "MartinLemburg@UGS" <martin.lemburg....@gmx.net>
> wrote:
> Because string match does not use the same language as glob, even
> though it's called "glob-style" :^)
> It is documented in the man page that you can specify character
> classes somehow like in re_syntax:
>
> % string match {[tw]*.exe} tclsh.exe
> 1
>
> The reason for this difference is probably historical...
| |
| jkj 2007-02-27 |
>
> Wouldn't it be good for the tcl core to have one glob-style pattern
> "engine", to be used on every place where glob-style pattern are used
> - like in "array names arrayName -glob pattern"?
>
At this point, what would be the effect on pre-existing code, though?
| |
| MartinLemburg@UGS 2007-02-27, 7:14 pm |
| Hello jkj,
Yes, you are right: this would probably be a code-breaking change.
So having one glob-style pattern matching engine would only be
something for Tcl 9.
Good-bye idea! :(
Best regards,
Martin Lemburg
UGS - a Siemens Company - Transforming the Process of Innovation
On Feb 27, 2:07 pm, "jkj" <k...@vexona.com> wrote:
>
> at this point, what would be the effect on pre-existing code, though?
| |
| Erik Leunissen 2007-02-27, 7:14 pm |
| MartinLemburg@UGS wrote:
> Hello jkj,
>
> yes - you are right - this would be a probably code-breaking change.
>
> So having none glob-style pattern matching engineen would only be
> something for tcl 9.
>
> Good-bye idea! :(
>
Why? Doesn't the Tcl development process have a mechanism to store good
ideas for the long term?
I believe that separating out a common pattern matching process (engine
or whatever name you give to it) is exactly what would solve the
observed inconsistencies. (I've stumbled across these pattern syntax
differences often enough in the past to feel your need.)
Erik.
> Best regards,
>
> Martin Lemburg
> UGS - a Siemens Company - Transforming the Process of Innovation
>
> On Feb 27, 2:07 pm, "jkj" <k...@vexona.com> wrote:
>
--
leunissen@ nl | Merge the left part of these two lines into one,
e. hccnet. | respecting a character's position in a line.
| |
| MartinLemburg@UGS 2007-02-27, 7:14 pm |
| Hello Erik,
yes, I should write a TIP and hope that Tcl 9 will include this
change.
But I am thinking of how long it will take to release Tcl 9, compared
with the Tcl version we currently use at work, so I hesitate a bit,
because that TIP won't help a lot.
It's not only that we (my team) work with an older Tcl version; many
developers and companies do not always work with the newest version
of Tcl!
And... I hesitate, because in all the years of developing with Tcl
and reading messages from comp.lang.tcl or the Tcl core email list, I
have never read such a suggestion! So the common need for creating one
"glob-style pattern engine" does not seem to be that big!
Right?
I think I should take it up again once Tcl development is focused more
on a 9 release than on 8.x releases.
Best regards,
Martin Lemburg
UGS - a Siemens Company - Transforming the Process of Innovation
On Feb 27, 4:48 pm, Erik Leunissen <l...@the.footer.invalid> wrote:
> MartinLemburg@UGS wrote:
> Why? Doesn't the Tcl development proces have a mechanism to store good
> idea's for the long term?
>
> I believe that separating out a common pattern matching proces (engine
> or whatever name you give to it) is quite exactly what would solve the
> observed inconsistencies. (I've stumbled across these pattern syntax
> differences enough in the past to feel your need).
>
> Erik.
>
> --
> leunissen@ nl | Merge the left part of these two lines into one,
> e. hccnet. | respecting a character's position in a line.
| |
| suchenwi 2007-02-27, 7:14 pm |
| I'm not sure whether the unification of {t,w} with [tw] merits an
incompatible change.
Once one needs such features, full-fledged regexps may be more
compatible with what people are used to... So how about
string match -regexp
glob -regexp
?
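Neither "string match -regexp" nor "glob -regexp" exists; they are only proposed here. With existing commands, a rough equivalent is regexp for a single string and lsearch -regexp to filter a list of names, for example one returned by glob. The names below are illustrative:

```tcl
# Alternation and anchoring with a regexp instead of {t,w} or [tw]
set re {^(t|w).*\.exe$}

puts [regexp $re tclsh.exe]   ;# 1
puts [regexp $re tcl84.dll]   ;# 0

# Filter a list of names; in practice the list could come from
# [glob -nocomplain *] in the current directory
set names {tclsh.exe tclsh75.exe wish.exe tcl84.dll}
puts [lsearch -all -inline -regexp $names $re]
# -> tclsh.exe tclsh75.exe wish.exe
```

The explicit ^ and $ anchors matter: unlike glob-style patterns, a regexp matches anywhere in the string unless anchored.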
| |
| Larry W. Virden 2007-02-27, 7:14 pm |
| On Feb 27, 5:26 am, "MartinLemburg@UGS" <martin.lemburg....@gmx.net>
wrote:
> I only would like to know why there are slight differences between the
> "string match" and the "glob" pattern rules.
> Why?
There are numerous pattern matching routines in Tcl.
I don't know the history of why string match doesn't use the same code
as glob - to me, it seems like they should match.
I hope that, as discussions of Tcl 9 proceed, someone discusses an
attempt to refactor the Tcl code in a way that more sharing occurs.
I would find this useful because one would be less surprised by Tcl's
behavior if things shared more code.
| |
| Gerald W. Lester 2007-02-27, 7:14 pm |
| Larry W. Virden wrote:
> On Feb 27, 5:26 am, "MartinLemburg@UGS" <martin.lemburg....@gmx.net>
> wrote:
> There are numerous pattern matching routines in Tcl.
>
> I don't know the history of why string match doesn't use the same code
> as glob - to me, it seems like they should match.
>
> I hope that, as discussions in Tcl 9 proceed, someone discusses at
> attempt to refactor the Tcl code in a way that more sharing occurs.
>
> The reason why I would find this useful is that one should find
> themselves less surprised by the Tcl actions if things share more
> code.
At one time in some of the ports the glob command just passed off the string
and let the OS libraries return the files that match -- thus there was no
code to reuse.
--
+--------------------------------+---------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+
| |
| Larry W. Virden 2007-02-27, 7:14 pm |
| On Feb 27, 2:14 pm, "Gerald W. Lester" <Gerald.Les...@cox.net> wrote:
> At one time in some of the ports the glob command just passed off the string
> and let the OS libraries return the files that match -- thus there was no
> code to reuse.
And if string match did the same "pass off to the OS libraries", in
the cases where they are available, and otherwise pass along to the
same compatibility code (in the cases where there isn't any globbing
code in the OS), then the amount of surprise would be lessened, I
would think. There would still be a bit of surprise when one was
attempting to write cross-platform code and ran into the differences
between OSes as well as the differences with the compatibility library.
But at least on one platform, anything that looked like a glob (don't
switch and lsearch have globbing options?) would act the same.
| |
| Fredderic 2007-02-27, 7:14 pm |
| On Tue, 27 Feb 2007 16:48:50 +0100,
Erik Leunissen <look@the.footer.invalid> wrote:
> MartinLemburg@UGS wrote:
> I believe that separating out a common pattern matching proces
> (engine or whatever name you give to it) is quite exactly what would
> solve the observed inconsistencies. (I've stumbled across these
> pattern syntax differences enough in the past to feel your need).
I've often thought that a centralised string matching engine would make
a huge amount of difference, especially if it could be picked up easily
by any commands that perform string matching. I'll write down what I've
been thinking, though I know not many people like my ideas because I
tend to go a little overboard. Still, maybe it'll provoke some
interesting chatter... :)
Add a new option -match=method to just about everything, with the
old-style -glob and -regexp and so forth being shorthand for their
favourite match style. [glob], for example, would use "glob:unix",
which supports the {,} notation, while plain "glob:string" (as used by
[lsearch/switch -glob]) may not. This choice could be overridden by
using -match=glob:XXX specifically.
"regexp" and "exact" would also be included as supported match types,
with the matching mechanism allowing for some parameter passing;
-nocase, -all, -indices, and friends would get passed through to the
match engine, and the pattern would be passed in a form allowing for
caching (preferably within the Tcl_Obj itself, so the compiled form lasts
as long as the pattern). Also supported would be a means for the match
function to return a list of ranges, as with [regexp -indices].
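To make the -match=method idea a little more concrete, here is a minimal Tcl-level sketch of a registry of match engines keyed by method name. Everything in it (the ::match namespace, ::match::register, ::match::test, ::match::globUnix) is hypothetical; it only shows the dispatch shape, not the bytecode-level caching or option passing described here, and "glob:unix" is approximated by expanding non-nested {a,b} groups before calling string match.

```tcl
# Hypothetical sketch of a central match-engine registry.
# None of these names exist in Tcl.
namespace eval ::match {
    variable engines
    array set engines {}
}

# Register an engine: a command prefix, called as <prefix> pattern string
proc ::match::register {method cmdPrefix} {
    variable engines
    set engines($method) $cmdPrefix
}

# Dispatch to the engine registered under $method
proc ::match::test {method pattern string} {
    variable engines
    if {![info exists engines($method)]} {
        error "unknown match method \"$method\""
    }
    # Append the arguments to the prefix and evaluate the result as a list
    return [eval [linsert $engines($method) end $pattern $string]]
}

# "glob:unix" approximation: expand one {a,b} group at a time
# (non-nested, unescaped braces only), then fall back to [string match]
proc ::match::globUnix {pattern string} {
    if {![regexp {^([^\{]*)\{([^\}]*)\}(.*)$} $pattern -> pre alts post]} {
        return [string match $pattern $string]
    }
    foreach alt [split $alts ,] {
        if {[globUnix $pre$alt$post $string]} {
            return 1
        }
    }
    return 0
}

# Engines built from commands that already exist
::match::register exact       {string equal}
::match::register glob:string {string match}
::match::register glob:unix   ::match::globUnix
::match::register regexp      {regexp --}

puts [::match::test glob:string "{t,w}*.exe" tclsh.exe]  ;# 0
puts [::match::test glob:unix   "{t,w}*.exe" tclsh.exe]  ;# 1
puts [::match::test regexp {^(t|w).*\.exe$} wish.exe]    ;# 1
```

The eval/linsert call keeps the sketch usable on 8.4; on 8.5 and later {*} would do the same. Options such as -nocase could be covered crudely by registering a prefix like {string match -nocase} under its own method name, though a real design would want a proper way to pass them through.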
This, of course, leads on to another of those internal namespaces (like
the "expression" namespace) wherein all these match engines reside.
Something like [match::regexp] would get you a function equivalent to
[regexp -all -indices -inline]. The C function would be aimed at
efficient invocation from the bytecode, allowing those original -glob
options to bypass the Tcl front-end and go straight to the internal
representation, regardless of what you've done to the names within the
match namespace. The Tcl-level command would take, for example, an options
dict as a standard argument, and a variable name into which to drop the
list of returned values. [regexp] would re-process the returned list,
extracting string segments and distributing them among the passed
variables as needed. [regsub] could, presumably, do something similar.
A means to pass extra options through the -match argument would also
relieve every command of the burden of supporting every new match option
that someone dreams up for their custom match type. The new
system could be considered "in flux" for a few releases, to allow
things to be moved around at first without being overly concerned about
whom it upsets.
Fredderic
| |
| MartinLemburg@UGS 2007-02-28, 4:14 am |
| Hi Gerald,
if I want to develop platform-independent code, then IMHO OS-dependent
glob styles are the worst thing to run into.
For 7 years I have been developing with Tcl inside one application for
the platforms MS Windows, SGI IRIX, SunOS, IBM AIX and HP-UX.
Finding out the intersection of all glob styles and using it would be
a really nice job, wouldn't it?
And the Tcl documentation wouldn't be able to document the glob style,
because it would be OS dependent.
No, I would suggest having one glob-style implementation for all
Tcl commands that use glob-style patterns, like array, glob, info,
lsearch, string, switch, ...
Then developers could use the same glob style everywhere and
wouldn't run into surprises, incompatibilities, ...!
Take a look at the RE usage in Tcl. Wherever I want to use REs, I
know that I can rely on the same RE pattern specifications/style. I
want that for glob-style patterns too!
So IMHO Fredderic's ideas above are worth discussing!
Best regards,
Martin Lemburg
UGS - a Siemens Company - Transforming the Process of Innovation
On Feb 27, 8:14 pm, "Gerald W. Lester" <Gerald.Les...@cox.net> wrote:
> Larry W. Virden wrote:
> At one time in some of the ports the glob command just passed off the string
> and let the OS libraries return the files that match -- thus there was no
> code to reuse.
>
> --
> +--------------------------------+---------------------------------------+
> | Gerald W. Lester |
> |"The man who fights for his ideals is the man who is alive." - Cervantes|
> +------------------------------------------------------------------------+