> Awk won't treat a variable as a regex unless you tell it to with the
> ~ operator. You see,
>
> /regex/ { ... }
>
> is just shorthand for
>
> $0 ~ "regex" { ... }
>
> so you can use
>
> BEGIN { variable = "regex" }
> $0 ~ variable { ... }
>
> (Note: don't use variable = /regex/ to assign the regex to the variable.)
I'm dragging up a slightly old thread, but this seems to be exactly my
question.
It seems AWK cannot handle
var = /regex/
but can handle
var = "regex"
How should I write my regex when it contains characters such as |?
I personally would like to write:
var = /.*\|.*/
since this describes exactly what I am after. But this seems to be
prevented. So I am forced to write either:
var = ".*\\|.*"
or
var = ".*|.*"
Presumably in both cases the regex /.*\|.*/ is constructed when used
in a regex match such as
"A|A" ~ var
Is there any way I can write var = /.*\|.*/?
Or am I forced to use either var = ".*\\|.*" or var = ".*|.*", which
is preferred?
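
For anyone wondering why var = /regex/ does not do what it looks like: as far as I understand POSIX awk, a bare /regex/ used as an ordinary expression is shorthand for $0 ~ /regex/, so the variable ends up holding 1 or 0 rather than the pattern. A minimal sketch, assuming a POSIX-style awk on the PATH (the function name show_assignment is made up for illustration):

```sh
# Hypothetical helper: shows what "var = /foo/" actually stores.
show_assignment() {
    awk '{
        var = /foo/        # evaluated as var = ($0 ~ /foo/)
        print $0, var      # var is the match result, not the regex
    }'
}

printf 'foo\nbar\n' | show_assignment
```

This should print "foo 1" and "bar 0", so a later $0 ~ var would be matching against "1" or "0", not against foo.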
smythe70@googlemail.com wrote:
>
> I'm dragging up a slightly old thread, but this seems to be exactly my
> question.
>
> It seems AWK cannot handle
> var = /regex/
> but can handle
> var = "regex"
>
> How should I write my regex when it contains characters such as |?
>
> I personally would like to write:
> var = /.*\|.*/
> since this describes exactly what I am after. But this seems to be
> prevented. So I am forced to write either:
> var = ".*\\|.*"
> or
> var = ".*|.*"
> Presumably in both cases the regex /.*\|.*/ is constructed when used
> in a regular expression such as
> "A|A" ~ var
>
> Is there any way I can write var = /.*\|.*/?
> Or am I forced to use either var = ".*\\|.*" or var = ".*|.*", which
> is preferred?

The syntax is var = ".*\\|.*". See
http://www.gnu.org/software/gawk/ma...omputed-Regexps for the rationale.

Ed.
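
To make the difference between the two string forms concrete, here is a small sketch, again assuming a POSIX-style awk. The string ".*\\|.*" goes through string escape processing first, so the regex engine sees \| (a literal pipe), whereas ".*|.*" is an alternation of two ".*" branches and matches every line. The bracket form "[|]" is an alternative I am adding for comparison, not something from the thread, and pipe_match_demo is a made-up name:

```sh
# Hypothetical helper: prints each line followed by three match results.
pipe_match_demo() {
    awk 'BEGIN {
             lit = ".*\\|.*"   # string \\| becomes regex \| : a literal pipe
             alt = ".*|.*"     # bare | is alternation, so this matches anything
             brk = "[|]"       # bracket expression: literal pipe, no backslashes
         }
         { print $0, ($0 ~ lit), ($0 ~ alt), ($0 ~ brk) }'
}

printf 'a|b\nab\n' | pipe_match_demo
```

For the line ab, which contains no pipe, only the alternation form reports a match, which is why the doubled backslash (or the bracket form) is the one to use.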
> The syntax is var = ".*\\|.*".

Cheers.

> See http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps for
> the rationale.

I'm not convinced the reasoning
(http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps)
as to why one should choose regexp constants as opposed to string
constants is complete. Consider:

PATTERN = "SomeRealHorridRegexp";
...
if (var ~ PATTERN) { ... }
...
if (var2 ~ PATTERN) { ... }
...
if (var3 ~ PATTERN) { ... }

In this case it seems perfectly reasonable and in fact desirable to
define the regexp as a string constant.
smythe70@googlemail.com wrote:
>
> Cheers.
>
> I'm not convinced the reasoning
> (http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps)
> as to why one should choose regexp constants as opposed to string
> constants is complete. Consider:
>
> PATTERN = "SomeRealHorridRegexp";
> ...
> if (var ~ PATTERN) { ... }
> ...
> if (var2 ~ PATTERN) { ... }
> ...
> if (var3 ~ PATTERN) { ... }
>
> In this case it seems perfectly reasonable and in fact desirable to
> define the regexp as a string constant.

There are two reasons to use string constants:

1) When the test is repeated multiple times, as you show above.
2) When the RE has to be constructed, e.g. during input parsing to build
   an RE from the first 3 records and use it on the rest:

   NR==1 { start = "^" $0; next }
   NR==2 { middle = ":" $0 ":"; next }
   NR==3 { end = $0 "$"; re = start middle end; next }
   $0 ~ re { whatever.... }

Regards,

Ed.
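
A runnable version of the second case, as a sketch only: the sample input and the function name match_built_re are invented for illustration, and the print action stands in for the "whatever" above. Note that any regex metacharacters in the first three records would be interpreted as such, since they are pasted into the pattern unescaped.

```sh
# Hypothetical helper: builds a regex from records 1-3, applies it to the rest.
match_built_re() {
    awk 'NR==1 { start = "^" $0; next }            # anchor at line start
         NR==2 { middle = ":" $0 ":"; next }       # middle field between colons
         NR==3 { end = $0 "$"; re = start middle end; next }
         $0 ~ re { print "match:", $0 }            # dynamic regex match'
}

# Records a, b, c give re = "^a:b:c$"; only the fourth line satisfies it.
printf 'a\nb\nc\na:b:c\nx:b:c\na:b:cd\n' | match_built_re
```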