Variables in regular expressions
smythe70@googlemail.com

2007-08-17, 7:57 am

> Awk won't treat a variable as a regex unless you tell it to with the
> ~ operator. You see,
>
> /regex/ { ... }
>
> is just shorthand for
>
> $0 ~ "regex" { ... }
>
> so you can use
>
> BEGIN { variable = "regex" }
> $0 ~ variable { ... }
>
> (Note: don't use variable = /regex/ to assign the regex to the variable.)
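A minimal sketch of the point in the quote, assuming a POSIX sh and any POSIX-compatible awk (gawk, mawk, or BWK awk); the function names `match_with_var` and `assign_regex_const` and the sample input are hypothetical. The first program keeps the regex in a string variable and applies it with `~`. The second shows why `variable = /regex/` doesn't work: POSIX says a regex constant used anywhere other than as the right-hand operand of `~`/`!~` (or as an argument to functions such as match(), sub() and gsub()) means `$0 ~ /regex/`, so the variable ends up holding 1 or 0 rather than a regex.

```shell
# Print every input line matching the regex stored in a string variable.
match_with_var() {
  awk 'BEGIN { variable = "regex" }
       $0 ~ variable { print }'
}

# Assigning a regex constant stores the result of $0 ~ /regex/ (1 or 0),
# not the regex itself.
assign_regex_const() {
  awk '{ variable = /regex/; print variable }'
}

printf 'no match\nhas regex here\n' | match_with_var      # prints: has regex here
printf 'has regex here\nno match\n' | assign_regex_const  # prints: 1, then 0
```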


I'm dragging up a slightly old thread, but this seems to be exactly my
question.

It seems AWK cannot handle
var = /regex/
but can handle
var = "regex"

How should I write my regex when it contains characters such as |?

I personally would like to write:
var = /.*\|.*/
since this describes exactly what I am after. But this seems to be
prevented. So I am forced to write either:
var = ".*\\|.*"
or
var = ".*|.*"
Presumably in both cases the regexp /.*\|.*/ is constructed when used
in a match such as
"A|A" ~ var

Is there any way I can write var = /.*\|.*/?
Or am I forced to use either var = ".*\\|.*" or var = ".*|.*"? If so, which
is preferred?
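The two string forms in the question are not equivalent, which is easy to check. A minimal sketch, assuming a POSIX sh and a POSIX-compatible awk; `compare_pipe_regexes` and the two sample lines are hypothetical. Inside an awk string, `\\` becomes a single backslash, so ".*\\|.*" yields the regex `.*\|.*` (a literal pipe, same as the constant /.*\|.*/), while ".*|.*" yields `.*|.*`, an alternation of two `.*` branches that matches every line.

```shell
# For each input line, report (1 = match, 0 = no match) against:
#   const   - the regex constant /.*\|.*/
#   escaped - the string ".*\\|.*", whose value is .*\|.*
#   bare    - the string ".*|.*", whose value is .*|.* (alternation)
compare_pipe_regexes() {
  awk '
  BEGIN {
    escaped = ".*\\|.*"
    bare    = ".*|.*"
  }
  {
    print $0 ": const=" ($0 ~ /.*\|.*/) " escaped=" ($0 ~ escaped) " bare=" ($0 ~ bare)
  }'
}

printf 'A|A\nno pipe\n' | compare_pipe_regexes
# A|A: const=1 escaped=1 bare=1
# no pipe: const=0 escaped=0 bare=1
```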

Ed Morton

2007-08-17, 6:58 pm

smythe70@googlemail.com wrote:
> Is there any way I can write var = /.*\|.*/?
> Or am I forced to use either var = ".*\\|.*" or var = ".*|.*"? If so, which
> is preferred?
>


The syntax is var = ".*\\|.*". See
http://www.gnu.org/software/gawk/ma...omputed-Regexps for
the rationale.
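To see why the backslash is doubled: awk processes the string literal's escapes once when it reads the program, and the resulting value is what `~` later compiles as a regex. A minimal sketch, assuming a POSIX sh and a POSIX-compatible awk; `show_string_value` is a hypothetical name. Printing the variable shows the text the regex engine actually receives.

```shell
# Print the value a string literal holds after awk's escape processing;
# this is the text that ~ compiles when the variable is used as a regex.
show_string_value() {
  awk 'BEGIN { var = ".*\\|.*"; print var }'
}

show_string_value   # prints: .*\|.*
```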

Ed.
smythe70@googlemail.com

2007-08-17, 6:58 pm

> The syntax is var = ".*\\|.*".

Cheers.

> See http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps for
> the rationale.


I'm not convinced the reasoning (http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps)
as to why one should choose regexp constants as opposed to string
constants is complete. Consider:

PATTERN = "SomeRealHorridRegexp";
....
if (var ~ PATTERN) { ... }
....
if (var2 ~ PATTERN) { ... }
....
if (var3 ~ PATTERN) { ... }

In this case it seems perfectly reasonable and in fact desirable to
define the regexp as a string constant.
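A minimal sketch of that pattern, assuming a POSIX sh and a POSIX-compatible awk. `count_numeric_fields`, the stand-in regex (an unsigned decimal number), and the sample input are all hypothetical. The regex is written once as a string and reused in three `~` tests; the backslash before the dot is doubled because it sits inside a string.

```shell
# Count, per column, how many of the first three fields look like an
# unsigned decimal number. PATTERN is defined once and reused below.
count_numeric_fields() {
  awk '
  BEGIN { PATTERN = "^[0-9]+(\\.[0-9]+)?$" }   # stand-in for a "horrid" regex
  {
    if ($1 ~ PATTERN) n1++
    if ($2 ~ PATTERN) n2++
    if ($3 ~ PATTERN) n3++
  }
  END { print n1 + 0, n2 + 0, n3 + 0 }'   # +0 prints 0 for counters never set
}

printf '1 x 2.5\n3.0 4 y\n' | count_numeric_fields   # prints: 2 1 1
```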

Ed Morton

2007-08-19, 6:57 pm



There are two reasons to use string constants:

1) When the test is repeated multiple times as you show above.
2) When the RE has to be constructed, e.g. during input parsing to build
an RE from the first 3 records and use it on the rest:

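NR==1 { start = "^" $0; next }                        # record 1: start of the RE
NR==2 { middle = ":" $0 ":"; next }                   # record 2: colon-delimited middle
NR==3 { end = $0 "$"; re = start middle end; next }   # record 3: end, then assemble
$0 ~ re { print }                                     # "whatever": here, print matches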
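Running that program on some input shows the assembled RE at work. A minimal sketch, assuming a POSIX sh and a POSIX-compatible awk; the wrapper name `build_re_and_filter` and the six input lines are hypothetical, and `print` stands in for "whatever". Records 1 to 3 build the RE `^ab:x:yz$`, and only the later records matching it are printed.

```shell
# Records 1-3 supply the pieces of the RE; later records are filtered by it.
build_re_and_filter() {
  awk '
  NR==1 { start = "^" $0; next }
  NR==2 { middle = ":" $0 ":"; next }
  NR==3 { end = $0 "$"; re = start middle end; next }
  $0 ~ re { print }'
}

# Builds re = "^ab:x:yz$", then tests the remaining three records.
printf 'ab\nx\nyz\nab:x:yz\nab:x:yzz\nab:q:yz\n' | build_re_and_filter   # prints: ab:x:yz
```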

Regards,

Ed.
Copyright 2008 codecomments.com