Variables in regular expressions
| smythe70@googlemail.com 2007-08-17, 7:57 am |
| > Awk won't treat a variable as a regex unless you tell it to with the
> ~ operator. You see,
>
> /regex/ { ... }
>
> is just shorthand for
>
> $0 ~ "regex" { ... }
>
> so you can use
>
> BEGIN { variable = "regex" }
> $0 ~ variable { ... }
>
> (Note: don't use variable = /regex/ to assign the regex to the variable.)
I'm dragging up a slightly old thread, but this seems to be exactly my
question.
It seems AWK cannot handle
var = /regex/
but can handle
var = "regex"
How should I write my regex when it contains characters such as |?
I personally would like to write:
var = /.*\|.*/
since this describes exactly what I am after. But this seems to be
prevented. So I am forced to write either:
var = ".*\\|.*"
or
var = ".*|.*"
Presumably in both cases the regex /.*\|.*/ is constructed when used
in a regular expression match such as
"A|A" ~ var
Is there any way I can write var = /.*\|.*/?
Or am I forced to use either var = ".*\\|.*" or var = ".*|.*", which
is preferred?
| |
| Ed Morton 2007-08-17, 6:58 pm |
| smythe70@googlemail.com wrote:
>
>
> I'm dragging up a slightly old thread, but this seems to be exactly my
> question.
>
> It seems AWK cannot handle
> var = /regex/
> but can handle
> var = "regex"
>
> How should I write my regex when it contains characters such as |?
>
> I personally would like to write:
> var = /.*\|.*/
> since this describes exactly what I am after. But this seems to be
> prevented. So I am forced to write either:
> var = ".*\\|.*"
> or
> var = ".*|.*"
> Presumably in both cases the regex /.*\|.*/ is constructed when used
> in a regular expression match such as
> "A|A" ~ var
>
> Is there any way I can write var = /.*\|.*/?
> Or am I forced to use either var = ".*\\|.*" or var = ".*|.*", which
> is preferred?
>
The syntax is var = ".*\\|.*". See
http://www.gnu.org/software/gawk/ma...omputed-Regexps for
the rationale.
Ed.
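To see why the two string forms are not interchangeable, here is a minimal sketch run from the shell. The two-line sample input and the shell variable names are made up for illustration. In an awk string, "\\" becomes a single backslash, so ".*\\|.*" reaches the regex engine as .*\|.* (a literal pipe), while ".*|.*" is an alternation of two ".*" branches and matches every line. The last command shows what var = /regex/ does instead: it stores the 0/1 result of matching $0, not the regex.

```sh
# Hypothetical sample input: one line containing a pipe, one without.
input='A|A
plain text'

# String with an escaped backslash: the regex is .*\|.* (literal pipe).
escaped=$(printf '%s\n' "$input" | awk 'BEGIN { var = ".*\\|.*" } $0 ~ var { print }')

# String without the backslash: ".*" OR ".*", which matches any line.
unescaped=$(printf '%s\n' "$input" | awk 'BEGIN { var = ".*|.*" } $0 ~ var { print }')

# Regex constant on the right of "=": var gets the result of $0 ~ /\|/ (1 or 0).
assigned=$(printf '%s\n' "$input" | awk '{ var = /\|/; print var }')

printf 'escaped:\n%s\n' "$escaped"
printf 'unescaped:\n%s\n' "$unescaped"
printf 'assigned:\n%s\n' "$assigned"
```

Only the escaped form selects just the line with the pipe.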
| |
| smythe70@googlemail.com 2007-08-17, 6:58 pm |
| > The syntax is var = ".*\\|.*".
Cheers.
> See http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps for
> the rationale.
I'm not convinced the reasoning
(http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps)
as to why one should choose regexp as opposed to string constants is
complete. Consider:
PATTERN = "SomeRealHorridRegexp";
....
if (var ~ PATTERN) { ... }
....
if (var2 ~ PATTERN) { ... }
....
if (var3 ~ PATTERN) { ... }
In this case it seems perfectly reasonable and in fact desirable to
define the regexp as a string constant.
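A runnable sketch of that idea follows. The pattern (a simple decimal number) and the three sample values are hypothetical stand-ins for "SomeRealHorridRegexp" and the real variables; the point is only that the pattern is written once as a string and reused in each test.

```sh
# PATTERN and the sample values are made up for illustration.
result=$(awk 'BEGIN {
    PATTERN = "^[0-9]+\\.[0-9]+$"   # defined once; "\\." is a literal dot
    var = "3.14"; var2 = "abc"; var3 = "10.0"
    if (var  ~ PATTERN) print "var matches"
    if (var2 ~ PATTERN) print "var2 matches"
    if (var3 ~ PATTERN) print "var3 matches"
}')
printf '%s\n' "$result"
```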
| |
| Ed Morton 2007-08-19, 6:57 pm |
| smythe70@googlemail.com wrote:
>
>
> Cheers.
>
>
>
>
> I'm not convinced the reasoning
> (http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps)
> as to why one should choose regexp as opposed to string constants is
> complete. Consider:
>
> PATTERN = "SomeRealHorridRegexp";
> ...
> if (var ~ PATTERN) { ... }
> ...
> if (var2 ~ PATTERN) { ... }
> ...
> if (var3 ~ PATTERN) { ... }
>
> In this case it seems perfectly reasonable and in fact desirable to
> define the regexp as a string constant.
>
There are two reasons to use string constants:
1) When the test is repeated multiple times, as you show above.
2) When the RE has to be constructed, e.g. during input parsing to build
an RE from the first 3 records and use it on the rest:
NR==1 { start = "^" $0; next }
NR==2 { middle = ":" $0 ":"; next }
NR==3 { end = $0 "$"; re = start middle end; next }
$0 ~ re { whatever.... }
Regards,
Ed.
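For anyone who wants to try the constructed-RE example, here is a small sketch with made-up input. The first three records (foo, bar, baz) are assumptions chosen so that the assembled RE is ^foo:bar:baz$, and the placeholder action is replaced with print. Note that any regex metacharacters in those first three records would be interpreted as such.

```sh
# Hypothetical input: records 1-3 supply the pieces, the rest are tested.
input='foo
bar
baz
foo:bar:baz
foo:bar:qux
xfoo:bar:baz'

matches=$(printf '%s\n' "$input" | awk '
    NR==1 { start  = "^" $0;     next }   # start  = "^foo"
    NR==2 { middle = ":" $0 ":"; next }   # middle = ":bar:"
    NR==3 { end    = $0 "$"; re = start middle end; next }   # re = "^foo:bar:baz$"
    $0 ~ re { print }                     # only records 4+ reach this rule
')
printf '%s\n' "$matches"
```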
|