interval expression in regexp

Sebastian Luque

2005-11-16, 6:55 pm

Hi,

According to the manual:

,-----[ (info "(gawk)Regexp Operators") lines: 2309 - 2323 ]
| Interval expressions were not traditionally available in `awk'.
| They were added as part of the POSIX standard to make `awk' and
| `egrep' consistent with each other.
|
| However, because old programs may use `{' and `}' in regexp
| constants, by default `gawk' does _not_ match interval expressions
| in regexps. If either `--posix' or `--re-interval' are specified
| (*note Options::), then interval expressions are allowed in
| regexps.
|
| For new programs that use `{' and `}' in regexp constants, it is
| good practice to always escape them with a backslash. Then the
| regexp constants are valid and work the way you want them to, using
| any version of `awk'.(2)
`-----

I thought:

gawk '/a\{3\}/'

or

gawk --posix '/a{3}/'

should work, but only the latter does. What is going on?



--
Sebastian P. Luque

Ed Morton

2005-11-16, 6:55 pm

Sebastian Luque wrote:
> Hi,
>
> According to the manual:
>
> ,-----[ (info "(gawk)Regexp Operators") lines: 2309 - 2323 ]
> | Interval expressions were not traditionally available in `awk'.
> | They were added as part of the POSIX standard to make `awk' and
> | `egrep' consistent with each other.
> |
> | However, because old programs may use `{' and `}' in regexp
> | constants, by default `gawk' does _not_ match interval expressions
> | in regexps. If either `--posix' or `--re-interval' are specified
> | (*note Options::), then interval expressions are allowed in
> | regexps.
> |
> | For new programs that use `{' and `}' in regexp constants, it is
> | good practice to always escape them with a backslash. Then the
> | regexp constants are valid and work the way you want them to, using
> | any version of `awk'.(2)
> `-----
>
> I thought:
>
> gawk '/a\{3\}/'
>
> or
>
> gawk --posix '/a{3}/'
>
> should work, but only the latter does. What is going on?
>
>
>


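In the first version the escaped braces are literal characters, so it
matches the text "a{3}" rather than three a's. That is consistent with
the syntax of older versions of awk and is gawk's default for backward
compatibility, as the text you quoted explains. The second version uses
POSIX syntax, where {3} is an interval expression, as would: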

gawk --re-interval '/a{3}/'

Note that gensub() is non-POSIX, so it is not available if you use
--posix, but it is if you use --re-interval. I'd stick to --re-interval
to avoid losing useful GNU awk functionality just to gain RE intervals.

Ed.