How to grab awk's regexp submatches?
| feaber 2007-11-19, 7:01 pm |
| For example, I have something like this:
$ echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER HERE!}"
awk gets some text to parse, and it matches. But I want to get one part
of that text (the number).
In the apache2 module mod_rewrite it was easy: submatches go into
variables ($0-$n). But here in awk the $-variables mean something
else, right? :)
| |
| Kenny McCormack 2007-11-19, 7:01 pm |
| In article <9826656f-be67-4113-b4cf-167ab020d87f@d61g2000hsa.googlegroups.com>,
feaber <feaber@gmail.com> wrote:
>For example, I have something like this:
>
>$echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER
>HERE!}"
>
>awk gets some text to parse. And it match. But I want to get some part
>of that text (the number).
>
>In apache2 module, mod_rewrite it was easy. Submatches goes into
>variables ($0-$n), but here in awk the $-variables meaning something
>else right? :)
Both GAWK and TAWK have extensions to do this. Standard (vanilla
standard) AWK has nothing that captures submatches directly.
| |
| Bob Harris 2007-11-19, 7:01 pm |
| In article <9826656f-be67-4113-b4cf-167ab020d87f@d61g2000hsa.googlegroups.com>,
feaber <feaber@gmail.com> wrote:
> For example, I have something like this:
>
> $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER
> HERE!}"
>
> awk gets some text to parse. And it match. But I want to get some part
> of that text (the number).
>
> In apache2 module, mod_rewrite it was easy. Submatches goes into
> variables ($0-$n), but here in awk the $-variables meaning something
> else right? :)
echo "test4325363test" | awk '
    match($0, /[0-9]+/) {
        print substr($0, RSTART, RLENGTH)
    }
'
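The same match()/substr() idea can be repeated to pull out every run of digits on a line, not only the first one. The following is a minimal sketch that uses only standard awk features; the function name all_numbers and the sample input are made up for illustration and are not from the thread.

```shell
# all_numbers: hypothetical helper name, not from the thread.
# Prints each run of digits in the argument on its own line.
all_numbers() {
    printf '%s\n' "$1" | awk '{
        s = $0
        # match() sets RSTART and RLENGTH and returns 0 when nothing matches.
        while (match(s, /[0-9]+/)) {
            print substr(s, RSTART, RLENGTH)
            # Drop everything up to the end of this match, then search again.
            s = substr(s, RSTART + RLENGTH)
        }
    }'
}

all_numbers "ab12cd345ef"
```

With the sample input this should print 12 and then 345, one per line.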
| |
| Ed Morton 2007-11-19, 9:58 pm |
|
On 11/19/2007 4:53 PM, feaber wrote:
> For example, I have something like this:
>
> $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER
> HERE!}"
>
> awk gets some text to parse. And it match. But I want to get some part
> of that text (the number).
>
> In apache2 module, mod_rewrite it was easy. Submatches goes into
> variables ($0-$n), but here in awk the $-variables meaning something
> else right? :)
This might be what you're looking for (GNU awk):
gawk '{print gensub(/(.*)([0-9]+)(.*)/,"\\2","")}'
Ed.
| |
| feaber 2007-11-20, 7:58 am |
| Thx Guys! :)
| |
| Steffen Schuler 2007-11-20, 6:58 pm |
| Hi feaber, hello netlanders,
On Mon, 19 Nov 2007 14:53:04 -0800, feaber wrote:
> For example, I have something like this:
>
> $echo "test4325363test" | awk "/(.*)([0-9]+)(.*)/ {print NUMBER HERE!}"
>
> awk gets some text to parse. And it match. But I want to get some part
> of that text (the number).
>
> In apache2 module, mod_rewrite it was easy. Submatches goes into
> variables ($0-$n), but here in awk the $-variables meaning something
> else right? :)
In awk, $i means the i-th field of the input record ($0 is the whole
record, without the record separator). Normally a record is one line of
text and the record separator is a newline.
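As a small illustration of what the $-variables hold in awk, the sketch below prints the whole record, the field count, and the second field. The helper name show_fields and the sample text are assumptions made up for this example; the default whitespace field splitting is assumed.

```shell
# show_fields: hypothetical helper, used only for this illustration.
show_fields() {
    printf '%s\n' "$1" | awk '{
        # $0 is the whole record; $1 through $NF are its fields,
        # split on whitespace by default.
        printf "record=[%s] fields=%d second=[%s]\n", $0, NF, $2
    }'
}

show_fields "alpha beta gamma"
```

This should print record=[alpha beta gamma] fields=3 second=[beta]. For the input test4325363test there is only one field, so $2 is empty.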
POSIX awk does not capture parenthesized submatches in the sense you
mean, but gawk supports them through the optional array argument of
match() and through gensub().
(A) match()-Extension
*********************
Gawk's extended match(s, re, a) does what you want: the submatches
inside parentheses in re are assigned to the array elements
a[1], a[2], ..., a[n]
and a[0] holds the whole match. a[i, "start"] is the start position of
a[i] and a[i, "length"] is its length.
Please observe that (g)awk matching is greedy: the first (.*) takes as
much as it can and leaves only a single digit for ([0-9]+).
After match("test4325363test", "(.*)([0-9]+)(.*)", a)
a[1] is "test432536"
a[2] is "3"
a[3] is "test"
Therefore use:
echo test4325363test |
gawk 'match($0, "([^0-9]*)([0-9]+)(.*)", a) { print a[2] }'
to extract the number.
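The "start" and "length" subscripts can be read from the same array. Here is a minimal sketch, assuming gawk is installed (the three-argument match() is a gawk extension); the function name number_span is made up for illustration.

```shell
# number_span: hypothetical helper; needs gawk for the 3-argument match().
number_span() {
    printf '%s\n' "$1" | gawk '
        match($0, /([^0-9]*)([0-9]+)(.*)/, a) {
            # a[2] is the second parenthesized group; the "start" and
            # "length" subscripts give its position and size within $0.
            printf "%s start=%d length=%d\n", a[2], a[2, "start"], a[2, "length"]
        }'
}
```

Calling number_span test4325363test should print 4325363 start=5 length=7, since the digits begin at the fifth character of the record.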
Please note that gawk 3.1.5 has some bugs in the match function.
These should be corrected in gawk 3.1.6 (see ftp://ftp.gnu.org).
(B) gensub()
************
As Ed Morton told you, gensub() is the other alternative with gawk.
The correct use in your case is (see the greedy argument above):
echo test4325363test |
gawk '/[0-9]/ { print gensub(/([^0-9]*)([0-9]+)(.*)/, "\\2", "1") }'
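For an awk without gensub(), a similar result can be had with two sub() calls on a copy of the record. This is only a sketch of a portable alternative, not something proposed in the thread; the function name first_number is made up for illustration.

```shell
# first_number: hypothetical helper; uses only POSIX awk features.
first_number() {
    printf '%s\n' "$1" | awk '/[0-9]/ {
        s = $0
        sub(/^[^0-9]*/, "", s)   # drop the leading non-digits
        sub(/[^0-9].*$/, "", s)  # drop everything after the first digit run
        print s
    }'
}

first_number "test4325363test"
```

This should print 4325363. Working on the copy s leaves $0 and the fields untouched, which matters if the rest of the program still needs them.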
Hope I could help you,
Steffen "goedel" Schuler