GAWK: Obtaining text matched by a pattern used to select records
Claro Clavigo 2004-07-16, 8:55 am
Is it possible with GAWK to obtain the part of the current input record
that is matched by the corresponding selection pattern?
Example for the simplest case:
Input:
--------------------
first record
second record
third record
Output:
--------------------
first
Pattern and actions:
--------------------
/f.*t/ { print MATCH }
where MATCH would be a built-in variable containing the part of the
input text matched by the pattern (RT is an analogous built-in variable:
it contains the input text that matched the record separator, RS).
I cannot use RSTART and RLENGTH because they are set only by the match()
function, and I need the text matched by the pattern preceding the action.
Thank you for your help.

Ed Morton 2004-07-16, 3:55 pm
Claro Clavigo wrote:
> Is it possible with GAWK to obtain the part of the current input record
> that is matched by the corresponding selection pattern?
>
> Example for the simplest case:
>
> Input:
> --------------------
> first record
> second record
> third record
>
> Output:
> --------------------
> first
>
> Pattern and actions:
> --------------------
> /f.*t/ { print MATCH }
There's no built-in way to do that. This should work instead:
gawk '/f.*t/{MATCH=gensub(/.*(f.*t).*/,"\\1",1);print MATCH}'
Regards,
Ed.

Kenny McCormack 2004-07-16, 3:55 pm
In article <cd8n8f$jji@netnews.proxy.lucent.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>Claro Clavigo wrote:
>
>
>There's no built-in way to do that. This should work instead:
There are at least 2 built-in ways to do that (in current versions of GAWK).
There's yet another way to do it (in current versions of TAWK).
>gawk '/f.*t/{MATCH=gensub(/.*(f.*t).*/,"\\1",1);print MATCH}'
Yes, gensub() is one of the ways (in the context of answering the OP's
question - that is, how do I do a back-reference in AWK).
Or, you can use match in the pattern space:
match($0,/f.*t/) { Use RSTART & RLENGTH here }
Further, you can encapsulate the above in a function called "extract" (google
for it).
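A minimal runnable version of the match()-as-pattern idea above, using the same three-line sample input; the wrapper name print_match is hypothetical. match() returns the starting position of the match (0 if there is none), so it works as a pattern, and RSTART and RLENGTH are already set when the action runs. Only the POSIX two-argument match() is used, so plain awk should behave the same as gawk here.

```shell
#!/bin/sh
# Sketch: match() in the pattern selects the record AND sets RSTART/RLENGTH,
# so the action can cut out the matched text with substr().
print_match() {
    printf 'first record\nsecond record\nthird record\n' |
    awk 'match($0, /f.*t/) { print substr($0, RSTART, RLENGTH) }'
}
print_match    # prints: first
```

The regular expression now appears only once, which is the advantage over the gensub() version.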

Ed Morton 2004-07-16, 3:55 pm
Kenny McCormack wrote:
> In article <cd8n8f$jji@netnews.proxy.lucent.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
>
<snip>
>
>
> There are at least 2 built-in ways to do that (in current versions of GAWK).
By a built-in way, I meant some awk variable that is automatically set
to the text matched by the pattern, rather than having to manually program
some way to get the matching string. If there is something like that, please share...
> There's yet another way to do it (in current versions of TAWK).
>
>
>
>
> Yes, gensub() is one of the ways (in the context of answering the OP's
> question - that is, how do I do a back-reference in AWK).
>
> Or, you can use match in the pattern space:
>
> match($0,/f.*t/) { Use RSTART & RLENGTH here }
>
> Further, you can encapsulate the above in a function called "extract" (google
> for it).
The match solution is better than my gensub one since you only need to
specify the pattern once, and don't need the surrounding ".*"s. Googling
didn't produce an extract function but I assume it'd look like this:
gawk '
# extract(): if extractSrc matches extractPattern, store the matched text
# in the global RMATCH and return 1; otherwise set RMATCH to "" and return 0.
function extract(extractSrc, extractPattern) {
    if (match(extractSrc, extractPattern)) {
        RMATCH = substr(extractSrc, RSTART, RLENGTH)
        return 1
    }
    RMATCH = ""
    return 0
}
extract($0, "f.*t") { print RMATCH }
'
If "extract" were a provided gawk function THEN I'd call that a built-in
way of solving the problem.
Ed.

Claro Clavigo 2004-07-18, 8:55 am
gazelle@yin.interaccess.com (Kenny McCormack) wrote in message news:<cd8ogk$avb$1@yin.interaccess.com>...
> Or, you can use match in the pattern space:
>
> match($0,/f.*t/) { Use RSTART & RLENGTH here }
>
> Further, you can encapsulate the above in a function called "extract" (google
> for it).
Thank you, that is what I was looking for: specifying the pattern
only once.
The flexibility of awk is always astonishing. I hadn't realized that the
match function could be used in the pattern space.