Substitution woes
Jonas H

2006-03-10, 6:55 pm

First of all, I'm using gawk.

Now for the problem. I need to replace, in the input, characters that
are repeated 5 or more times with just one instance of that character.

I'm a bit at a loss for what to do, since gsub doesn't really seem to
allow back references, least of all in the regexp part. The thing is, I
have a sed-expression that works: "s/\(.\)\\1\{4,\}/\1/".

So what I'd like to do is gsub(/(.)\1\1\1\1+/, "\1", string); - if that
was at all possible, which, sadly, it is not.

Will I have to do my own looping through the string (I feel fairly
confident that I can manage this, but would like to avoid it if
possible), or is there some smart way that I have overlooked? Or have I
simply misunderstood the gsub/regex syntax?

Thanks in advance, Jonas
Ed Morton

2006-03-10, 6:55 pm

Jonas H wrote:

> First of all, I'm using gawk.
>
> Now for the problem. I need to replace, in the input, characters that
> are repeated 5 or more times with just one instance of that character.
>
> I'm a bit at a loss for what to do, since gsub doesn't really seem to
> allow back references, least of all in the regexp part. The thing is, I
> have a sed-expression that works: "s/\(.\)\\1\{4,\}/\1/".
>
> So what I'd like to do is gsub(/(.)\1\1\1\1+/, "\1", string); - if that
> was at all possible, which, sadly, it is not.
>
> Will I have to do my own looping through the string (I feel fairly
> confident that I can manage this, but would like to avoid it if
> possible), or is there some smart way that I have overlooked? Or have I
> simply misunderstood the gsub/regex syntax?
>
> Thanks in advance, Jonas


Unlike perl and sed, awk doesn't let you use a matched pattern in the
remainder of the RE (e.g. using "\\1"), so you're stuck with needing to
work around that.

Ed.
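
One possible workaround is the manual loop Jonas mentions. The following is only a sketch under assumptions, not something posted in the thread: the names squeeze_runs and squeeze are made up for illustration, and the threshold of 5 is hard-coded from the original question. It walks the string, measures each run of identical characters, and keeps a single character for any run of 5 or more. It uses only length() and substr(), so it should not depend on gawk extensions.

```shell
# Hypothetical wrapper: reads lines on stdin, prints them with long runs collapsed.
squeeze_runs() {
    awk '
    # Collapse every run of 5 or more identical characters to one character.
    # Names after the extra spaces in the parameter list are local variables.
    function squeeze(s,    out, i, j, n, c, run) {
        out = ""
        n = length(s)
        i = 1
        while (i <= n) {
            c = substr(s, i, 1)
            j = i
            # Advance j to the last position of the current run of c.
            while (j < n && substr(s, j + 1, 1) == c)
                j++
            run = j - i + 1
            # Long runs shrink to one character; short runs are copied as-is.
            out = out (run >= 5 ? c : substr(s, i, run))
            i = j + 1
        }
        return out
    }
    { print squeeze($0) }
    '
}
```

Usage might look like: printf 'xx-----yy\n' | squeeze_runs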
Gordon Elliot

2006-03-11, 6:56 pm


Ed Morton wrote:
....
> Unlike perl and sed, awk doesn't let you use a matched pattern in the
> remainder of the RE (e.g. using "\\1"), so you're stuck with needing to
> work around that.


You might recommend using "gensub" under GAWK.

Or, TAWK...

Ed Morton

2006-03-11, 6:56 pm

Gordon Elliot wrote:
> Ed Morton wrote:
> ...
>
>
>
> You might recommend using "gensub" under GAWK.


I would if it supported the necessary construct. We're talking about
using "\\1" in the pattern matching part of the command, not the
replacement part.

> Or, TAWK...


Not supported or generally available so no point referring to it.

Ed.
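
To illustrate the distinction Ed draws, here is a small sketch of what gensub does support: "\\1" and "\\2" in the replacement text refer to parenthesized groups of the match. It assumes gawk is installed, since gensub is a gawk extension; the wrapper name swap_pairs and the sample name=number input format are invented for this example. The same backreference placed inside the regexp itself would not do what the original question needs.

```shell
# Hypothetical wrapper: swaps each name=number pair into number=name.
swap_pairs() {
    # "\\1" and "\\2" are valid only in the replacement string of gensub,
    # not in the regexp being matched. "g" means replace every match.
    gawk '{ print gensub(/([a-z]+)=([0-9]+)/, "\\2=\\1", "g") }'
}
```

Usage might look like: printf 'foo=1 bar=22\n' | swap_pairs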
Copyright 2008 codecomments.com