Changing xxxAxxxx to xxxBxxxxC
ibmichuco@hotmail.com

2007-11-14, 6:58 pm

Hi all,

I am trying to modify a file of single-word columns (no punctuation etc.)
that requires the following substitution:

xxxxA -> xxxxBC
xxxAxxxx -> xxxBxxxxC

That is, check whether the word contains the string A. If it is at the end,
replace it with BC; if it is in the middle, replace it with B and add C to
the end of the word.

The first one is easy enough: I guess I just search for "A_" and make a
direct substitution. The second case is a bit trickier.

Any suggestions would be appreciated,

Michuco

Ed Morton

2007-11-14, 6:58 pm



On 11/14/2007 1:12 PM, ibmichuco@hotmail.com wrote:
> Hi all,
>
> I am trying to modify a file of single-word columns (no punctuation etc.)
> that requires the following substitution:
>
> xxxxA -> xxxxBC
> xxxAxxxx -> xxxBxxxxC
>
> That is, check whether the word contains the string A. If it is at the end,
> replace it with BC; if it is in the middle, replace it with B and add C to
> the end of the word.
>
> The first one is easy enough: I guess I just search for "A_" and make a
> direct substitution. The second case is a bit trickier.
>
> Any suggestions would be appreciated,
>
> Michuco


It's unclear exactly what your input file looks like and whether or not you want
to replace multiple occurrences of "A", etc., but this may be what you want:

awk '{sub(/A$/,"BC")} gsub(/A/,"B"){$0=$0"C"} 1' file

If not, post some real sample input and expected output.
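
For readers following along, here is a minimal sketch of what the one-liner above does, assuming one word per line (an assumption, since the real input had not been posted at this point). The helper name `ed_oneliner` and the sample words are made up for illustration.

```shell
# Hypothetical helper: runs the one-liner from the post on whatever
# arrives on stdin.
ed_oneliner() {
    awk '{sub(/A$/,"BC")} gsub(/A/,"B"){$0=$0"C"} 1'
}

# Three made-up words: a trailing A, an A in the middle, and no A at all.
# Expected: xxxxBC, xxxBxxxxC, xxxx (one per line).
printf 'xxxxA\nxxxAxxxx\nxxxx\n' | ed_oneliner
```

Because `$0=$0"C"` appends to the whole record, the C lands at the end of the line. That only coincides with the end of the word when each line holds a single word.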

Regards,

Ed.

ibmichuco@hotmail.com

2007-11-14, 6:58 pm


Ed Morton wrote:
> On 11/14/2007 1:12 PM, ibmichuco@hotmail.com wrote:
>
> It's unclear exactly what your input file looks like and whether or not you want
> to replace multiple occurrences of "A", etc., but this may be what you want:
>
> awk '{sub(/A$/,"BC")} gsub(/A/,"B"){$0=$0"C"} 1' file
>
> If not, post some real sample input and expected output.
>
> Regards,
>
> Ed.


Well Ed, you're quick and you're good. Many thanks.
The sample file looks something like:

Ignore\t Ignore\t XXXXXAXXXX\t Ignore\t Ignore
Ignore\t Ignore\t XXXA\t Ignore\t Ignore

The script you gave almost works, except that it adds C to the
end of the line:

Ignore Ignore XXXXXBXXXX Ignore IgnoreC
Ignore Ignore XXXBC Ignore Ignore

and it lost the tab FS. I would like to see

Ignore\t Ignore\t XXXXXBXXXXC\t Ignore\t Ignore
Ignore\t Ignore\t XXXBC\t Ignore\t Ignore

So I changed $0 in your script to $3 and it works fine except for
the tabs. Adding BEGIN {F = "\t"} didn't help.
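
A possible explanation for the lost tabs, offered as a sketch rather than a diagnosis of the exact script used: when a field such as $3 is assigned to, awk rebuilds $0 by joining the fields with OFS, which defaults to a single space. FS only affects how the input is split, and the built-in variables are named FS and OFS (a variable called F has no special meaning). The helper name `fix_third_field` is hypothetical, and the sample assumes fields separated by exactly one tab.

```shell
# Hypothetical helper: the earlier logic applied to field 3 only, with a
# tab as both the input (FS) and output (OFS) field separator.
fix_third_field() {
    awk 'BEGIN { FS = OFS = "\t" }
         { sub(/A$/, "BC", $3) }
         gsub(/A/, "B", $3) { $3 = $3 "C" }
         1'
}

# Made-up sample lines modelled on the ones in the post.
printf 'Ignore\tIgnore\tXXXXXAXXXX\tIgnore\tIgnore\n' | fix_third_field
printf 'Ignore\tIgnore\tXXXA\tIgnore\tIgnore\n' | fix_third_field
```

With OFS left at its default, the same program would print the fields separated by single spaces, which matches the symptom described above.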

Regards,

Michuco

Ed Morton

2007-11-15, 3:57 am



On 11/14/2007 2:32 PM, ibmichuco@hotmail.com wrote:
> Ed Morton wrote:
>
>
>
> Well Ed, you're quick and you're good. Many thanks.
> The sample file looks something like:
>
> Ignore\t Ignore\t XXXXXAXXXX\t Ignore\t Ignore
> Ignore\t Ignore\t XXXA\t Ignore\t Ignore


So your field separators are a single tab followed by a sequence of 1 or more
space characters "\t +"?

> The script you gave almost works, except that it adds C to the
> end of the line:
>
> Ignore Ignore XXXXXBXXXX Ignore IgnoreC
> Ignore Ignore XXXBC Ignore Ignore


That doesn't make sense. You'd have had to change it to this:

awk '{sub(/A\t/,"BC")} gsub(/A/,"B"){$0=$0"C"} 1' file

to get that output.

> and it lost the tab FS. I would like to see
>
> Ignore\t Ignore\t XXXXXBXXXXC\t Ignore\t Ignore
> Ignore\t Ignore\t XXXBC\t Ignore\t Ignore
>
> So I changed $0 in your script to $3 and it works fine except for
> the tabs. Adding BEGIN {F = "\t"} didn't help.


So, is it always $3 you want to operate on and do you care about preserving the
white space between fields or is it OK to change "\t " to "\t " for example?

Ed.

Steffen Schuler

2007-11-15, 7:12 pm

On Wed, 14 Nov 2007 12:32:05 -0800, ibmichuco@hotmail.com wrote:
<snip>
> The sample file looks something like:
>
> Ignore\t Ignore\t XXXXXAXXXX\t Ignore\t Ignore
> Ignore\t Ignore\t XXXA\t Ignore\t Ignore

<snip>
> I would like to see
>
> Ignore\t Ignore\t XXXXXBXXXXC\t Ignore\t Ignore
> Ignore\t Ignore\t XXXBC\t Ignore\t Ignore

<snip>

Hi Michuco, hello netlanders,

An awk solution, tested with gawk, mawk, and original-awk:

BEGIN { OFS = FS = "\t" }
{
    for (i = 1; i <= NF; ++i) {
        if ($i ~ /A/) {
            gsub(/A/, "B", $i)
            $i = $i "C"
        }
    }
}
1

Enjoy it,

Steffen "goedel" Schuler

Steffen Schuler

2007-11-15, 7:12 pm

On Fri, 16 Nov 2007 00:07:02 +0000, Steffen Schuler wrote:

> On Wed, 14 Nov 2007 12:32:05 -0800, ibmichuco@hotmail.com wrote:
> <snip>
> An awk solution, tested with gawk, mawk, and original-awk:
>
> BEGIN { OFS = FS = "\t" }
> {
>     for (i = 1; i <= NF; ++i) {
>         if ($i ~ /A/) {
>             gsub(/A/, "B", $i)
>             $i = $i "C"
>         }
>     }
> }
> 1


Hi Michuco, hello netlanders,

A shorter awk script is:

BEGIN { OFS = FS = "\t" }
{
    for (i = 1; i <= NF; ++i)
        if (gsub(/A/, "B", $i))
            $i = $i "C"
}
1
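
The shorter script works because gsub() returns the number of substitutions it made, so a single call serves as both the test and the replacement. A small sketch of that return value follows; the helper name `count_subs` and the sample words are invented for illustration.

```shell
# Hypothetical helper: prints the gsub() return value next to the edited word.
count_subs() {
    awk '{ n = gsub(/A/, "B"); print n, $0 }'
}

# Expected: "1 XXXB", "2 XBXBX", "0 XXXX" (one per line).
printf 'XXXA\nXAXAX\nXXXX\n' | count_subs
```

A return value of 0 is false in awk, so `if (gsub(...))` only appends the C when at least one A was actually replaced.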

Enjoy it,

Steffen "goedel" Schuler

Sammy

2007-11-16, 4:15 am

On Nov 15, 4:39 pm, Steffen Schuler <schuler.stef...@googlemail.com>
wrote:
> On Fri, 16 Nov 2007 00:07:02 +0000, Steffen Schuler wrote:
>
> <snip>
>
> Hi Michuco, hello netlanders,
>
> a shorter awk script is:
>
> BEGIN { OFS = FS = "\t" }
> {
>     for (i = 1; i <= NF; ++i)
>         if (gsub(/A/, "B", $i))
>             $i = $i "C"
> }
> 1
>
> Enjoy it,
>
> Steffen "goedel" Schuler


This is a real newbie question -- why the "1" at the end of this
script and Ed's?

Steffen Schuler

2007-11-16, 4:15 am

On Fri, 16 Nov 2007 00:39:35 +0000, Steffen Schuler wrote:

> On Fri, 16 Nov 2007 00:07:02 +0000, Steffen Schuler wrote:
>
> <snip>
> a shorter awk script is:
>
> BEGIN { OFS = FS = "\t" }
> {
>     for (i = 1; i <= NF; ++i)
>         if (gsub(/A/, "B", $i))
>             $i = $i "C"
> }
> 1


Hi Michuco, hello netlanders,

Still a bit shorter, but this works only with gawk:

BEGIN { RS = "[\t\n]" }
gsub(/A/, "B") { sub(/$/, "C") }
{ printf "%s", $0 RT }
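
A rough sketch of how this variant behaves: with RS set to a regular expression, every tab- or newline-terminated word becomes its own record, and gawk's RT variable holds the exact terminator text that matched, so printing `$0 RT` puts the original separators back. RT is a gawk extension, which is why the script is gawk-only. The helper name `per_word` and the sample line are assumptions for illustration, and the call is skipped when gawk is not installed.

```shell
# Hypothetical helper wrapping the gawk-only variant from the post.
per_word() {
    gawk 'BEGIN { RS = "[\t\n]" }
          gsub(/A/, "B") { sub(/$/, "C") }
          { printf "%s", $0 RT }'
}

# Made-up input; expected output: Ignore<TAB>XXXBC<TAB>XBXXC
if command -v gawk >/dev/null 2>&1; then
    printf 'Ignore\tXXXA\tXAXX\n' | per_word
fi
```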

Enjoy it,

Steffen "goedel" Schuler


Steffen Schuler

2007-11-16, 4:15 am

On Thu, 15 Nov 2007 17:38:39 -0800, Sammy wrote:

> On Nov 15, 4:39 pm, Steffen Schuler <schuler.stef...@googlemail.com>

<snip>
>
> This is a real newbie question -- why the "1" at the end of this script
> and Ed's?


Hello Sammy, hello netlanders,

1 is a condition that is always true.

If a rule consists only of a condition COND, it is an abbreviation for

COND { print $0 }

So a rule consisting only of the condition 1 means: always print the
current line/record.

Instead of 1 you can also use any number other than 0 or any string other
than "", e.g. 9. (0 and "" mean false.)
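
The rule described above can be tried out with a quick sketch like the following. The helper name `try_pattern` and the two-line input are made up for illustration.

```shell
# Hypothetical helper: runs the awk program given as $1 on a fixed
# two-line input.
try_pattern() {
    printf 'first\nsecond\n' | awk "$1"
}

try_pattern '1'      # prints both lines
try_pattern '9'      # prints both lines
try_pattern '"yes"'  # non-empty string constant: prints both lines
try_pattern '0'      # prints nothing
try_pattern '""'     # empty string constant: prints nothing
```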

Hope I could help,

Steffen "goedel" Schuler

Cesar Rabak

2007-11-16, 4:15 am

Sammy escreveu:
[snipped]
>
> This is a real newbie question -- why the "1" at the end of this
> script and Ed's?


AWK has statements composed of PATTERNS and ACTIONS.

Pattern-Action statements are evaluated in turn for each input RECORD
(for most practical purposes we could say 'lines of text' here).

So an example would be:

$3 < 10 { print $3}

which means "if the third FIELD of the RECORD being scanned is less than
ten print it".

When the logical evaluation of PATTERN is true, ACTION is executed.

It happens that the AWK language has some defaults. So,
if I wanted the whole record (line) to be printed instead of _only_ the
third field, I could change it to:

$3 < 10 { print $0}

but it can be simplified to:

$3 < 10 { print}

In fact, if the ACTION part is only "print", one can simplify even further:

$3 < 10

If I wanted to print every line, all I need to do is make sure that
the pattern is always true.

The "1" does it and it is very compact.
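
To see these shortcuts side by side, here is a small sketch. The `sample` helper and its three lines of data are invented for illustration.

```shell
# Hypothetical sample: three space-separated columns, the third one numeric.
sample() {
    printf 'a b 5\nc d 12\ne f 7\n'
}

# These three forms are equivalent; each prints "a b 5" and "e f 7".
sample | awk '$3 < 10 { print $0 }'
sample | awk '$3 < 10 { print }'
sample | awk '$3 < 10'

# A pattern that is always true prints all three lines.
sample | awk '1'
```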

HTH

--
Cesar Rabak

ibmichuco@hotmail.com

2007-11-16, 6:58 pm



Ed Morton wrote:
> On 11/14/2007 2:32 PM, ibmichuco@hotmail.com wrote:
>
> So your field separators are a single tab followed by a sequence of 1 or more
> space characters "\t +"?
>
>
> That doesn't make sense. You'd have had to change it to this:
>
> awk '{sub(/A\t/,"BC")} gsub(/A/,"B"){$0=$0"C"} 1' file
>
> to get that output.
>
>
> So, is it always $3 you want to operate on and do you care about preserving the
> white space between fields or is it OK to change "\t " to "\t " for example?
>
> Ed.


Hi,

The space after the tab is not important; it actually has something to do
with Google News. Sorry, I don't have much of an option besides Google News.

I was able to slightly tweak your script to do what I needed. Again,
thanks to Ed and the others for your quick help.

Michuco

Copyright 2008 codecomments.com