Changing xxxAxxxx to xxxBxxxxC
| ibmichuco@hotmail.com 2007-11-14, 6:58 pm |
| Hi all,
I am trying to modify a file of single-word columns (no punctuation etc.)
that requires the following substitution:
xxxxA -> xxxxBC
xxxAxxxx -> xxxBxxxxC
That is, check whether the word contains string A: if it is at the end,
replace it with BC; if it is in the middle, replace it with B and add C
to the end of the word.
The first one is easy enough, I guess I just search for "A_" and make a
direct subst. The second case is a bit trickier.
Any suggestion would be appreciated,
Michuco
| |
| Ed Morton 2007-11-14, 6:58 pm |
|
On 11/14/2007 1:12 PM, ibmichuco@hotmail.com wrote:
> Hi all,
>
> I am trying to modify a file of single word columns (no punctuation
> etc.)
> that requires the following substitution:
>
> xxxxA -> xxxxBC
> xxxAxxxx -> xxxBxxxxC
>
> that is check to see if the word contains string A and if it is at the
> end,
> replace it with BC; if it is in the middle, replace it with B and add
> C to the
> end of the word.
>
> The first one is easy enough, I guess I just search for "A_" and make
> a
> direct subst. The second case is a bit trickier.
>
> Any suggestion would be appreciated,
>
> Michuco
>
It's unclear exactly what your input file looks like and whether or not you want
to replace multiple occurrences of "A", etc., but this may be what you want:
awk '{sub(/A$/,"BC")} gsub(/A/,"B"){$0=$0"C"} 1' file
If not, post some real sample input and expected output.
Regards,
Ed.
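
For readers following along, here is a minimal sketch of the one-liner above with each rule commented, run on made-up sample words. The function name swap_a is hypothetical and the sample input is an assumption (one word per line), not the poster's real data.

```sh
# Hypothetical wrapper around the one-liner; input is assumed to be one word per line.
swap_a() {
  awk '
    { sub(/A$/, "BC") }                # A at the end of the line becomes BC
    gsub(/A/, "B") { $0 = $0 "C" }     # any A still left becomes B, then C is appended
    1                                  # always-true pattern: print the line
  '
}

printf 'xxxxA\nxxxAxxxx\nxxxx\n' | swap_a
```

With this input the three lines come out as xxxxBC, xxxBxxxxC and xxxx. One caveat: a word with A both in the middle and at the end (e.g. xAxA) gets C appended twice, giving xBxBCC, so the sketch assumes that case does not occur.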
| |
| ibmichuco@hotmail.com 2007-11-14, 6:58 pm |
|
Ed Morton wrote:
> On 11/14/2007 1:12 PM, ibmichuco@hotmail.com wrote:
>
> It's unclear exactly what your input file looks like and whether or not you want
> to replace multiple occurrences of "A", etc., but this may be what you want:
>
> awk '{sub(/A$/,"BC")} gsub(/A/,"B"){$0=$0"C"} 1' file
>
> If not, post some real sample input and expected output.
>
> Regards,
>
> Ed.
Well Ed, you're quick and you're good. Many thanks.
The sample file looks something like:
Ignore\t Ignore\t XXXXXAXXXX\t Ignore\t Ignore
Ignore\t Ignore\t XXXA\t Ignore\t Ignore
The script you gave almost works, except that it adds C to the
end of the line:
Ignore Ignore XXXXXBXXXX Ignore IgnoreC
Ignore Ignore XXXBC Ignore Ignore
and it lost the tab FS. I would like to see
Ignore\t Ignore\t XXXXXBXXXXC\t Ignore\t Ignore
Ignore\t Ignore\t XXXBC\t Ignore\t Ignore
So I changed $0 in your script to $3 and it works fine except
the tabs. Adding BEGIN {F = "\t"} didn't help.
Regards,
Michuco
| |
| Ed Morton 2007-11-15, 3:57 am |
|
On 11/14/2007 2:32 PM, ibmichuco@hotmail.com wrote:
> Ed Morton wrote:
>
>
>
> Well Ed, you're quick and you're good. Many thanks.
> The sample file looks something like:
>
> Ignore\t Ignore\t XXXXXAXXXX\t Ignore\t Ignore
> Ignore\t Ignore\t XXXA\t Ignore\t Ignore
So your field separators are a single tab followed by a sequence of 1 or more
space characters "\t +"?
> The script you gave almost works, except that it adds C to the
> end of the line:
>
> Ignore Ignore XXXXXBXXXX Ignore IgnoreC
> Ignore Ignore XXXBC Ignore Ignore
That doesn't make sense. You'd have had to change it to this:
awk '{sub(/A\t/,"BC")} gsub(/A/,"B"){$0=$0"C"} 1' file
to get that output.
> and it lost the tab FS. I would like to see
>
> Ignore\t Ignore\t XXXXXBXXXXC\t Ignore\t Ignore
> Ignore\t Ignore\t XXXBC\t Ignore\t Ignore
>
> So I changed $0 in your script to $3 and it works fine except
> the tabs. Adding BEGIN {F = "\t"} didn't help.
So, is it always $3 you want to operate on and do you care about preserving the
white space between fields or is it OK to change "\t " to "\t " for example?
Ed.
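
If the answer is "always $3, and tabs must survive", a possible sketch follows. It assumes the fields are tab-separated and that only the third field is to be changed; the function name fix_third is hypothetical. Note that F in BEGIN {F = "\t"} is just an ordinary variable: awk's separator variables are FS (input) and OFS (output). Assigning to $3 rebuilds the record using OFS, which defaults to a single space, so the tabs are lost unless OFS is set as well.

```sh
# Hypothetical helper: apply the substitution to field 3 only, keeping tab separators.
fix_third() {
  awk '
    BEGIN { FS = OFS = "\t" }            # split on tabs and rejoin with tabs
    { sub(/A$/, "BC", $3) }              # A at the end of field 3 becomes BC
    gsub(/A/, "B", $3) { $3 = $3 "C" }   # any A left in field 3 becomes B, then append C
    1                                    # print the record
  '
}

printf 'Ignore\t Ignore\t XXXXXAXXXX\t Ignore\t Ignore\n' | fix_third
printf 'Ignore\t Ignore\t XXXA\t Ignore\t Ignore\n' | fix_third
```

Because FS is a lone tab, the space after each tab stays part of the following field and is passed through unchanged.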
| |
| Steffen Schuler 2007-11-15, 7:12 pm |
| On Wed, 14 Nov 2007 12:32:05 -0800, ibmichuco@hotmail.com wrote:
<snip>
<snip>
> <snip> The sample file
> looks something like:
>
> Ignore\t Ignore\t XXXXXAXXXX\t Ignore\t Ignore
> Ignore\t Ignore\t XXXA\t Ignore\t Ignore
>
<snip>
> <snip> I would like to see
>
> Ignore\t Ignore\t XXXXXBXXXXC\t Ignore\t Ignore
> Ignore\t Ignore\t XXXBC\t Ignore\t Ignore
<snip>
Hi Michuco, hello netlanders,
an awk-solution tested with gawk, mawk, and original-awk:
BEGIN { OFS = FS = "\t" }
{
    for (i = 1; i <= NF; ++i) {
        if ($i ~ /A/) {
            gsub(/A/, "B", $i)
            $i = $i "C"
        }
    }
}
1
Enjoy it,
Steffen "goedel" Schuler
| |
| Steffen Schuler 2007-11-15, 7:12 pm |
| On Fri, 16 Nov 2007 00:07:02 +0000, Steffen Schuler wrote:
> On Wed, 14 Nov 2007 12:32:05 -0800, ibmichuco@hotmail.com wrote:
> <snip>
> <snip>
> <snip>
<snip>
> an awk-solution tested with gawk, mawk, and original-awk:
>
> BEGIN { OFS = FS = "\t" }
> { for (i = 1; i <= NF; ++i) {
> if ($i ~ /A/) {
> gsub(/A/, "B", $i)
> $i = $i "C"
> } } }
> 1
Hi Michuco, hello netlanders,
a shorter awk script is:
BEGIN { OFS = FS = "\t" }
{
    for (i = 1; i <= NF; ++i)
        if (gsub(/A/, "B", $i))
            $i = $i "C"
}
1
Enjoy it,
Steffen "goedel" Schuler
| |
|
| On Nov 15, 4:39 pm, Steffen Schuler <schuler.stef...@googlemail.com>
wrote:
> On Fri, 16 Nov 2007 00:07:02 +0000, Steffen Schuler wrote:
>
>
>
>
>
> <snip>
>
>
> Hi Michuco, hello netlanders,
>
> a shorter awk script is:
>
> BEGIN { OFS = FS = "\t" }
> { for (i = 1; i <= NF; ++i)
> if (gsub(/A/, "B", $i))
> $i = $i "C"}
>
> 1
>
> Enjoy it,
>
> Steffen "goedel" Schuler
This is a real newbie question -- why the "1" at the end of this
script and Ed's?
| |
| Steffen Schuler 2007-11-16, 4:15 am |
| On Fri, 16 Nov 2007 00:39:35 +0000, Steffen Schuler wrote:
> On Fri, 16 Nov 2007 00:07:02 +0000, Steffen Schuler wrote:
>
> <snip>
<snip>
> a shorter awk script is:
>
> BEGIN { OFS = FS = "\t" }
> { for (i = 1; i <= NF; ++i)
> if (gsub(/A/, "B", $i))
> $i = $i "C"
> }
> 1
Hi Michuco, hello netlanders,
still a bit shorter, but works only with gawk:
BEGIN { RS = "[\t\n]" }
gsub(/A/, "B") { sub(/$/, "C") }
{ printf "%s", $0 RT }
Enjoy it,
Steffen "goedel" Schuler
| |
| Steffen Schuler 2007-11-16, 4:15 am |
| On Thu, 15 Nov 2007 17:38:39 -0800, Sammy wrote:
> On Nov 15, 4:39 pm, Steffen Schuler <schuler.stef...@googlemail.com>
<snip>
<snip>
>
> This is a real newbie question -- why the "1" at the end of this script
> and Ed's?
Hello Sammy, hello netlanders,
1 is a condition that is always true.
If a rule consists only of a condition COND, it is an abbreviation for
COND { print $0 }
So the rule consisting only of the condition 1 means: always print the
current line/record.
Instead of 1 you can also use any number or string value other than 0
or "", e.g. 9 (0 and "" mean false).
Hope I could help,
Steffen "goedel" Schuler
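
A quick way to check this at the shell prompt, as a small sketch with made-up two-line input:

```sh
# Patterns that count as true: every input line is printed.
printf 'one\ntwo\n' | awk '1'
printf 'one\ntwo\n' | awk '9'
printf 'one\ntwo\n' | awk '"x"'

# Patterns that count as false: nothing is printed.
printf 'one\ntwo\n' | awk '0'
printf 'one\ntwo\n' | awk '""'
```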
| |
| Cesar Rabak 2007-11-16, 4:15 am |
| Sammy escreveu:
[snipped]
>
> This is a real newbie question -- why the "1" at the end of this
> script and Ed's?
AWK programs are composed of PATTERN-ACTION statements.
Each pattern-action statement is evaluated in turn for every input RECORD
(for most practical purposes we could say 'line of text' here).
So an example would be:
$3 < 10 { print $3 }
which means "if the third FIELD of the RECORD being scanned is less than
ten, print it".
When the PATTERN evaluates to true, the ACTION is executed.
The AWK language has some defaults. If, instead of printing _only_ the
third field, I wanted the whole record (line) printed, I could change it to:
$3 < 10 { print $0 }
but that can be simplified to:
$3 < 10 { print }
In fact, if the ACTION part is only "print", one can simplify even further:
$3 < 10
If I wanted to print every line, all I need is a pattern that is always true.
The "1" does that, and it is very compact.
HTH
--
Cesar Rabak
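
The equivalence of the three forms can be tried out with a small sketch like the one below. The shell variable name sample and its contents are made up for illustration.

```sh
# Made-up input: the third field is 5 on the first line and 20 on the second.
sample='a b 5
c d 20'

# All three commands print only the first line, "a b 5".
printf '%s\n' "$sample" | awk '$3 < 10 { print $0 }'
printf '%s\n' "$sample" | awk '$3 < 10 { print }'
printf '%s\n' "$sample" | awk '$3 < 10'

# An always-true pattern prints every line.
printf '%s\n' "$sample" | awk '1'
```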
| |
| ibmichuco@hotmail.com 2007-11-16, 6:58 pm |
|
Ed Morton wrote:
> On 11/14/2007 2:32 PM, ibmichuco@hotmail.com wrote:
>
> So your field separators are a single tab followed by a sequence of 1 or more
> space characters "\t +"?
>
>
> That doesn't make sense. You'd have had to change it to this:
>
> awk '{sub(/A\t/,"BC")} gsub(/A/,"B"){$0=$0"C"} 1' file
>
> to get that output.
>
>
> So, is it always $3 you want to operate on and do you care about preserving the
> white space between fields or is it OK to change "\t " to "\t " for example?
>
> Ed.
Hi,
The space after the tab is not important; actually it has something
to do with Google News. Sorry, I don't have much of an option besides
Google News.
I was able to slightly twist your script to do what I needed. Again,
thanks to Ed and others for your quick help.
Michuco