Code Comments
I am using gawk (2.15) on Windows 2000 and I have a file in the form:
123456.123|123654.123|#42315

I would like to output it in the form:
123456.123 123654.123 42315

I have experimented with various attempts in the BEGIN block to set the field separator (FS=|# or FS=[|#] etc.), but to no avail. I can get:
123456.123|123654.123| 42315
or
123456.123 123654.123 #42315
but not the output that I require.

I thought the problem might be the field separators and the way that gawk handles | or #. Any suggestions would be helpful.

Thanks in advance,
Neil

You can use a regular expression built from the separator characters as FS. Please try:
BEGIN { FS = "[|#][|#]*" } and you'll be satisfied.
If your gawk supports extended regular expressions, you can also use
this more compact line:
BEGIN { FS = "[|#]+" }
Bye
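A minimal sketch of why the single bracket expression fell short and the "one or more" forms work, assuming a Unix-style shell and any POSIX awk (gawk, mawk, etc.); the variable `rec` and the function `split_record` are hypothetical names used only for illustration. Note that setting FS by itself only changes how the record is split: the fields still have to be printed explicitly or `$0` rebuilt with OFS.

```shell
rec='123456.123|123654.123|#42315'   # sample record from the question

# "[|#]" matches ONE separator character, so "|#" counts as two separators
# in a row with an empty field between them: NF is 4.
printf '%s\n' "$rec" | awk 'BEGIN { FS = "[|#]" } { print NF }'

# "[|#][|#]*" and "[|#]+" match a whole run of separators: NF is 3.
printf '%s\n' "$rec" | awk 'BEGIN { FS = "[|#][|#]*" } { print NF }'
printf '%s\n' "$rec" | awk 'BEGIN { FS = "[|#]+" } { print NF }'

# Assigning a field ($1 = $1) makes awk rebuild $0 joined by OFS,
# which is a single space by default. (print $1, $2, $3 also works.)
split_record() {
  awk 'BEGIN { FS = "[|#]+" } { $1 = $1; print }'
}
printf '%s\n' "$rec" | split_record   # 123456.123 123654.123 42315
```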

In article <1116419116.440398.31980@g14g2000cwa.googlegroups.com>,
Neil Webster <nswebster@gmail.com> wrote:
>I am using gawk (2.15) on Windows 2000 and I have a file in the form:
>123456.123|123654.123|#42315
>
>I would like to output it in the form:
>123456.123 123654.123 42315
>
>I have experimented with various attempts in the BEGIN block to declare
>the Field Separator (FS=|# or FS=[|#] etc ). But to no avail, I can
>get:
>123456.123|123654.123| 42315 or 123456.123 123654.123 #42315
>But not the output that I require.
>
>I thought a problem might be the field separators and the way that gawk
>handles | or #

My first reaction was, like yours, that something tricky could be done with FS and OFS, but then I realized that simplest is best (assuming, as always, that I have your real requirements right):

gsub(/[^0-9.]+/," ")

(Yes, that's the whole program!)
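Kenny's one-liner works because gsub() returns the number of substitutions it made; used as a bare pattern with no action, a non-zero count counts as true and awk runs its default action, printing the modified record. A sketch of this, assuming a Unix-style shell and a POSIX awk (`numbers_only` is a hypothetical helper name), also shows one side effect not mentioned in the thread: a line with nothing to replace gives a count of 0 and is silently dropped.

```shell
# Pattern-only program: gsub() edits $0 and returns its substitution count;
# a non-zero count triggers awk's default action, { print }.
numbers_only() {
  awk 'gsub(/[^0-9.]+/, " ")'
}

printf '123456.123|123654.123|#42315\n' | numbers_only   # 123456.123 123654.123 42315

# Side effect: nothing to replace means gsub() returns 0, so no output.
printf '42\n' | numbers_only
```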

iper_linuxiano, thanks.
That's just what I was after; I just couldn't get the syntax right.
Thanks again.
Neil

Kenny,

Thanks for the effort, but I couldn't get that line to work; it just hung with no output.

Should I run it as: gawk 'gsub(/[^0-9.]+/," ") ' or another way?

Neil

On Wed, 18 May 2005 07:56:05 -0700, Neil Webster wrote:
> Kenny,
>
> Thanks for the effort. But I couldn't seem to get that line to work,
> it just hung with no output.
>
> Should I run it as: gawk 'gsub(/[^0-9.]+/," ") ' or another way?
>
> Neil
Yes, like:
# awk 'gsub(/[^0-9.]+/," ") '
Or another way, squashing only the | and # separators instead of everything that isn't a digit:
# awk 'gsub(/[|#]+/," ")'
Or in the fully developed form (in case the script gets more complicated later):
# awk '{gsub(/[|#]+/,\" \");print}'
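To see where the first two commands differ, here is a sketch run on a made-up record containing a letter (the record `1.5|x#2` and the variable `line` are hypothetical, not from the thread), assuming a Unix-style shell and a POSIX awk. It also shows why the fully developed form can matter: it prints every line, while the pattern-only forms print only lines where gsub() replaced something.

```shell
line='1.5|x#2'   # hypothetical record with a non-digit, non-separator character

# [^0-9.]+ replaces every run of characters other than digits and dots.
printf '%s\n' "$line" | awk 'gsub(/[^0-9.]+/, " ")'    # 1.5 2

# [|#]+ replaces only runs of | and #, leaving the "x" in place.
printf '%s\n' "$line" | awk 'gsub(/[|#]+/, " ")'       # 1.5 x 2

# On a line with no separators, the pattern-only form prints nothing,
# while the { ...; print } form passes the line through unchanged.
printf '%s\n' 'no separators here' | awk 'gsub(/[|#]+/, " ")'
printf '%s\n' 'no separators here' | awk '{ gsub(/[|#]+/, " "); print }'
```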

In article <428b8282$0$24125$626a14ce@news.free.fr>,
Loki Harfagr <loki@DarkDesign.free.fr> wrote:
>On Wed, 18 May 2005 07:56:05 -0700, Neil Webster wrote:
>
>
>Yes, like:
># awk 'gsub(/[^0-9.]+/," ") '
>
>Or another way, squashing only the | and # separators instead of everything that isn't a digit:
># awk 'gsub(/[|#]+/," ")'
>
>Or in the fully developed form (in case the script gets more complicated later):
># awk '{gsub(/[|#]+/," ");print}'
Well, presumably, if we are going to specify it as shell syntax (including
showing the Unix shell prompt), it would be:
# gawk 'gsub(/[^0-9.]+/," ")' yourfile > youroutputfile
Note that frequently when users say "it just hung", it is because they
didn't specify the input file and the program is silently reading standard
input.
Post Follow-up to this messageLoki Harfagr <loki@DarkDesign.free.fr> wrote in message news:<428b8282$0$24125$626a14ce@new
s.free.fr>...
> On Wed, 18 May 2005 07:56:05 -0700, Neil Webster wrote:
>
>
...
> Or in the fully developed form (in case the script gets more complicated later):
> # awk '{gsub(/[|#]+/," ");print}'
Since our friend is using Windows 2000, where cmd.exe does not treat single quotes as quoting characters, it should be:
\>awk "{gsub(/[|#]+/,\" \");print}"
_____
hq00e
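One way to sidestep the quoting differences between cmd.exe and a Unix shell (an alternative not raised in the thread) is to keep the awk program in a file and pass it with awk's -f option, so the shell never has to quote it. A sketch, assuming a Unix-style shell with mktemp; on Windows the equivalent would be something like gawk -f fix.awk infile > outfile, where fix.awk is a hypothetical file name.

```shell
# Write the program to a temporary file; the quoted 'EOF' delimiter
# stops the shell from expanding anything inside the here-document.
prog=$(mktemp)
trap 'rm -f "$prog"' EXIT
cat > "$prog" <<'EOF'
{ gsub(/[|#]+/, " "); print }
EOF

# awk -f reads the program text from the file instead of the command line.
printf '123456.123|123654.123|#42315\n' | awk -f "$prog"   # 123456.123 123654.123 42315
```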

Thanks for these. They are very helpful.
The mistake I was making was:
gawk "{gsub(/[|#]+/,\" \");print}" < [infile] > [outfile]
instead of:
gawk "{gsub(/[|#]+/,\" \");print}" [infile] > [outfile]
A couple of lingering questions:
1. Is there likely to be a (significant) difference in speed between
iper_linuxiano's suggestion and the above? Or is the speed determined just
by how fast the file can be read and written?
2. What is the difference between the two commands above? I always thought
that to read a file you had to redirect it (e.g. < [infile]).
Thanks
Neil
Neil Webster wrote:
> Thanks for these. They are very helpful.
>
> The mistake I was making was:
> gawk "{gsub(/[|#]+/,\" \");print}" < [infile] > [outfile]
> instead of:
> gawk "{gsub(/[|#]+/,\" \");print}" [infile] > [outfile]
Both commands should work fine with gawk.
If the first command doesn't work for you... I don't know why.
> A couple of lingering questions.....
>
> 1. Is there likely to be a (significant) difference in speed between
> iper_linuxiano's suggestion and the above? Or is the speed determined just
> by how fast the file can be read and written?
It depends on how awk interprets the script, and it may vary between
different implementations of awk. Here are some not-always-true rules for saving time:
1. Awk reads and parses the whole script before processing any input, so a
shorter script may start slightly faster (on big files the per-line work matters more).
2. Logical operations (tests and branches) take time, especially when processing
big files; avoid them within awk if you have another choice.
3. Don't pipe through other commands if the job can be done within the script.
> 2. What is the difference between the two commands above? I always thought
> that to read a file you had to redirect it (e.g. < [infile]).
It depends on how a command-line utility processes its arguments. If it
expects a filename as an argument, just give it a filename. If it reads the
file's contents from stdin, you can redirect the file to it on the command line.
Most command-line tools in Windows do not accept stdin in place of a filename.
Gawk accepts either a filename argument or input on stdin.
Therefore either
gawk .... < [infile] > [outfile]
or
gawk .... [infile] > [outfile]
works.
_____
hq00e
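A sketch that checks this in practice, assuming a Unix-style shell with mktemp and a POSIX awk (the file names infile, out_arg and out_stdin are hypothetical). With a filename argument awk opens the file itself; with < the shell opens it and awk reads standard input. When neither is given, awk waits on standard input, which is the "hang" Kenny described.

```shell
# Create a small sample input file in a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '123456.123|123654.123|#42315\n' > "$dir/infile"

# 1. Filename as an argument: awk opens the file itself.
awk '{ gsub(/[|#]+/, " "); print }' "$dir/infile" > "$dir/out_arg"

# 2. Redirection: the shell opens the file and awk reads standard input.
awk '{ gsub(/[|#]+/, " "); print }' < "$dir/infile" > "$dir/out_stdin"

# Compare the two results.
if [ "$(cat "$dir/out_arg")" = "$(cat "$dir/out_stdin")" ]; then
  echo "identical"
fi
```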