| Author |
Problem With Multiple Field Separators
|
|
| Neil Webster 2005-05-18, 3:56 pm |
| I am using gawk (2.15) on Windows 2000 and I have a file in the form:
123456.123|123654.123|#42315
I would like to output it in the form:
123456.123 123654.123 42315
I have experimented with various attempts in the BEGIN block to declare
the Field Separator (FS=|# or FS=[|#] etc ). But to no avail, I can
get:
123456.123|123654.123| 42315 or 123456.123 123654.123 #42315
But not the output that I require.
I thought a problem might be the field separators and the way that gawk
handles | or #
Any suggestions would be helpful.
Thanks in advance
Neil
| |
| iper_linuxiano 2005-05-18, 3:56 pm |
| You can use a regular expression as the field separator. Please try:
BEGIN { FS = "[|#][|#]*" } and you'll be satisfied.
If your gawk supports extended regular expressions, you can also use
this more compact line:
BEGIN { FS = "[|#]+" }
Bye
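
As a rough sketch of how the FS approach produces the requested output, assuming a POSIX shell (for example under Cygwin) and the sample line from the question: setting FS only changes how the line is split, so the example also assigns $1=$1, which makes awk rebuild $0 with the default OFS (a single space). The variable names `line` and `result` are illustrative only.

```shell
# Sample line taken from the original question.
line='123456.123|123654.123|#42315'

# FS is treated as a regex: a run of | and/or # counts as one separator.
# Assigning $1 = $1 forces awk to rebuild $0, joining the fields with OFS.
result=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[|#]+" } { $1 = $1; print }')

echo "$result"
```

Without the $1 = $1 assignment, awk would print the line unchanged, because $0 is only rebuilt when a field is modified.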
| |
| Kenny McCormack 2005-05-18, 3:56 pm |
| In article <1116419116.440398.31980@g14g2000cwa.googlegroups.com>,
Neil Webster <nswebster@gmail.com> wrote:
>I am using gawk (2.15) on Windows 2000 and I have a file in the form:
>123456.123|123654.123|#42315
>
>I would like to output it in the form:
>123456.123 123654.123 42315
>
>I have experimented with various attempts in the BEGIN block to declare
>the Field Separator (FS=|# or FS=[|#] etc ). But to no avail, I can
>get:
>123456.123|123654.123| 42315 or 123456.123 123654.123 #42315
>But not the output that I require.
>
>I thought a problem might be the field separators and the way that gawk
>handles | or #
My first reaction was, like yours, that something tricky could be done with
FS and OFS, but then I realized that simplest is best (assuming, as always,
that I have your real requirements right):
gsub(/[^0-9.]+/," ")
(Yes, that's the whole program!)
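
A minimal sketch of why this one-liner works with no braces and no print, assuming a POSIX shell and the sample line from the question: gsub() returns the number of substitutions it made, and awk prints the (modified) line whenever a pattern evaluates to a non-zero value. The variable names below are illustrative only.

```shell
# gsub() is used as the pattern; a non-zero return value triggers the
# default action, which prints the already modified $0.
result=$(printf '%s\n' '123456.123|123654.123|#42315' | awk 'gsub(/[^0-9.]+/, " ")')
echo "$result"

# Side effect of this short form: a line with nothing to replace makes
# gsub() return 0, so that line is not printed at all.
unmatched=$(printf '%s\n' '42315' | awk 'gsub(/[^0-9.]+/, " ")')
echo "unmatched line printed: [$unmatched]"
```

If lines without any separator must still appear in the output, the longer form with an explicit print inside braces avoids this.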
| |
| Neil Webster 2005-05-18, 3:56 pm |
| iper_linuxiano thanks.
That's just what I was after; I just couldn't get the syntax right.
Thanks again.
Neil
iper_linuxiano wrote:
> You can use a regex with Fields Separators inserted in. Please try:
> BEGIN { FS = "[|#][|#]*" } and you'll be satisfied.
>
> If your gawk supports extended regular expressions, you can also use
> this more compact line:
> BEGIN { FS = "[|#]+" }
> Bye
| |
| Neil Webster 2005-05-18, 3:56 pm |
| Kenny,
Thanks for the effort. But I couldn't seem to get that line to work,
it just hung with no output.
Should I run it as: gawk 'gsub(/[^0-9.]+/," ") ' or another way?
Neil
| |
| Loki Harfagr 2005-05-18, 3:56 pm |
| On Wed, 18 May 2005 07:56:05 -0700, Neil Webster wrote:
> Kenny,
>
> Thanks for the effort. But I couldn't seem to get that line to work,
> it just hung with no output.
>
> Should I run it as: gawk 'gsub(/[^0-9.]+/," ") ' or another way?
>
> Neil
Yes, like :
# awk 'gsub(/[^0-9.]+/," ") '
Or another way, replacing only runs of | and # rather than everything that is not a digit or a dot:
# awk 'gsub(/[|#]+/," ")'
Or in the fully developed form (in case the script gets more complicated later):
# awk '{gsub(/[|#]+/," ");print}'
| |
| Kenny McCormack 2005-05-18, 8:55 pm |
| In article <428b8282$0$24125$626a14ce@news.free.fr>,
Loki Harfagr <loki@DarkDesign.free.fr> wrote:
>On Wed, 18 May 2005 07:56:05 -0700, Neil Webster wrote:
>
>
>Yes, like :
># awk 'gsub(/[^0-9.]+/," ") '
>
>Or another way, not squashing anything but digits :
># awk 'gsub(/[|#]+/," ")'
>
>Or in the full developed form (in case of further script complication)
># awk '{gsub(/[|#]+/," ");print}'
Well, presumably, if we are going to specify it as shell syntax (including
showing the Unix shell prompt), it would be:
# gawk 'gsub(/[^0-9.]+/," ")' yourfile > youroutputfile
Note that frequently when users say "it just hung", it is because they
didn't specify the input file and the program is silently reading standard
input.
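
To illustrate the point about input, here is a small sketch assuming a POSIX shell; the temporary file stands in for "yourfile" and its name is a placeholder. Both ways of supplying the input give the same result, while running awk with neither a file argument nor a redirection leaves it waiting for keyboard input, which looks like a hang.

```shell
# Hypothetical sample file; mktemp picks the name.
infile=$(mktemp)
printf '%s\n' '123456.123|123654.123|#42315' > "$infile"

# 1) File name as an argument: awk opens the file itself.
from_arg=$(awk '{ gsub(/[|#]+/, " "); print }' "$infile")

# 2) Redirection: the shell opens the file and awk reads standard input.
from_stdin=$(awk '{ gsub(/[|#]+/, " "); print }' < "$infile")

echo "$from_arg"
echo "$from_stdin"
rm -f "$infile"
```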
| |
|
| Loki Harfagr <loki@DarkDesign.free.fr> wrote in message news:<428b8282$0$24125$626a14ce@news.free.fr>...
> On Wed, 18 May 2005 07:56:05 -0700, Neil Webster wrote:
>
>
....
> Or in the full developed form (in case of further script complication)
> # awk '{gsub(/[|#]+/," ");print}'
Since our friend is using win2k, it should be:
\>awk "{gsub(/[|#]+/,\" \");print}"
_____
hq00e
| |
| Neil Webster 2005-05-19, 8:55 am |
| Thanks for these. They are very helpful.
The mistake I was making was: gawk "{gsub(/[|#]+/,\" \");print}" <
[infile] > [outfile] instead of: gawk "{gsub(/[|#]+/,\" \");print}"
[infile] > [outfile]
A couple of lingering questions.....
1. Is there likely to be a (significant) difference in speed between
iper_linuxiano's suggestion and the above? Or is the speed just
determined by the speed that the file can be read and outputted?
2. What is the difference in the above functions. I always thought
that to input a file to read you had to redirect it (eg. < [infile]).
Thanks
Neil
| |
| hq00e@126.com 2005-05-19, 3:56 pm |
|
Neil Webster wrote:
> Thanks for these. They are very helpful.
>
> The mistake I was making was: gawk "{gsub(/[|#]+/,\" \");print}" <
> [infile] > [outfile] instead of: gawk "{gsub(/[|#]+/,\" \");print}"
> [infile] > [outfile]
Both commands should work fine with gawk.
If the first command doesn't work, I don't know why.
> A couple of lingering questions.....
>
> 1. Is there likely to be a (significant) difference in speed between
> iper_linuxiano's suggestion and the above? Or is the speed just
> determined by the speed that the file can be read and outputted?
It depends on how awk interprets the script and may vary between different
implementations of awk. Here are some not-always-true rules for saving time:
1. Awk reads and interprets the script, so a shorter script tends to be
faster.
2. Logical operations take time (especially when processing big files), so
avoid them within awk if you have another choice.
3. Don't use a pipe if the work can be done within the script.
> 2. What is the difference in the above functions. I always thought
> that to input a file to read you had to redirect it (eg. < [infile]).
It depends on how the command-line utility processes its parameters. If it
asks for a filename as a parameter, just give it a filename. If it reads
its input from stdin, you can redirect the file to it on the command line.
Most command-line tools on Windows do not read their input from stdin.
Gawk accepts either a filename argument or input on stdin.
Therefore either gawk .... < [infile] > [outfile]
or gawk .... [infile] > [outfile] works.
_____
hq00e
| |
| Ed Morton 2005-05-19, 3:56 pm |
|
Neil Webster wrote:
> Thanks for these. They are very helpful.
>
> The mistake I was making was: gawk "{gsub(/[|#]+/,\" \");print}" <
> [infile] > [outfile] instead of: gawk "{gsub(/[|#]+/,\" \");print}"
> [infile] > [outfile]
>
> A couple of lingering questions.....
>
> 1. Is there likely to be a (significant) difference in speed between
> iper_linuxiano's suggestion and the above? Or is the speed just
> determined by the speed that the file can be read and outputted?
A test shows Kenny's suggestion is faster.
iper_linuxiano's suggestion needs an extra $1=$1 assignment to force awk
to rebuild $0 from its fields in order to replace its FS with OFS in
the output. Here's the comparison using gawk 3.1.4 on Cygwin/Windows XP
on a 100,000 line file:
$ time awk -F"[|#]+" '$1=$1' file > tmp
real 0m2.491s
user 0m2.420s
sys 0m0.093s
$ time awk 'gsub(/[!#]+/," ")' file > tmp
real 0m1.555s
user 0m1.421s
sys 0m0.124s
They both cause $0 to get rebuilt. Kenny's requires an extra function
call per line, so it's not obvious to me why it should be faster.
> 2. What is the difference in the above functions. I always thought
> that to input a file to read you had to redirect it (eg. < [infile]).
No. awk takes file name arguments. One of the pros of using an argument
is that awk can set the FILENAME variable for you so you can print the
name of the file you're operating on. The con is that then awk is
opening the file rather than your shell so you'll get different output
messages from awk vs other commands if there's anything "wrong" with the
file (e.g. permissions).
Ed.
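
A short sketch of the FILENAME point, assuming a POSIX shell; the temporary file and the variable names are placeholders. When the file is passed as an argument, awk knows its name and can include it in the output.

```shell
# Hypothetical sample file; mktemp picks the name.
infile=$(mktemp)
printf '%s\n' '123456.123|123654.123|#42315' > "$infile"

# With a file argument, FILENAME holds the name of the file being read,
# so each output line can be prefixed with it.
with_name=$(awk '{ gsub(/[|#]+/, " "); print FILENAME ": " $0 }' "$infile")

echo "$with_name"
rm -f "$infile"
```

When the input arrives through a redirection instead, awk only sees standard input and has no file name to report.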