Problems separating records in a new file
awk@ohnosecond.com

2005-05-19, 3:56 pm

I'm an awk beginner, using DJGPP gawk 3.1.4 on Windows XP. I'm
manipulating a file with 35 fields per record, tab delimited. After
moving a few fields to new positions I'm trying to write a new file
that contains only 18 of the original 35 fields, again tab delimited
(I'll be doing a few other manipulations with the newly-created file
later). I'm trying to use the following code to create the new file:

{ for (i=1; i <= 18; ++i) print $i "\t" > "matter3.txt" }

This line successfully prints a copy of each field with a tab at the
end, but also puts a newline at the end of each field, thereby making
each field a record of its own. I want to end up with the first 18
fields from each original record written as a new 18-field,
tab-delimited record in the new file. What am I missing or doing wrong?


Lee

Chris F.A. Johnson

2005-05-19, 8:55 pm

On Thu, 19 May 2005 at 18:11 GMT, awk@ohnosecond.com wrote:
> I'm an awk beginner, using DJGPP gawk 3.1.4 on Windows XP. I'm
> manipulating a file with 35 fields per record, tab delimited. After
> moving a few fields to new positions I'm trying to write a new file
> that contains only 18 of the original 35 fields, again tab delimited
> (I'll be doing a few other manipulations with the newly-created file
> later). I'm trying to use the following code to create the new file:
>
> { for (i=1; i <= 18; ++i) print $i "\t" > "matter3.txt" }
>
> This line successfully prints a copy of each field with a tab at the
> end, but also puts a newline at the end of each field, thereby making
> each field a record of its own. I want to end up with the first 18
> fields from each original record written as a new 18-field,
> tab-delimited record in the new file. What am I missing or doing wrong?


{ for (i=1; i <= 18; ++i) printf "%s\t" $i > "matter3.txt"; print "" }

It would probably be faster just to print the entire line and pipe
it through cut:

gawk '.... { print }' | cut -f1-18
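
A minimal sketch of the cut approach, assuming a POSIX-style cut is installed alongside gawk (not a given on Windows) and using made-up sample data; the file names sample20.txt and first18.txt are placeholders. cut splits on tabs by default, so no delimiter option is needed:

```shell
# Create one sample record with 20 tab-separated fields (f1 ... f20).
awk 'BEGIN { for (i = 1; i <= 20; i++) printf "f%d%s", i, (i < 20 ? "\t" : "\n") }' > sample20.txt

# Keep only fields 1 through 18; the output stays tab-delimited.
cut -f1-18 sample20.txt > first18.txt

# Count the fields in the result (prints 18).
awk -F '\t' '{ print NF }' first18.txt
```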

--
Chris F.A. Johnson <http://cfaj.freeshell.org>
==================================================================
Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
<http://www.torfree.net/~chris/books/cfaj/ssr.html>
awk@ohnosecond.com

2005-05-19, 8:55 pm


Chris F.A. Johnson wrote:
> On Thu, 19 May 2005 at 18:11 GMT, awk@ohnosecond.com wrote:
>
> { for (i=1; i <= 18; ++i) printf "%s\t" $i > "matter3.txt"; print "" }
>
> It would probably be faster just to print the entire line and pipe
> it through cut:
>
> gawk '.... { print }' | cut -f1-18
>


Thanks for the reply, Chris...this looked like it was going to work,
but I get the following when I try to execute it:

C:\home\awk>gawk -F "\t" -f matter.awk test2.txt
gawk: matter.awk:20: (FILENAME=test2.txt FNR=1) fatal: not enough
arguments to satisfy format string
`%s AAA001'
^ ran out for this one

This appears to be the first field in the first record. I've been
seeing a lot of errors like that when I try to use printf, and was
going to try to tackle those problems later, when I process the file
I'm trying to create now. (I'm trying to take this one baby step at a
time). The error message doesn't even make sense to me.

I will try "cut" once I get this part of the problem solved, since I
will need to write out these fields anyway.

Lee


Janis Papanagnou

2005-05-19, 8:55 pm

awk@ohnosecond.com wrote:
> Thanks for the reply, Chris...this looked like it was going to work,
> but I get the following when I try to execute it:
>
> C:\home\awk>gawk -F "\t" -f matter.awk test2.txt
> gawk: matter.awk:20: (FILENAME=test2.txt FNR=1) fatal: not enough
> arguments to satisfy format string
> `%s AAA001'
> ^ ran out for this one


Function printf requires brackets and comma separated arguments.

printf ("%s\t", $i)

Janis

> This appears to be the first field in the first record. I've been
> seeing a lot of errors like that when I try to use printf, and was
> going to try to tackle those problems later, when I process the file
> I'm trying to create now. (I'm trying to take this one baby step at a
> time). The error message doesn't even make sense to me.
>
> I will try "cut" once I get this part of the problem solved, since I
> will need to write out these fields anyway.
>
> Lee
>

Chris F.A. Johnson

2005-05-19, 8:55 pm

On Thu, 19 May 2005 at 19:28 GMT, awk@ohnosecond.com wrote:
> Thanks for the reply, Chris...this looked like it was going to work,
> but I get the following when I try to execute it:
>
> C:\home\awk>gawk -F "\t" -f matter.awk test2.txt
> gawk: matter.awk:20: (FILENAME=test2.txt FNR=1) fatal: not enough
> arguments to satisfy format string
> `%s AAA001'
> ^ ran out for this one


Sorry, I forgot a comma (too much shell scripting):

{
for (i=1; i <= 18; ++i) printf "%s\t", $i > "matter3.txt"
print "" > "matter3.txt"
}

> This appears to be the first field in the first record. I've been
> seeing a lot of errors like that when I try to use printf, and was
> going to try to tackle those problems later, when I process the file
> I'm trying to create now. (I'm trying to take this one baby step at
> a time). The error message doesn't even make sense to me.
>
> I will try "cut" once I get this part of the problem solved, since I
> will need to write out these fields anyway.


Which part of the problem? If you use cut, you don't need printf;
just print the entire line.

Chris F.A. Johnson

2005-05-19, 8:55 pm

On Thu, 19 May 2005 at 19:45 GMT, Janis Papanagnou wrote:
> Function printf requires brackets and comma separated arguments.
>
> printf ("%s\t", $i)


It requires the comma (which I keep forgetting, since the shell
version doesn't), but the parentheses are not necessary.

Kenny McCormack

2005-05-19, 8:55 pm

In article <c5avl2-1p8.ln1@rogers.com>,
Chris F.A. Johnson <cfajohnson@gmail.com> wrote:
....
> It requires the comma (which I keep forgetting, since the shell
> version doesn't), but the parentheses are not necessary.


The parentheses are needed in one specific situation. Name it.
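
One situation that fits (the thread does not confirm it is the one meant here): an argument that uses the > relational operator. Without parentheses awk reads > as an output redirection, so the comparison has to sit inside a parenthesized argument list. A small sketch with made-up values:

```shell
# With parentheses, a > b is a comparison; 3 > 2 is true, so this prints 1.
awk 'BEGIN { a = 3; b = 2; printf("%d\n", a > b) }'

# Without them, the same text means "write a to the file named by b":
# this creates a file called 2 containing 3 and prints nothing to the screen.
awk 'BEGIN { a = 3; b = 2; printf "%d\n", a > b }'
```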

Janis Papanagnou

2005-05-19, 8:55 pm

Chris F.A. Johnson wrote:
> It requires the comma (which I keep forgetting, since the shell
> version doesn't), but the parentheses are not necessary.


Yes, that's right. (I am just used to writing brackets with printf.)

Janis
awk@ohnosecond.com

2005-05-19, 8:55 pm

OK, that takes care of part of it...the comma took care of the error
messages, and now all the fields are tab-separated, but there's still
no separation between records...everything appears as one long record
with fields separated by tabs, no line endings.

The comma lesson was invaluable, thank you. And I will look at cut
right now.

Ed Morton

2005-05-19, 8:55 pm



awk@ohnosecond.com wrote:

> I'm an awk beginner, using DJGPP gawk 3.1.4 on Windows XP. I'm
> manipulating a file with 35 fields per record, tab delimited. After
> moving a few fields to new positions I'm trying to write a new file
> that contains only 18 of the original 35 fields, again tab delimited
> (I'll be doing a few other manipulations with the newly-created file
> later). I'm trying to use the following code to create the new file:
>
> { for (i=1; i <= 18; ++i) print $i "\t" > "matter3.txt" }
>
> This line successfully prints a copy of each field with a tab at the
> end, but also puts a newline at the end of each field, thereby making
> each field a record of its own. I want to end up with the first 18
> fields from each original record written as a new 18-field,
> tab-delimited record in the new file. What am I missing or doing wrong?


No need for a loop - just get rid of the trailing 17 fields, e.g. with a
POSIX awk:

awk 'sub(/\t*([^\t]*\t*){17}$/,"")'

gawk can do that when called with "--re-interval" or "--posix" options.

Ed.
Kenny McCormack

2005-05-19, 8:55 pm

In article <1116536972.215170.198940@g43g2000cwa.googlegroups.com>,
awk@ohnosecond.com <lee@ohnosecond.com> wrote:
>OK, that takes care of part of it...the comma took care of the error
>messages, and now all the fields are tab-separated, but there's still
>no separation between records...everything appears as one long record
>with fields separated by tabs, no line endings.
>
>The comma lesson was invaluable, thank you. And I will look at cut
>right now.


And, presumably, post about it in some other newsgroup.

Ed Morton

2005-05-19, 8:55 pm



Ed Morton wrote:
<snip>
> No need for a loop - just get rid of the trailing 17 fields, e.g. with a
> POSIX awk:
>
> awk 'sub(/\t*([^\t]*\t*){17}$/,"")'


Ooops, unnecessary *s:

awk 'sub(/\t([^\t]*\t){17}$/,"")'
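
As posted, this pattern only matches when the record itself ends in a tab, because each of the 17 groups has to finish with one. A hedged sketch of the same idea that avoids interval expressions altogether (so it does not depend on --re-interval): strip the last field, together with the tab in front of it, 17 times. The file names sample35.txt and trimmed.txt and their contents are made up.

```shell
# One sample record with 35 tab-separated fields (f1 ... f35).
awk 'BEGIN { for (i = 1; i <= 35; i++) printf "f%d%s", i, (i < 35 ? "\t" : "\n") }' > sample35.txt

# Each sub() removes one trailing field plus the tab before it.
awk '{ for (i = 1; i <= 17; i++) sub(/\t[^\t]*$/, ""); print }' sample35.txt > trimmed.txt

# Count the remaining fields (prints 18).
awk -F '\t' '{ print NF }' trimmed.txt
```

If a single regular expression is preferred, putting the tab at the start of each group, as in (\t[^\t]*){17}$, should do the same job in an awk that supports interval expressions.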

Regards,

Ed.
Chris F.A. Johnson

2005-05-19, 8:55 pm

On Thu, 19 May 2005 at 21:09 GMT, awk@ohnosecond.com wrote:
> OK, that takes care of part of it...the comma took care of the error
> messages, and now all the fields are tab-separated, but there's still
> no separation between records...everything appears as one long record
> with fields separated by tabs, no line endings.


Note the 'print ""' after the loop:

for (i=1; i <= 18; ++i) printf "%s\t", $i > "matter3.txt"; print ""

> The comma lesson was invaluable, thank you. And I will look at cut
> right now.



awk@ohnosecond.com

2005-05-19, 8:55 pm

Yes, I have the 'print ""' at the end of the line, but it doesn't put
the line endings in the file -- it appears to print them on the screen,
so I have a wide gap in my command window between the command line and
the prompt, which I never had until I started using the modified line
from this group. (I'm working with a five record subset of the "real"
file I'll be working on, which is over 6,000 records long). Is there
something I could be doing wrong to make this happen?

martin cohen

2005-05-19, 8:55 pm

awk@ohnosecond.com wrote:
> Yes, I have the 'print ""' at the end of the line, but it doesn't put
> the line endings in the file -- it appears to print them on the screen,
> so I have a wide gap in my command window between the command line and
> the prompt, which I never had until I started using the modified line
> from this group. (I'm working with a five record subset of the "real"
> file I'll be working on, which is over 6,000 records long). Is there
> something I could be doing wrong to make this happen?
>

The final print should go to the file, not the screen.

So, it should be print "" > "matter3.txt";

Actually, I would put the name of the output file in a variable, using
my slogan "Never write anything twice."
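
A minimal sketch of that suggestion, assuming the input is tab-delimited as described earlier in the thread; the variable name out and the sample file in.txt are made up for illustration. print appends a newline (the output record separator) after every call, while printf writes only what the format says, so the separator is chosen per field: a tab between fields and a newline after the 18th.

```shell
# Sample input: two records of 20 tab-separated fields each (made-up data).
awk 'BEGIN { for (r = 1; r <= 2; r++) for (i = 1; i <= 20; i++) printf "r%df%d%s", r, i, (i < 20 ? "\t" : "\n") }' > in.txt

# The output file name lives in one variable, so it is written only once.
awk -F '\t' '
BEGIN { out = "matter3.txt" }
{
    for (i = 1; i <= 18; i++)
        printf "%s%s", $i, (i < 18 ? "\t" : "\n") > out
}' in.txt

# Each output line should now report 18 fields.
awk -F '\t' '{ print NF }' matter3.txt
```

Because the newline goes through the same redirection as the fields, nothing ends up on the screen, and no trailing tab is left at the end of each record.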

Martin Cohen
Patrick TJ McPhee

2005-05-20, 3:56 am

In article <1116536972.215170.198940@g43g2000cwa.googlegroups.com>,
awk@ohnosecond.com <lee@ohnosecond.com> wrote:

% OK, that takes care of part of it...the comma took care of the error
% messages, and now all the fields are tab-separated, but there's still
% no separation between records...everything appears as one long record
% with fields separated by tabs, no line endings.

You could do something like this:

{
o = $1

for (i = 2; i <= 17; i++) o = o "\t" $i
print o
}

Using sub to strip off the trailing fields is a common way of dealing
with this kind of problem, and I like using match and substr:

{ match($0, /^[^\t]+(\t[^\t]+){16}/); print substr($0, RSTART, RLENGTH) }
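
As written, the loop above stops at field 17 and the {16} interval likewise keeps 17 fields; for the 18 fields asked for at the top of the thread the bounds would presumably be 18 and 17. A sketch of the loop variant with the count in a variable (n is a made-up name, as are the sample files), which also copes with empty fields since it does not rely on [^\t]+ matching at least one character:

```shell
# Sample record with 35 fields; field 3 is left empty on purpose.
awk 'BEGIN { for (i = 1; i <= 35; i++) printf "%s%s", (i == 3 ? "" : "f" i), (i < 35 ? "\t" : "\n") }' > wide.txt

# Build the shortened record in a string, then print it once.
awk -F '\t' -v n=18 '{
    o = $1
    for (i = 2; i <= n; i++) o = o "\t" $i
    print o
}' wide.txt > narrow.txt

# Show the field count and the last kept field (prints "18: f18").
awk -F '\t' '{ print NF ": " $18 }' narrow.txt
```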
--

Patrick TJ McPhee
North York Canada
ptjm@interlog.com
awk@ohnosecond.com

2005-05-20, 3:55 pm

Thank you, Patrick, that worked! The records are coming out just right
now. And I'll check out your match/substr code for stripping the unneeded fields
to make it more efficient. Thanks to all who answered, as well.

Lee


Patrick TJ McPhee wrote:
> In article <1116536972.215170.198940@g43g2000cwa.googlegroups.com>,
> awk@ohnosecond.com <lee@ohnosecond.com> wrote:
>
> % OK, that takes care of part of it...the comma took care of the error
> % messages, and now all the fields are tab-separated, but there's still
> % no separation between records...everything appears as one long record
> % with fields separated by tabs, no line endings.
>
> You could do something like this:
>
> {
> o = $1
>
> for (i = 2; i <= 17; i++) o = o "\t" $i
> print o
> }
>
> Using sub to strip off the trailing fields is a common way of dealing
> with this kind of problem, and I like using match and substr:
>
> { match($0, /^[^\t]+(\t[^\t]+){16}/); print substr($0, RSTART, RLENGTH) }
> --
>
> Patrick TJ McPhee
> North York Canada
> ptjm@interlog.com

