For Programmers: Free Programming Magazines  


Home > Archive > AWK > February 2005 > update a column - all rows









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author update a column - all rows
kshave

2005-02-17, 3:59 pm

I have a fixed field length text file made up of 20 fields. One of the
fields (let's say field 16) is a 9 digit numeric field. I need to add a
constant value (eg. 100) to that column for every row.

(file before)
col1,col2....col15,101299300,col16..col20
col1,col2....col15,992123225,col16..col20

(file after)
col1,col2....col15,101299400,col16..col20
col1,col2....col15,992123325,col16..col20

What's the most efficient way to handle this? (The file is very large)

Ed Morton

2005-02-17, 8:56 pm



kshave wrote:
> I have a fixed field length text file made up of 20 fields. One of the
> fields (let's say field 16) is a 9 digit numeric field. I need to add a
> constant value (eg. 100) to that column for every row.
>
> (file before)
> col1,col2....col15,101299300,col16..col20
> col1,col2....col15,992123225,col16..col20
>
> (file after)
> col1,col2....col15,101299400,col16..col20
> col1,col2....col15,992123325,col16..col20
>
> What's the most efficient way to handle this? (The file is very large)
>


I suggest you just collect solutions and try them to see which is most
efficient. If your input really does have commas between the fields as
you show, then you can do:

gawk 'BEGIN{FS=OFS=","}($16+=100)1'

If there are no field separators and you have to rely on fixed widths,
then you can do:

gawk 'BEGIN{FIELDWIDTHS="width1 width2 ...";OFS=""}($16+=100)1'

See the gawk man page (http://www.rt.com/man/gawk.1.html) or other
on-line documentation (e.g.
http://www.gnu.org/software/gawk/ma...l#Constant-Size) for
more details.

Regards,

Ed.
kshave

2005-02-17, 8:56 pm

Sorry, the example is slightly off. There are no comma delimiters. The
fields are fixed length.

I'll try this Ed and see how well it works.

gawk 'BEGIN{FIELDWIDTHS="width1 width2 ...";OFS=""}($16+=100)1'

kshave

2005-02-17, 8:56 pm

I was able to get with working (under AWK), but the output file was
created without the proper spacing between the columns. The fixed
length fields no longer exist as they are now one long string without
any spaces.

kshave

2005-02-17, 8:56 pm

I was able to get with working (under AWK), but the output file was
created without the proper spacing between the columns. The fixed
length fields no longer exist as they are now one long string without
any spaces.

Ed Morton

2005-02-18, 3:55 am



kshave wrote:
> I was able to get with working (under AWK), but the output file was
> created without the proper spacing between the columns. The fixed
> length fields no longer exist as they are now one long string without
> any spaces.
>


Are you sure you're using gawk? If so, then I don't know what you mean
by the problem you're describing. Given this input file:

col1,col2....col15,101299300,col16..col20
col1,co ...col15,992123225,col16..col20

and this script:

gawk 'BEGIN{FIELDWIDTHS="5 14 9 13";OFS=""}($3+=100)1'

the output is:

col1,col2....col15,101299400,col16..col20
col1,co ...col15,992123325,col16..col20

which is what I'd expect. The only problem I foresee is when the number
you're adding to rolls over into an extra digit as a result of adding
100, then you need to decide how to truncate but that doesn't sound like
the problem you're reporting. You need to show some real sample input
and output to explain the problem.

Ed.
William James

2005-02-18, 3:55 am


kshave wrote:
> Sorry, the example is slightly off. There are no comma delimiters.

The
> fields are fixed length.
>
> I'll try this Ed and see how well it works.
>
> gawk 'BEGIN{FIELDWIDTHS="width1 width2 ...";OFS=""}($16+=100)1'


*# Adjust col to your needs.
*BEGIN { col = 20;wd=9 }
*{ num = substr($0,col,wd) + 100
* printf "%s%0" wd "d%s\n",
* substr($0,1,col-1),num,substr($0,col+wd)
*}

Kenny McCormack

2005-02-18, 3:56 pm

In article <4badnUqtaJ4r94jfRVn-gw@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>kshave wrote:
>
>Are you sure you're using gawk? If so, then I don't know what you mean
>by the problem you're describing. Given this input file:
>
>col1,col2....col15,101299300,col16..col20
>col1,co ...col15,992123225,col16..col20
>
>and this script:
>
>gawk 'BEGIN{FIELDWIDTHS="5 14 9 13";OFS=""}($3+=100)1'
>
>the output is:
>
>col1,col2....col15,101299400,col16..col20
>col1,co ...col15,992123325,col16..col20


Remember, OP's file does *not* contain comma delimiters. It is fixed
width.

>which is what I'd expect. The only problem I foresee is when the number
>you're adding to rolls over into an extra digit as a result of adding
>100, then you need to decide how to truncate but that doesn't sound like
>the problem you're reporting. You need to show some real sample input
>and output to explain the problem.


The problem is real - that is, when you do arithmetic on a column, the line
gets rebuilt, using OFS (if OFS is the empty string, then it gets smushed
together).

Observe:

% echo '1 2 3 ' | gawk 'BEGIN{FIELDWIDTHS="2 5 2"}{print;for (i=1; i<=NF; i++) $i+=4;print}'
1 2 3
5 6 7
% echo '1 2 3 ' | gawk 'BEGIN{FIELDWIDTHS="2 5 2";OFS=""}{print;for (i=1; i<=NF; i++) $i+=4;print}'
1 2 3
567
%

Ed Morton

2005-02-18, 3:56 pm



Kenny McCormack wrote:
> In article <4badnUqtaJ4r94jfRVn-gw@comcast.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>
> Remember, OP's file does *not* contain comma delimiters. It is fixed
> width.


Do you SEE me using comma delimiters anywhere in that script?

>
>
>
> The problem is real - that is, when you do arithmetic on a column, the line
> gets rebuilt, using OFS (if OFS is the empty string, then it gets smushed
> together).


That doesn't apply to fixed width fields. With fixed width fields the
only thing that gets smushed is the specific field you're adding to IF
it had leading or trailing blanks (which the OPs input doesn't have -
it's a 9-DIGIT NUMERIC FIELD) and the resulting field is not the same
size as the expected input. If that field could contain blanks (or
periods or other characters), then he would have a problem and require
at least a printf solution but that's not what he said, nor is is what
he showed as his sample input. If OFS was not set to "", then we'd see
spaces ADDED to the output as field separators.

> Observe:
>
> % echo '1 2 3 ' | gawk 'BEGIN{FIELDWIDTHS="2 5 2"}{print;for (i=1; i<=NF; i++) $i+=4;print}'
> 1 2 3
> 5 6 7
> % echo '1 2 3 ' | gawk 'BEGIN{FIELDWIDTHS="2 5 2";OFS=""}{print;for (i=1; i<=NF; i++) $i+=4;print}'
> 1 2 3
> 567
> %


Observe:

% echo '123' | gawk 'BEGIN{FIELDWIDTHS="1 1 1"}{print;for (i=1;
i<=NF;i++) $i+=4;print}'
123
5 6 7
% echo '123' | gawk 'BEGIN{FIELDWIDTHS="1 1 1";OFS=""}{print;for (i=1;
i<=NF; i++) $i+=4;print}'
123
567
% echo ' 1 2 3' | gawk 'BEGIN{FIELDWIDTHS="3 3 3"}{print; $2+=4; print}'
1 2 3
1 6 3
% echo ' 1 2 3' | gawk 'BEGIN{FIELDWIDTHS="3 3 3";OFS=""}{print;
$2+=4; print}'
1 2 3
16 3
% echo ' 1 2 3' | gawk 'BEGIN{FIELDWIDTHS="3 3 3";OFS=""}{print
;$2=sprintf("%3d",$2+4);print}'
1 2 3
1 6 3

That last will only work if the OP wants his field right-justfied and
his problem is just surrounding blanks.

Regards,

Ed.

kshave

2005-02-21, 3:55 am

I was able to get with working (under AWK), but the output file was
created without the proper spacing between the columns. The fixed
length fields no longer exist as they are now one long string without
any spaces.

Ed Morton

2005-02-22, 8:55 pm



kshave wrote:
> I was able to get with working (under AWK), but the output file was
> created without the proper spacing between the columns. The fixed
> length fields no longer exist as they are now one long string without
> any spaces.
>


Are you sure you're using gawk? If so, then I don't know what you mean
by the problem you're describing. Given this input file:

col1,col2....col15,101299300,col16..col20
col1,co ...col15,992123225,col16..col20

and this script:

gawk 'BEGIN{FIELDWIDTHS="5 14 9 13";OFS=""}($3+=100)1'

the output is:

col1,col2....col15,101299400,col16..col20
col1,co ...col15,992123325,col16..col20

which is what I'd expect. The only problem I foresee is when the number
you're adding to rolls over into an extra digit as a result of adding
100, then you need to decide how to truncate but that doesn't sound like
the problem you're reporting. You need to show some real sample input
and output to explain the problem.

Ed.
kshave

2005-02-24, 3:55 pm

I worked around the formatting problem.

Thanks for the solutions!

-Keith

Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com