Author FS, RS - Problem
chrishunnell@gmail.com

2005-05-05, 3:56 pm

Hi,

I have a problem with an awk script. I extracted some data from a
database and I know that there are 33 columns. The problem is that
some of these columns contain carriage returns, which I have to
eliminate. But when awk "sees" one of those carriage returns, it
treats the rest of the record as a new row.

I hope my description is clear enough; if not, feel free to ask for
more details.

Greetings

Ed Morton

2005-05-05, 3:56 pm



chrishunnell@gmail.com wrote:
> Most of the columns are of fixed size but some are not. This is why I
> am not able to use the FIELDWIDTHS.
> This would be some possible input for the script:
>
> index£stuff£more
> \
> problem here\
> more problem\
> £end of line£
>
> and this should be the output:
> index£stuff£more problem here more problem£end of line£
>
> At the moment the sample seems to be 5 records, but actually it is only
> one record.
> Is it possible to concat the 5 records?
>


Should there be a backslash at the end of that first line? If so, then
this would work:

gawk -vRS="#$" -vORS="" '{gsub(/\\\n/," ")}1'

If not, then you'd need:

gawk -vRS="#$" -vORS="" '{gsub(/\\\n/," ");gsub(/\n^ /," ")}1'

The above assumes your real line always ends in a pound sign as your
sample input showed.

You need gawk in the above to use an RS with multiple characters. I
substituted a hash ("#") for the pound sign since I don't have that on
my keyboard.

Ed.
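
For readers without gawk, the same idea can be approximated in POSIX awk by accumulating lines until one ends in the delimiter. This is only a sketch: the function name join_until_terminator is made up for illustration, "#" stands in for the pound sign as in the post above, and it assumes every complete record ends in that delimiter. It joins the pieces with a single space, so spacing next to the delimiter may differ slightly from the output requested in the question.

```shell
# Hypothetical helper: glue physical lines together until one ends in "#".
# Assumes "#" is the field delimiter and also terminates every real record.
join_until_terminator() {
  awk '
    { sub(/\\$/, "") }                    # drop a trailing continuation backslash
    { sep = (rec != "" && $0 != "") ? " " : ""
      rec = rec sep $0 }                  # append this line to the pending record
    /#$/ { print rec; rec = "" }          # delimiter at end of line: record complete
    END  { if (rec != "") print rec }     # flush an unterminated last record
  '
}

printf 'index#stuff#more\n\\\nproblem here\\\nmore problem\\\n#end of line#\n' |
  join_until_terminator
# prints: index#stuff#more problem here more problem #end of line#
```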
chrishunnell@gmail.com

2005-05-05, 3:56 pm

Ed,

I want to thank you for solving my problem.

Greetings,

Chris

Ed Morton

2005-05-06, 8:55 am



chrishunnell@gmail.com wrote:
> After I analysed the script for a few minutes I understood it, and it is
> very simple. But one question arose:
> What does the 1 do?
>


It's a true condition which invokes the default action of printing $0.
Remove it and you'll see no output.

Ed.
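
A small illustration of the point about the trailing 1, as a sketch with made-up function names (with_pattern and without_pattern are not from the thread). Both programs modify $0, but only the one with the always-true pattern prints it.

```shell
# Pattern "1" is always true, so awk runs the default action: print $0.
with_pattern()    { awk '{ gsub(/a/, "x") } 1'; }

# Same program without the trailing 1: $0 is modified but never printed.
without_pattern() { awk '{ gsub(/a/, "x") }'; }

printf 'banana\n' | with_pattern      # prints: bxnxnx
printf 'banana\n' | without_pattern   # prints nothing
```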
Ed Morton

2005-05-06, 3:56 pm



chrishunnell@gmail.com wrote:
> After I analysed the script for a few minutes I understood it, and it is
> very simple


Then try this ;-) :

gawk 'BEGIN{RS="[\\\\]\n|\n"}{ORS=RT~/\\/?"":"\n"}1'

It's a more idiomatic solution for the general question of "how do I
join lines that end in backslashes?". It wouldn't work as-is for the
input sample you posted since your to-be-joined lines don't always end
in backslashes and you want to replace the backslash-newlines with spaces.

Ed.
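
The general "join lines that end in backslashes" task can also be sketched without gawk's RT variable. The following is a minimal POSIX awk version, offered as an illustration and not as a replacement for the one-liner above. The function name join_continued is hypothetical, and the sketch joins continued lines with nothing between them (no space is inserted).

```shell
# Hypothetical helper: join each line ending in "\" with the line after it.
join_continued() {
  awk '
    /\\$/ { sub(/\\$/, ""); buf = buf $0; next }  # continuation: stash and keep reading
          { print buf $0; buf = "" }              # ordinary line: emit the joined record
    END   { if (buf != "") print buf }            # input ended on a continuation
  '
}

printf 'a\\\nb\nc\n' | join_continued
# prints:
# ab
# c
```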
Ed Morton

2005-05-06, 3:56 pm



Ed Morton wrote:
> Then try this ;-) :
>
> gawk 'BEGIN{RS="[\\\\]\n|\n"}{ORS=RT~/\\/?"":"\n"}1'


Make that:

gawk 'BEGIN{RS="\\\\\n|\n"}{ORS=RT~/\\/?"":"\n"}1'

Ed.
Kenny McCormack

2005-05-06, 3:56 pm

In article <6ZSdna1AZ9BkNuXfRVn-jg@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
....
>
>Make that:
>
>gawk 'BEGIN{RS="\\\\\n|\n"}{ORS=RT~/\\/?"":"\n"}1'


Change that to:

BEGIN{RS="\\\\\n|\n"}ORS=RT~/\\/?" ":"\n"

and you get the desired "backslash-newlines changed into spaces" behavior, as well
as saving a few more keystrokes.
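
The keystroke saving comes from using the assignment itself as the pattern: its value is a non-empty string, which awk treats as true, so the default print fires with the freshly set ORS. A minimal sketch of the same trick in portable awk, on a different task (pairing up lines) so that it does not depend on gawk's RT; the function name pair_lines is made up for illustration.

```shell
# Hypothetical helper: the assignment to ORS doubles as an always-true pattern.
# Odd-numbered records are followed by a space, even-numbered ones by a newline.
pair_lines() { awk 'ORS = (NR % 2 ? " " : "\n")'; }

printf '1\n2\n3\n4\n' | pair_lines
# prints:
# 1 2
# 3 4
```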

chrishunnell@gmail.com

2005-05-08, 8:55 pm

Most of the columns are of fixed size but some are not. This is why I
am not able to use the FIELDWIDTHS.
This would be some possible input for the script:

index£stuff£more
\
problem here\
more problem\
£end of line£

and this should be the output:
index£stuff£more problem here more problem£end of line£

At the moment the sample seems to be 5 records, but actually it is only
one record.
Is it possible to concat the 5 records?

