Merging lines and adding FS
mbalu

2005-02-07, 8:56 pm

Hi,

I have a DOS file in this format:
Ace Construction Co., Inc.
Fax: 123-555-1212
101 Main St., Ste 23., Jensen Beach FL 34958
321-555-6523
< second record>
< third record>
.....

I would like to get it into the following format:
Ace Construction Co., Inc.|Fax: 123-555-1212|101 Main St., Ste 23.,|Jensen Beach |FL|34958|321-555-6523

Is there any way I can do this in awk? I am using the following
program, but I don't know how to split the address into <street
address>|city|state|zip. I am not a programmer, so any help would be
appreciated. Thank you.

BEGIN { OFS = "|" }
{
  if (NR % 4) {
    gsub(/\r/, OFS)     # lines 1-3: turn the DOS line ending into a "|"
    printf("%s", $0)
  }
  else {
    sub(/\r$/, "")      # line 4: drop the carriage return, end the record
    printf("%s\n", $0)
  }
}

William James

2005-02-07, 8:56 pm

mbalu wrote:
> I would like to get it into the following format:
> Ace Construction Co., Inc.|Fax: 123-555-1212|101 Main St., Ste 23.,|Jensen Beach |FL|34958|321-555-6523



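BEGIN { FS = ", +" }
{ sub(/\r$/, "") }                        # strip the DOS carriage return
(NR-1)%4 < 2 { rec = rec $0 "|"; next }   # lines 1 and 2: name, fax
3 == NR%4 {                               # line 3: the address
  sub(/^ +/, "")
  for (i = 1; i < NF; i++)                # rejoin all but the last field
    rec = rec $i ", "
  sub(/ $/, "", rec)
  rec = rec "|"
  $0 = $NF                                # e.g. "Jensen Beach FL 34958"
  match($0, / [A-Z][A-Z] /)               # find the two-letter state code
  rec = rec substr($0, 1, RSTART-1) "|"   # city
  rec = rec substr($0, RSTART+1, 2) "|"   # state
  rec = rec substr($0, RSTART+4) "|"      # zip
  next
}
{ print rec $0; rec = "" }                # line 4: phone, end of record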
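The least obvious step in the program above is the match()/substr() arithmetic that cuts "Jensen Beach FL 34958" into city, state and zip. The sketch below isolates just that step (the function name split_city_state_zip is made up for illustration). RSTART points at the space before the two-letter state code, so the city ends at RSTART-1, the state starts at RSTART+1, and the zip starts at RSTART+4.

```shell
# Hypothetical helper: split "CITY ST ZIP" the same way the solution above
# does, by locating the first " XX " (space, two capitals, space) with match().
split_city_state_zip() {
  awk '{
    if (match($0, / [A-Z][A-Z] /)) {
      # RSTART is the position of the space before the state code
      city  = substr($0, 1, RSTART - 1)
      state = substr($0, RSTART + 1, 2)
      zip   = substr($0, RSTART + 4)
      print city "|" state "|" zip
    } else
      print "no state code found: " $0
  }'
}

echo "Jensen Beach FL 34958" | split_city_state_zip
# prints: Jensen Beach|FL|34958
```

This assumes the state is the first standalone pair of capital letters on the line; a city name that itself contains such a pair would be cut in the wrong place.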

Ed Morton

2005-02-08, 8:55 pm



mbalu wrote:
> I would like to get it into the following format:
> Ace Construction Co., Inc.|Fax: 123-555-1212|101 Main St., Ste 23.,|Jensen Beach |FL|34958|321-555-6523

<snip>

This should do it in gawk for the sample you showed:

gawk '{ gsub(/\r/, "") }                  # strip DOS carriage returns
++num==3 { $0 = gensub(/^[[:blank:]]*(.*,) (.*) (.*) ([[:digit:]]*)$/,
                       "\\1|\\2|\\3|\\4", 1) }
{ line = line sep $0; sep = "|" }
!(NR%4) { print line; num = 0; line = ""; sep = "" }'

Playing with the gensub patterns should fix any problems in parsing your
real input.

Regards,

Ed.
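gensub() is a gawk extension, so the program above will not run in other awks. As a rough sketch (not tested against real data), the same four-way split of the address line can be done in POSIX awk with match() and split(). The function name split_address is hypothetical, and the sketch assumes the address line always ends in a city, a two-letter state and a zip after its last comma.

```shell
# Hypothetical POSIX-awk stand-in for the gensub() call: split
# "STREET, CITY ST ZIP" into four "|"-separated parts.
split_address() {
  awk '{
    sub(/\r$/, "")                    # tolerate a DOS carriage return
    sub(/^[ \t]+/, "")                # drop leading blanks
    if (!match($0, /.*,/)) { print; next }   # no comma: leave line alone
    street = substr($0, 1, RLENGTH)   # up to and including the last comma
    n = split(substr($0, RLENGTH + 1), w, " ")
    city = w[1]                       # city = every word before state and zip
    for (i = 2; i <= n - 2; i++) city = city " " w[i]
    print street "|" city "|" w[n-1] "|" w[n]
  }'
}

echo "101 Main St., Ste 23., Jensen Beach FL 34958" | split_address
# prints: 101 Main St., Ste 23.,|Jensen Beach|FL|34958
```

Because match() finds the longest match, /.*,/ stops at the last comma on the line, which plays the same role as the greedy (.*,) group in the gensub() pattern.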

William James

2005-02-08, 8:55 pm

Using standard Awk:

{ sub(/\r$/, "") }                    # strip the DOS carriage return
3==NR%4 { sub(/.*,/, "&|")
          $(NF-1) = "|" $(NF-1) "|" }
0==NR%4 { gsub(/[, ]*\| */, "|", rec)
          print rec $0; rec = ""; next }
{ rec = rec $0 "|" }
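The standard-awk version relies on a side effect: assigning to $(NF-1) makes awk rebuild $0 from its fields joined by OFS (a single space by default), and the later gsub() then squeezes out the blanks and commas next to each "|". Running only the first rule on the sample address line shows the intermediate text (show_intermediate is a name made up for this sketch):

```shell
# Hypothetical helper: apply only the first rule of the standard-awk
# solution to one address line and print the result.
show_intermediate() {
  awk '{
    sub(/.*,/, "&|")            # "&" is the matched text: add "|" after the last comma
    $(NF-1) = "|" $(NF-1) "|"   # wrap the state code; $0 is rebuilt using OFS
    print
  }'
}

echo "101 Main St., Ste 23., Jensen Beach FL 34958" | show_intermediate
# prints: 101 Main St., Ste 23.,| Jensen Beach |FL| 34958
```

Note that the [, ]* in the final gsub() also removes the comma after "23.", so this version prints the street as "101 Main St., Ste 23." rather than with the trailing comma shown in the requested format.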

Copyright 2008 codecomments.com