Converting paragraphs into comma-delimited text tables
ari.rennt@gmail.com

2005-11-12, 3:55 am

I would like to convert text files with the following format:

title1: paragraph1.
(blank line)
title2: paragraph2

e.g.
ALLERGIES: He has no known drug allergies.

IMMUNIZATIONS: up to date.

BIRTH HISTORY: He was born via cesarean section secondary to repeat.
This is usually a paragraph of data with multiple lines. Sometimes the
paragraph text contains colons and numbers as well.

into something that looks like this:

"ALLERGIES" "IMMUNIZATIONS" "BIRTH HISTORY"
"He has no known drug allergies. " "up to date." "He was born via
cesarean section secondary to repeat."

I've had very little success and would sincerely appreciate any
help/advice you fine folks could provide!

thanks
ari

Ed Morton

2005-11-12, 6:55 pm

You can tweak where the spaces appear in the printf statements and/or
strip the leading blanks if you like:

$ awk -v RS='' -F: '
{ t[NR] = $1; $1 = ""; r[NR] = $0 }   # $1 = title; clearing $1 rebuilds $0 from the rest
END { for (i = 1; i <= NR; i++) printf "\"%s\" ", t[i]; print ""
      for (i = 1; i <= NR; i++) printf "\"%s\" ", r[i]; print "" }' file
"ALLERGIES" "IMMUNIZATIONS" "BIRTH HISTORY"
" He has no known drug allergies." " up to date." " He was born via cesarean section secondary to repeat."
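Because -F: makes every colon a field separator, a colon inside the paragraph text (the original post says these occur) comes out as a space once $1 = "" rebuilds the record. A minimal sketch of a variant that splits only at the first colon with index(), also trimming the leading blank and the trailing space after the last field. The shell function name paras_to_rows is hypothetical; it assumes any POSIX awk, where RS='' selects paragraph mode.

```shell
# Hypothetical helper: reads blank-line-separated "TITLE: text" paragraphs
# (from the named files, or stdin) and prints one row of quoted titles
# followed by one row of quoted texts. Paragraphs without a colon are skipped.
paras_to_rows() {
    awk -v RS='' '
    (i = index($0, ":")) > 0 {          # split at the FIRST colon only
        n++
        t[n] = substr($0, 1, i - 1)
        txt = substr($0, i + 1)
        gsub(/[ \t\n]+/, " ", txt)      # join the lines of the paragraph
        sub(/^ /, "", txt); sub(/ $/, "", txt)
        r[n] = txt
    }
    END {
        # Separate fields with one space, with no trailing space.
        for (i = 1; i <= n; i++) printf "%s\"%s\"", (i > 1 ? " " : ""), t[i]
        print ""
        for (i = 1; i <= n; i++) printf "%s\"%s\"", (i > 1 ? " " : ""), r[i]
        print ""
    }' "$@"
}
```

Usage: paras_to_rows file. A time such as "10:30" in the text stays intact instead of becoming "10 30".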

Regards,

Ed.
William James

2005-11-12, 6:55 pm

Save as "convert.awk" and run with
gawk --re-interval -f convert.awk myfile.txt >outfile.txt

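BEGIN {
    Width = 70    # maximum output line length
    # Interval expressions such as {1} need --re-interval in gawk before 4.0.
    # If intervals are off, /j{1}/ is taken literally and does not match "j".
    if ( "j" !~ /j{1}/ )
    {   print "Must run with 'gawk --re-interval'."
        exit 1
    }
    # Records are separated by one or more blank lines;
    # a blank line may contain blanks.
    RS = "\n([ \t]*\n)+"
}

# The header is everything before the first colon; the rest
# (which may contain more colons) is the paragraph text.
match($0, /: */) {
    header = header " \"" substr($0, 1, RSTART-1) "\""
    text   = text " \"" strip( substr($0, RSTART+RLENGTH) ) "\""
}

END {
    display( header )
    display( text )
}

# Collapse whitespace to single spaces, then print s wrapped at
# blanks into lines of at most Width characters where possible.
function display( s )
{   s = strip( s )
    gsub( /[ \t\n]+/, " ", s )
    while ( length(s) > Width && match( s, "^.{1," Width "}[ \t]+" ) )
    {   print strip( substr( s, 1, RLENGTH ) )
        s = substr( s, RLENGTH + 1 )
    }
    if ( s ) print s
}

# Remove leading and trailing whitespace, including newlines
# (the last record can end with a newline).
function strip( s )
{   gsub( /^[ \t\n]+|[ \t\n]+$/, "", s )
    return s
}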
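The script above relies on two gawk features: a regular-expression RS and interval expressions such as {1,70}. A minimal sketch of a variant that should run under any POSIX awk: it collects lines itself, treating an empty or blanks-only line as a paragraph break, and wraps with a greedy word loop instead of a regex. The shell function name convert_portable and its width argument are hypothetical.

```shell
# Hypothetical portable variant: first argument is the maximum line
# length; remaining arguments are input files (stdin if none).
convert_portable() {
    width=$1; shift
    awk -v Width="$width" '
    /^[ \t]*$/ { end_para(); next }     # empty or blanks-only line ends a paragraph
    { para = (para == "" ? $0 : para "\n" $0) }
    END { end_para(); display(header); display(text) }

    # Split the finished paragraph at its first colon (no colon: skipped).
    function end_para(   i) {
        if ((i = index(para, ":")) > 0) {
            header = header " \"" strip(substr(para, 1, i - 1)) "\""
            text   = text   " \"" strip(substr(para, i + 1)) "\""
        }
        para = ""
    }

    function strip(s) { gsub(/^[ \t\n]+|[ \t\n]+$/, "", s); return s }

    # Greedy word wrap; split(s, w, " ") splits on runs of blanks and newlines.
    function display(s,   n, w, i, line) {
        n = split(s, w, " ")
        line = ""
        for (i = 1; i <= n; i++) {
            if (line != "" && length(line) + 1 + length(w[i]) > Width) {
                print line
                line = w[i]
            } else
                line = (line == "" ? w[i] : line " " w[i])
        }
        if (line != "") print line
    }' "$@"
}
```

Usage: convert_portable 70 myfile.txt >outfile.txt. As in the gawk version, the header row can also wrap between words of a multi-word title such as "BIRTH HISTORY".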

Copyright 2008 codecomments.com