Converting paragraphs into comma-delimited text tables
| ari.rennt@gmail.com 2005-11-12, 3:55 am |
| I would like to convert text files with the following format:

title1: paragraph1.
(space)
title2: paragraph2

e.g.
ALLERGIES: He has no known drug allergies.

IMMUNIZATIONS: up to date.

BIRTH HISTORY: He was born via cesarean section secondary to repeat.
This is usually a paragraph of data with multiple lines. Sometimes the
paragraph text contains colons and numbers as well.

to something that looks like this

"ALLERGIES" "IMMUNIZATIONS" "BIRTH HISTORY"
"He has no known drug allergies. " "up to date." "He was born via
cesarean section secondary to repeat."

I've had very little success and would sincerely appreciate any
help/advice you fine folks could provide!

thanks
ari
| |
| Ed Morton 2005-11-12, 6:55 pm |
| ari.rennt@gmail.com wrote:
> I would like to convert text files with the following format:
> [...]
You can tweak where the spaces appear on the printf and/or strip leading
blanks if you like:
$ awk -v RS="" -F: '
    { t[NR] = $1; $1 = ""; r[NR] = $0 }
    END {
        for (i = 1; i <= NR; i++) printf "\"%s\" ", t[i]
        print ""
        for (i = 1; i <= NR; i++) printf "\"%s\" ", r[i]
        print ""
    }' file
"ALLERGIES" "IMMUNIZATIONS" "BIRTH HISTORY"
" He has no known drug allergies." " up to date." " He was born via
cesarean section secondary to repeat."
Regards,
Ed.
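
A note on the one-liner above: because it splits on every colon and then rebuilds $0, any colons inside the paragraph text are replaced by spaces. The following is only a sketch of one way to split at the first colon instead, assuming paragraph mode (RS="") is acceptable for the input. The function name convert and the file name sample.txt are made up for illustration.

```shell
# Hypothetical helper: print one row of quoted titles, then one row of quoted texts.
convert() {
    awk -v RS="" '
    {
        p = index($0, ":")                 # position of the first colon only
        if (p == 0) next                   # skip paragraphs that have no title
        title[++n] = substr($0, 1, p - 1)
        body = substr($0, p + 1)
        gsub(/[ \t\n]+/, " ", body)        # fold line breaks into single spaces
        sub(/^ /, "", body)                # strip the leading blank
        sub(/ $/, "", body)                # strip a trailing blank, if any
        text[n] = body
    }
    END {
        for (i = 1; i <= n; i++) printf "\"%s\"%s", title[i], (i < n ? " " : "\n")
        for (i = 1; i <= n; i++) printf "\"%s\"%s", text[i],  (i < n ? " " : "\n")
    }' "$@"
}

# Sample input in the format from the question (invented data).
cat > sample.txt <<'EOF'
ALLERGIES: He has no known drug allergies.

IMMUNIZATIONS: up to date.

BIRTH HISTORY: Born at 10:45 via cesarean section.
Second line of the same paragraph.
EOF

convert sample.txt
```

With this input the "10:45" in the third paragraph should survive intact, and the two lines of that paragraph are joined with a single space.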
| |
| William James 2005-11-12, 6:55 pm |
| ari.rennt@gmail.com wrote:
> I would like to convert text files with the following format:
> [...]
Save as "convert.awk" and run with
gawk --re-interval -f convert.awk myfile.txt >outfile.txt
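BEGIN {
    Width = 70    # maximum output line length; the regex in display() hardcodes the same 70
    # Interval expressions such as {1} need --re-interval in older gawk.
    if ( "j" !~ /j{1}/ ) {
        print "Must run with 'gawk --re-interval'."
        exit
    }
    # A blank line may contain blanks.
    RS = "\n([ \t]*\n)+"
}

# Split each paragraph at its first colon: title before, text after.
match($0, /: */) {
    header = header " \"" substr($0, 1, RSTART-1) "\""
    text = text " \"" strip( substr($0, RSTART+RLENGTH) ) "\""
}

END {
    display( header )
    display( text )
}

# Collapse whitespace and print s wrapped to at most Width characters.
function display( s )
{
    s = strip( s )
    gsub( /[ \t\n]+/, " ", s )
    while ( length(s) > Width && match( s, /^.{1,70}[ \t]+/ ) ) {
        print strip( substr( s, 1, RLENGTH ) )
        s = substr( s, RLENGTH + 1 )
    }
    if ( s ) print s
}

# Remove leading and trailing whitespace, including the newline left on the last record.
function strip( s )
{
    gsub( /^[ \t\n]+|[ \t\n]+$/, "", s )
    return s
}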
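
Both answers so far print space-separated quoted fields, while the subject line asks for comma-delimited output. The following is only a sketch of a comma-separated variant, assuming the usual CSV convention of doubling any double quote inside a field. It reads line by line rather than setting RS, so it should not need gawk, and it treats whitespace-only lines as paragraph breaks like the script above. The names to_csv, emit, and csv_sample.txt are made up for illustration.

```shell
# Hypothetical helper: emit a header row of titles and one row of texts as CSV.
to_csv() {
    awk '
    # Turn the accumulated paragraph into one title/text pair.
    function emit(   p, t, b) {
        if (para == "") return
        p = index(para, ":")                    # split at the first colon only
        if (p > 0) {
            t = substr(para, 1, p - 1)
            b = substr(para, p + 1)
            sub(/^[ \t]+/, "", t)
            gsub(/[ \t]+/, " ", b); sub(/^ /, "", b); sub(/ $/, "", b)
            gsub(/"/, "\"\"", t)                # CSV: double embedded quotes
            gsub(/"/, "\"\"", b)
            head = head (n ? "," : "") "\"" t "\""
            row  = row  (n ? "," : "") "\"" b "\""
            n++
        }
        para = ""
    }
    /^[ \t]*$/ { emit(); next }                 # blank line (may contain blanks)
    { para = (para == "" ? $0 : para " " $0) }  # join the lines of a paragraph
    END { emit(); if (n) { print head; print row } }
    ' "$@"
}

# Invented sample: a quoted word, a whitespace-only separator line, an extra colon.
printf 'A: one "x"\n \t\nB: two:\nthree\n' > csv_sample.txt
to_csv csv_sample.txt
```

For the sample above this should print the header row "A","B" followed by a row whose first field has its inner quotes doubled and whose second field keeps the colon in "two: three".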