For Programmers: Free Programming Magazines  


Home > Archive > AWK > January 2006 > Pulling out fields in a CSV file









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Pulling out fields in a CSV file
usenet@linuxnuts.net

2005-12-13, 3:56 am

I have a CSV that contains device names, IPs, model #s, locations, etc
that I need to pull out, stuff into a template, and dump to stdout. If
I could figure out how to pull out a field and assign it to a shell
variable then I could easily cat my template through sed, change the
values, and dump the output to numerous files. My only problem is I
can't figure out how to take a specific field such as $1 and define it
as $HOSTNAME. This CSV contains all the hosts I used to have monitored
in Nagios. Due to a simultaneous failure of 2 hard drives in a RAID
array and the subsequent meltdown of the attached PERC controller I
have lost all the Nagios config files and their backups.

I can reproduce these files with time and effort but I was hoping to
minimize both this time around if at all possible. I thought about
attempting this in Perl but I'm not really a programmer. I figure this
would be easier to accomplish with Awk. I've tried this before but
basically ran into the same stumbling blocks each time.

The shell scripting is the easy part of this task. A basic for loop, a
handful of sed substitutions, and dumping the output to specific files
is simple to accomplish. My problem is I can't figure out how to take
each of the specific fields from a single line in my CSV and assign it
as the value of a variable in a single for loop iteration so that I can
use it in my sed substition later on in the loop. Any suggestions or
pointers to relevant example commands would be much appreciated.

Thanks
J

karthicksmail@gmail.com

2005-12-13, 3:56 am

usenet@linuxnuts.net wrote:
> I have a CSV that contains device names, IPs, model #s, locations, etc
> that I need to pull out, stuff into a template, and dump to stdout. If
> I could figure out how to pull out a field and assign it to a shell
> variable then I could easily cat my template through sed, change the
> values, and dump the output to numerous files. My only problem is I
> can't figure out how to take a specific field such as $1 and define it
> as $HOSTNAME. This CSV contains all the hosts I used to have monitored
> in Nagios. Due to a simultaneous failure of 2 hard drives in a RAID
> array and the subsequent meltdown of the attached PERC controller I
> have lost all the Nagios config files and their backups.
>


Try this..

#!/bin/bash

NUMREC=`awk ' END { print NR } ' file.txt`
for i in `seq 1 $NUMREC`
do
FIELD1=`head -$i file.txt | tail -1 | awk ' { print $1 } '`
echo $FIELD1
done

This is the basic idea.. there might be better ways..

Thanks and regards,
Karthick S.

J

2005-12-13, 6:57 pm

Karthick,

Thanks for the reply. That set me on the right path. I built on what
you provided to get me close enough to my desired output to work. One
of these days I'll write a nice script for this task but for now this
will do.

#!/bin/bash

NUMREC=`awk ' END { print NR } ' watc.ed`

for i in `seq 1 $NUMREC`;
do
HOSTNAME=`head -n $i watc.ed | tail -n 1 | awk -F "," ' { print $1 }
'`;
IP=`head -n $i watc.ed | tail -n 1 | awk -F "," ' { print $2 } '`;
MODEL=`head -n $i watc.ed | tail -n 1 | awk -F "," ' { print $3 } '`;

SITE=`head -n $i watc.ed | tail -n 1 | awk -F "," ' { print $5 } '`;
BUILDING=`head -n $i watc.ed | tail -n 1 | awk -F "," ' { print $6 }
'`;
ROOM=`head -n $i watc.ed | tail -n 1 | awk -F "," ' { print $7 } '`;
hostname=$(echo $HOSTNAME | tr A-Z a-z)
cat network-template.cfg | sed -e "s/HOSTNAME/$hostname/g" \
-e "s/MODEL/$MODEL/g" \
-e "s/LOCATION/$SITE $BUILDING, $ROOM/g" \
-e "s/IP/$IP/g" > $hostname.watc.local.cfg && echo "Wrote
$hostname.watc.local.cfg";
done

It doesn't take into account inconsistencies in the CSV (like not
having a building or room number) but it works well enough.

Thanks again
J

William Park

2005-12-18, 6:56 pm

usenet@linuxnuts.net wrote:
> I have a CSV that contains device names, IPs, model #s, locations, etc
> that I need to pull out, stuff into a template, and dump to stdout. If
> I could figure out how to pull out a field and assign it to a shell
> variable then I could easily cat my template through sed, change the
> values, and dump the output to numerous files. My only problem is I
> can't figure out how to take a specific field such as $1 and define it
> as $HOSTNAME. This CSV contains all the hosts I used to have monitored
> in Nagios. Due to a simultaneous failure of 2 hard drives in a RAID
> array and the subsequent meltdown of the attached PERC controller I
> have lost all the Nagios config files and their backups.
>
> I can reproduce these files with time and effort but I was hoping to
> minimize both this time around if at all possible. I thought about
> attempting this in Perl but I'm not really a programmer. I figure this
> would be easier to accomplish with Awk. I've tried this before but
> basically ran into the same stumbling blocks each time.
>
> The shell scripting is the easy part of this task. A basic for loop, a
> handful of sed substitutions, and dumping the output to specific files
> is simple to accomplish. My problem is I can't figure out how to take
> each of the specific fields from a single line in my CSV and assign it
> as the value of a variable in a single for loop iteration so that I can
> use it in my sed substition later on in the loop. Any suggestions or
> pointers to relevant example commands would be much appreciated.


If your CSV is simple (ie. with no embedded stuffs), then IFS will do
the trick. For example,

IFS=,
x='11,22,33'

set -- $x; echo $1
read a b c <<< "$x"; echo $a

But, if a field has embedded commas, doublequotes, CR, or LF, then it
becomes complicated. Better to use CSV extension,
http://home.eol.ca/~parkw/index.html#dos_line
http://home.eol.ca/~parkw/index.htm...meter_expansion

For example,

IFS=$' \t\n'
x='"1,1",22,33'

read -C a b c <<< "$x"; echo $a
set -- "${x|,}"; echo $1

--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
Dave Thompson

2006-01-10, 3:58 am

On 13 Dec 2005 09:33:43 -0800, "J" <usenet@linuxnuts.net> wrote:

> Karthick,
>
> Thanks for the reply. That set me on the right path. I built on what
> you provided to get me close enough to my desired output to work. One
> of these days I'll write a nice script for this task but for now this
> will do.
>
> #!/bin/bash
>
> NUMREC=`awk ' END { print NR } ' watc.ed`
>
> for i in `seq 1 $NUMREC`;
> do
> HOSTNAME=`head -n $i watc.ed | tail -n 1 | awk -F "," ' { print $1 }
> '`;
> IP=`head -n $i watc.ed | tail -n 1 | awk -F "," ' { print $2 } '`;
> MODEL=`head -n $i watc.ed | tail -n 1 | awk -F "," ' { print $3 } '`;
>
> SITE=`head -n $i watc.ed | tail -n 1 | awk -F "," ' { print $5 } '`;
> BUILDING=`head -n $i watc.ed | tail -n 1 | awk -F "," ' { print $6 }
> '`;
> ROOM=`head -n $i watc.ed | tail -n 1 | awk -F "," ' { print $7 } '`;


<snip rest>

You don't need head and tail, awk can do the line selection:
HOSTNAME = ` awk -F, -vi=$i 'NR==i {print $1;exit}' watc.ed `
or HOSTNAME = ` awk <watc.ed -F, -vi=$i 'NR==i {print $1;exit}' `
(which I consider a more logical order to think in).

You don't need multiple invocations, most shells (including bash) can
set multiple variables, but you may need to use a temp file. IF your
fields don't include embedded spaces:
for i in `seq blah`; do
awk <watc.ed -F, -vi=$i 'NR==i {$1=$1;print;exit}' >temp
<temp read HOSTNAME IP MODEL NOTUSED SITE BUILDING ROOM
# edit and write file for this item
done

If they might (and IME things like site and building often do), but
not comma, as implied by your simple (simplified?) take on CSV:
for i in `seq blah`; do
awk <watc.ed -vi=$i 'NR==i {print;exit}' >temp
<temp IFS=, read HOSTNAME IP MODEL NOTUSED SITE BUILDING ROOM
# edit and write file for this item
done

(You might be able to use a pipe depending on whether your shell puts
the end-of-pipeline in a child process or not.)

But you don't really need to select lines at all, with awk or
otherwise; just have the shell read in a loop:
<watc.ed while IFS=, read HOSTNAME IP MODEL NOTUSED SITE BUILDING ROOM
do # edit and write file for this item
done

Alternatively and ontopic you could use awk to generate an output file
for each input line and need no nontrivial shell at all:
awk -F, 'NR==FNR {template = template $0 RS;next}
{copy=template; gsub(/IP/, $2, copy); etc etc
print copy > "perhost." $1 }' template watc.ed

If awk on your system is (too) limited on open files, add a close of
the (each) output file after generating it.
- David.Thompson1 at worldnet.att.net
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com