Awk script to separate spaces and tabs
explor

2007-03-01, 9:57 pm

Hi,
I have a bunch of files like "sample.txt", shown below, each with multiple lines.
The columns in each line are separated by either a TAB or blank spaces. I am trying to
write a shell script that gives me col 3 and the corresponding col 4, so
that I can generate a report for each line in each of the text
files with the size and number of files in each folder. I also need to copy the
data from the corresponding folder in each line to a new folder. Say the first
line is

test.com emp Announcements /ms/0B/03/81512

I need foldername=Announcements
corresponding dirpath=/ms/0B/03/81512

Report:
Folder: Announcements
# files= `ls /ms/0B/03/81512`
Size: `du -sk /ms/0B/03/81512`

Because of the tabs and white space between the columns, and the white space
within col 3, it's difficult to separate them with cat and awk. Any help is greatly
appreciated.

for i in `cat /tmp/sample.txt`
do
echo $i
dirpath=`echo $i | awk -F"\t" '{print $4}'`
# echo $dirpath
# foldername=`echo $i | awk -F"\t" '{print $3}'`
#/bin/echo "size = \c"
# du -s $dirpath
#/bin/echo "largest = \c"
# ls -ltsr $dirpath | sort -n | tail -1
#/bin/echo "num msgs = \c"
# ls $dirpath | wc -l
done


sample.txt
test.com emp Announcements /ms/0B/03/81512
test.com emp BOD /ms/76/0F/81513
test.com emp CGC /ms2/76/0C/81517
test.com emp Drafts /ms/21/03/81526
test.com emp INBOX /ms2/6B/0E/81511
test.com emp "Junk E-mail" /ms/62/1C/81569
test.com emp Personal /ms/15/00/81578
test.com emp "some sap" /ms2/11/02/81579
test.com emp Sent /ms2/22/1A/81580
test.com emp Templates /ms2/07/17/81581
test.com emp Trash /ms2/49/16/81582
test.com emp "Work Request" /ms2/01/0F/81583
test.com emp admins /ms2/6C/0A/81584
test.com emp dumpster /ms2/77/10/81585
test.com emp ec /ms2/19/07/81586
test.com emp "emp personal" /ms2/50/

Thomas J.

2007-03-02, 7:57 am

awk's field separator (FS) can't be a RE,
so you have to "split" your records.

try

awk '{split($0,a,"([ \t])+");print a[3];}'
awk '{split($0,a,"([ \t])+");print a[4];}'

man awk

hth,

Thomas


Vassilis

2007-03-02, 7:57 am


Thomas J. wrote:
> awk's field separator (FS) can't be a RE,
> so you have to "split" your records.
>
> try
>
> awk '{split($0,a,"([ \t])+");print a[3];}'
> awk '{split($0,a,"([ \t])+");print a[4];}'
>
> man awk
>
> hth,
>
> Thomas


Or you can pass the field separator on the command line like this:

awk -F'\t' '{ gsub(/"/, "", $3); print $3 }'
awk -F'\t' '{ print $4 }'

Ed Morton

2007-03-02, 7:57 am

Thomas J. wrote:

> awk's field separator (FS) can't be a RE,


GNU awks can.

> so you have to "split" your records.
>
> try
>
> awk '{split($0,a,"([ \t])+");print a[3];}'
> awk '{split($0,a,"([ \t])+");print a[4];}'


The trailing semicolons aren't necessary and chains of white space are
the default FS so, unless there are some white space characters you're
specifically trying to exclude, the above is equivalent to:

awk '{print $3}'
awk '{print $4}'

> man awk


Indeed ;-).

Ed.
Ed Morton

2007-03-02, 7:57 am

Vassilis wrote:

> Thomas J. wrote:
>
>
>
> Or you can pass the field separator in cmdline like this:
>
> awk -F'\t' '{ gsub(/"/, "", $3); print $3 }'
> awk -F'\t' '{ print $4 }'
>


In the part that was snipped the OP said:

> Because of the tabs and white space between the columns, and the white space
> within col 3...


so you can't just use a single tab as the field separator.

Ed.
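
A minimal sketch of the point above, using two made-up lines (not taken from the OP's actual file): one really is tab-separated, the other uses single spaces. The variable names are only for illustration.

```shell
# Hypothetical sample lines: the first uses tabs between columns,
# the second uses single spaces (as parts of the OP's file may).
tab_line=$(printf 'test.com\temp\t"Junk E-mail"\t/ms/62/1C/81569')
space_line='test.com emp "Junk E-mail" /ms/62/1C/81569'

# With -F'\t', $4 is the path only when tabs really separate the columns.
tab_result=$(printf '%s\n' "$tab_line" | awk -F'\t' '{ print $4 }')
space_result=$(printf '%s\n' "$space_line" | awk -F'\t' '{ print $4 }')

echo "tab-separated:   [$tab_result]"
echo "space-separated: [$space_result]"
```

On the space-separated line the whole record ends up in $1, so $4 comes back empty.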
Ed Morton

2007-03-02, 7:57 am

explor wrote:

<snip>


This (untested) should give you the 2 fields you want, assuming that
your directory names never contain white space:

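awk '{
    dirpath=$NF
    # delete the first 2 fields and the white space after them
    sub(/^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+/,"")
    # delete the last field and the white space before it
    sub(/[[:space:]]+[^[:space:]]+$/,"")
    gsub(/"/,"")
    foldername=$0
    printf "dirpath=\"%s\" foldername=\"%s\"\n",dirpath,foldername
}' sample.txt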

The first sub deletes the first 2 fields and subsequent white space. The
second sub deletes the last field and preceding white space. The end
result is just the third field.

You can use "eval" if you want to directly create shell variables of the
same names as used in the awk output (ask in comp.unix.shell about that
if you're unsure).

Regards,

Ed.
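
As an alternative to "eval", here is a rough sketch of how the two fields could feed the report the OP described. It is not from the thread: the function name report_folders and the report layout are made up, it has awk print a tab between the two values and reads them back with "read", and it uses [ \t] rather than [[:space:]] in case an older awk lacks POSIX character classes. It still assumes the directory path is the last field and contains no white space.

```shell
# report_folders FILE: print a small report for each line of FILE.
# Assumption: the directory path is the last field and has no white space.
report_folders() {
    awk '{
        dirpath = $NF
        # drop the first two fields and the white space after them
        sub(/^[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")
        # drop the last field and the white space before it
        sub(/[ \t]+[^ \t]+$/, "")
        gsub(/"/, "")
        # what is left in $0 is the folder name; emit "path<TAB>name"
        printf "%s\t%s\n", dirpath, $0
    }' "$1" |
    while IFS="$(printf '\t')" read -r dirpath foldername
    do
        # skip entries whose directory does not exist
        [ -d "$dirpath" ] || continue
        echo "Folder: $foldername"
        echo "# files: $(ls "$dirpath" | wc -l | tr -d ' ')"
        echo "Size (KB): $(du -sk "$dirpath" | cut -f1)"
    done
}
```

It would be called as: report_folders /tmp/sample.txt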
Thomas J.

2007-03-02, 7:57 am

On 2 Mar., 14:11, Ed Morton <mor...@lsupcaemnt.com> wrote:
> Thomas J. wrote:
>
> GNU awks can.


The OP didn't specify which awk is in use...

>
>
>
>
> The trailing semicolons aren't necessary and chains of white space are
> the default FS so, unless there are some white space characters you're
> specifically trying to exclude, the above is equivalent to:
>
> awk '{print $3}'
> awk '{print $4}'


I don't agree.
My hint definitely does not do the same as
awk '{print $3}'

but you are right:
my hint doesn't meet the OP's requirements.

>
>
> Indeed ;-).
>


or read Ed Morton's posts ;)

Thank you,

Thomas


Vassilis

2007-03-02, 6:57 pm


Thomas J. wrote:
> On 2 Mar., 14:11, Ed Morton <mor...@lsupcaemnt.com> wrote:
>
> or read Ed Morton's posts ;)


That's the key exactly :)

Kenny McCormack

2007-03-02, 6:57 pm

In article <D9CdnaTu6tcav3XYnZ2dnUVZ_vyunZ2d@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>Thomas J. wrote:
>
>
>GNU awks can.


As does TAWK, which covers 100% of my use of AWK (i.e., GAWK & TAWK).

And, as the commercial says, anything else ... would be ... uncivilized.

Ed Morton

2007-03-02, 6:57 pm

Thomas J. wrote:
> On 2 Mar., 14:11, Ed Morton <mor...@lsupcaemnt.com> wrote:
>
<snip>
> I don't agree.
> My hint definitely does not do the same as
> awk '{print $3}'


Could you give an example of how it'd be different?

Ed.
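
One case where the two appear to differ is a line with leading white space: default field splitting ignores it, while split() with an explicit regular expression treats it as a separator and leaves an empty first element. A small sketch with a made-up line (the leading blanks are an assumption, not something shown in the OP's sample):

```shell
# Hypothetical input line with two leading blanks.
line='  test.com emp Announcements /ms/0B/03/81512'

# Default FS: leading white space is skipped before fields are counted.
default_fs=$(printf '%s\n' "$line" | awk '{ print $3 }')

# Explicit regex in split(): the leading blanks yield an empty a[1],
# so every later element is shifted by one.
explicit_split=$(printf '%s\n' "$line" | awk '{ split($0, a, "[ \t]+"); print a[3] }')

echo "default FS: $default_fs"
echo "split():    $explicit_split"
```

Here the default FS gives "Announcements" and the split() version gives "emp".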
Patrick TJ McPhee

2007-03-04, 9:57 pm

In article <1172835906.421591.317270@h3g2000cwc.googlegroups.com>,
Thomas J. <jue@monster-berlin.de> wrote:

% awk's field separator (FS) can't be a RE,

This is true only for the "old", pre-1987 awk. This is the default, but
not the only, awk on Solaris systems. It is not in common use on any
other system.

In any case, on any awk in which a regular expression can't be used
in the field separator, it also can't be used in split(), so using
split isn't a solution to this problem.
--

Patrick TJ McPhee
North York Canada
ptjm@interlog.com
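
A quick way to check this on a given system is to run the same regular expression through both -F and split() and compare. The line below is made up for the test, and the sketch assumes a POSIX-style awk (it would not behave this way on the old pre-1987 awk).

```shell
# Hypothetical line mixing tabs and runs of spaces between columns.
line=$(printf 'test.com \t emp   Announcements\t/ms/0B/03/81512')

# The same regular expression used as the field separator ...
via_fs=$(printf '%s\n' "$line" | awk -F'[ \t]+' '{ print $3 }')

# ... and as the separator argument of split().
via_split=$(printf '%s\n' "$line" | awk '{ split($0, a, "[ \t]+"); print a[3] }')

echo "via -F:      $via_fs"
echo "via split(): $via_split"
```

If both print "Announcements", the awk in use accepts a regular expression in either place.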