Patrick TJ McPhee 2004-05-12, 7:19 pm
In article <5b85a1e6.0405090413.1a0dfa2b@posting.google.com>,
Cityhunters <winstonk22@hotmail.com> wrote:
% Can anyone help me with this question??
Your basic approach is wrong. Once the problems I'll note below
are sorted out, what you'll have is an array with the count of
occurrences of each word in a file, but not of consecutive duplicates.
To deal with the problem, you need to compare each field to the field
that follows it. You might have to save the last field and compare
it to the first field of the next line. You'll want to print the
line number (FNR) and the value of the duplicated field. You might
have to set FS to ignore punctuation marks (FS = "[^a-zA-Z]+" ought
to do it -- this will treat a word as a string of letters). You might
have to prevent repeated printing if there are two or more duplicates.
You don't really need to use an array to do this -- you only need
to retain the value of field i when you're looking at field i+1.
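
Here is a rough sketch of that approach, in case it helps. It is one
way to do it, not the only one. The function name find_dups and the
variables last and reported are made up for illustration, and it
assumes words are case-sensitive runs of ASCII letters. Empty fields
(which the regex FS produces around leading or trailing punctuation)
are skipped.

```sh
# Sketch only: report consecutive duplicate words, including a word that
# ends one line and starts the next. find_dups is a hypothetical name.
find_dups() {
    awk '
    BEGIN { FS = "[^a-zA-Z]+" }        # a word is a run of letters
    FNR == 1 { last = "" }             # do not carry a word across files
    {
        for (i = 1; i <= NF; i++) {
            if ($i == "") continue     # empty field next to punctuation
            if ($i == last) {
                # print once per run, so "and and and" is reported once
                if (!reported) print FNR ": " $i
                reported = 1
            } else {
                reported = 0
            }
            last = $i                  # only the previous word is retained
        }
    }' "$@"
}
```

Call it as `find_dups fileName`. Each output line is the line number
on which the repeated word was seen, followed by the word.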
% awk 'BEGIN{FS=" "}{for(i=1;i <=FNR;i++)count[$i]++;print $word
% }END{print count[1] }' fileName
FNR is the record (line) number in the current file. What you want is
NF, which is the number of fields in the current record. Where is `word'
coming from?
% awk 'BEGIN{FS=" "}{for(i=1;i <=FNR;i++)count[$i]++ }END{for(word in
% count)print count[word] }' filename
A more normal idiom for this is
for (word in count) print word, count[word]
As I said above, this will give the number of occurrences of each word.
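
For reference, a minimal sketch of the counting script with the loop
bound changed to NF and the print idiom above. The wrapper name
count_words is hypothetical. It keeps awk's default field splitting,
so punctuation stays attached to words, and the order of the output
lines is unspecified.

```sh
# Sketch only: count how often each whitespace-separated word occurs.
# count_words is a hypothetical name.
count_words() {
    awk '
    { for (i = 1; i <= NF; i++) count[$i]++ }            # NF, not FNR
    END { for (word in count) print word, count[word] }  # arbitrary order
    ' "$@"
}
```

Pipe the result through sort if you want a stable order, e.g.
`count_words filename | sort`.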
--
Patrick TJ McPhee
East York Canada
ptjm@interlog.com