Re: Display the line number containing a word which is consecutively
Ed Morton 2004-05-12, 7:19 pm
Cityhunters wrote:
> Ladies & gentlemen,
> I am a part-time student who is trying to complete my SPA assignment
> with the following question:
> From an input file "abc"
> Display the line number containing a word which is consecutively
> duplicated, and display the duplicated word as well.
That's a very different requirement from your earlier post, so the
solutions that were posted before don't apply to this problem. I'm not
really sure what you're asking, though: do you want to find lines that
have the same word repeated on that line, or lines that contain the same
word as the preceding line, or is each word on its own line? Do you
want every line number in a run of consecutive duplicates, or every line
but the first, or only the first, or only the last? A sample input file
and the desired output would be a big help. To illustrate one
possibility, if your input file contains:
bob
bill
bill
bill
joe
bill
sue
and you want the output to be:
3,bill
4,bill
then this will do it:
awk '$1 == prev{print NR "," prev}{prev = $1}' abc
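Here's a quick way to check the one-liner against the sample above. It's a sketch that assumes a POSIX shell and any POSIX awk. The function name consecutive_dups is made up for illustration, and the data comes from a here-document instead of the file "abc". The print is written as NR "," prev so the output uses the 3,bill format shown above.

```sh
#!/bin/sh
# Hypothetical helper: print "lineno,word" for each line whose first
# field equals the first field of the line before it.
consecutive_dups() {
    awk '$1 == prev { print NR "," prev } { prev = $1 }' "$@"
}

# Sample input from the post, read from stdin instead of the file "abc".
consecutive_dups <<'EOF'
bob
bill
bill
bill
joe
bill
sue
EOF
# prints:
# 3,bill
# 4,bill
```

Since prev starts out empty, a blank first line would also be reported (as "1,").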
Regards,
Ed.
Ed Morton 2004-05-12, 7:19 pm
Cityhunters wrote:
> Hi everyone,
>
> I tried again and ended up with this solution. Although it isn't 100%
> perfect, at least it fulfills the basic requirement.
No, it doesn't. See below.
> awk -f cgw.AWK inputfile > outputfile
> BEGIN { counter = 0 }
> /WORD/ { for (i=1; (i <= NF); i+=1)
The above selects any line containing "WORD" anywhere, so it matches a word
that CONTAINS "WORD", not only one that IS "WORD". If your WORD was, say,
"stock", then the above would match on stock, stocks, and stockmarket.
You're presumably using this because of the comma between "stock" and
"than", with no intervening white-space, in the sample input file you
posted elsethread:
if your company have the stock,than my company will have the stock too.
so do you want to purchase the stock from us?
To awk, this is a single field with the value "stock,than", which is not
the same as the word "stock" later on the line.
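The following small sketch shows the difference between matching an RE and comparing a whole field. It assumes a POSIX shell and awk, and the function names are hypothetical. Fields are split on blanks only, so "stock,than" counts as one field: it contains "stock" but isn't equal to it.

```sh
#!/bin/sh
# Count fields that CONTAIN "stock" (RE match with ~).
contains_count() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /stock/) n++ } END { print n + 0 }'
}
# Count fields that ARE "stock" (string equality with ==).
equals_count() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "stock") n++ } END { print n + 0 }'
}

sample='stock stocks stockmarket stock,than'
printf '%s\n' "$sample" | contains_count   # prints 4
printf '%s\n' "$sample" | equals_count     # prints 1
```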
> { if ($i == "WORD") counter += 1 }}
On that line, this will only add 1 to counter for your search for
"stock", since now you're comparing whole fields rather than matching
REs: you will not match on "stock,than" but will match on the later "stock".
> END { print (" NFR"," WORD occurs ") }
This will ALWAYS print the text "NFR WORD occurs" no matter what word
you search for.
And of course, it just looks for one word rather than looking for all
duplicated words, which I've been assuming was your goal. Do you really
just want to search for one specific word?
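If one specific word really is the goal, a version that compares whole fields and actually prints the word and its count might look like the sketch below. It assumes the word is passed in with awk's -v option. The function name count_word and the output wording are illustrative choices, not the assignment's.

```sh
#!/bin/sh
# Hypothetical helper: count fields exactly equal to the word given as $1.
# No /WORD/ pattern is needed because the loop tests every field anyway.
count_word() {
    awk -v word="$1" '
        { for (i = 1; i <= NF; i++) if ($i == word) n++ }
        END { print word " occurs " (n + 0) " times" }
    '
}

printf '%s\n' \
    'if your company have the stock,than my company will have the stock too.' \
    'so do you want to purchase the stock from us?' |
    count_word stock   # prints: stock occurs 2 times
```

The (n + 0) makes sure a count of zero prints as 0 rather than an empty string.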
Ed.
Ed Morton 2004-05-12, 7:19 pm
Chris F.A. Johnson wrote:
> On 2004-05-10, Cityhunters wrote:
>
> Also:
>
> 1 company
> 1 have
> 1 the
>
> awk -F ', \t' '{
> n = 1
> gsub(/[\.,;:]+/," ")
Why not just add the above to the FS list?
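Here is a sketch of that suggestion. The punctuation goes into a regex FS so awk splits on it directly, instead of calling gsub() on every record. The bracket list [ \t.,;:] is just the set from the gsub() above plus blank and tab, and the function name dup_words is hypothetical. With a non-default FS, a trailing "." leaves an empty last field, which is why the sketch checks for empty fields.

```sh
#!/bin/sh
# Print "lineno word" for each word repeated back-to-back on a line,
# treating runs of blanks, tabs and . , ; : as field separators.
dup_words() {
    awk -F '[ \t.,;:]+' '{
        for (i = 1; i < NF; i++)
            if ($i != "" && $i == $(i + 1))   # skip empty edge fields
                print NR, $i
    }'
}

printf '%s\n' 'have the stock,stock too.' | dup_words   # prints: 1 stock
```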
For the OP: is a word just alphabetic characters? If not, then the
above will change 10.5 and 7.5 into 10 5 and 7 5, so text like "10.5 5"
can give you false hits on duplicate 5s, but it does take care of
sentences that are missing white-space after a period. There are various
other characters you may want to have treated as blanks too, e.g. question
marks. What about words that contain non-alphabetic characters, e.g.
o'clock or company's or hyphenated names?
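The trade-off can be seen with two variants, sketched below. They assume a POSIX shell and awk, and the function names are hypothetical. split_all turns every run of . , ; : into a blank, as the gsub() above does. split_trailing is an alternative added here for illustration, not something from the thread, and it only treats punctuation as a separator when a blank follows it. That keeps 10.5 intact, but in exchange it no longer splits words joined by a period with no white-space after it.

```sh
#!/bin/sh
# Every run of .,;: becomes a blank, so "10.5 5" turns into "10 5 5".
split_all() {
    awk '{
        gsub(/[.,;:]+/, " ")
        for (i = 1; i < NF; i++) if ($i == $(i + 1)) print NR, $i
    }'
}
# Only punctuation followed by a blank becomes a blank. A blank is
# appended first so that punctuation at the end of the line qualifies.
split_trailing() {
    awk '{
        $0 = $0 " "
        gsub(/[.,;:]+[ \t]/, " ")
        for (i = 1; i < NF; i++) if ($i == $(i + 1)) print NR, $i
    }'
}

printf '%s\n' 'it rose 10.5 5 times' | split_all         # prints: 1 5 (a false hit)
printf '%s\n' 'it rose 10.5 5 times' | split_trailing    # prints nothing
printf '%s\n' 'sold stock. stock fell' | split_trailing  # prints: 1 stock
```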
Ed.
Ed Morton 2004-06-07, 8:55 am
Cityhunters wrote:
> Ed Morton <morton@lsupcaemnt.com> wrote in message news:<DvOdneaLHrJo4gLdRVn-jw@comcast.com>...
>
> Hi there,
>
> Sorry for such a slow response to the above-mentioned thread! I have
> finished my SPA assignment. So this is my answer, which I got from my friend.
>
> {
> for (i=1;i<=NF;i++)
> {
> if (i>1)
> prev = $(i-1)
> if ($i == prev)
> print "Line No.",NR,"has duplicated word",prev
> }
> prev=$NF
The above "prev=$NF" does nothing within the loop since prev gets re-set
earlier in your loop.
> }
It doesn't have to be that complicated. This is easier to understand,
faster, and would produce the same output:
{
prev = $1
for (i=2;i<=NF;i++)
{
if ($i == prev)
print "Line No.",NR,"has duplicated word",prev
prev = $i
}
}
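Here is a sketch for comparing the two programs on the same input. It assumes a POSIX shell and awk, and both bodies are copied from the thread into hypothetical functions. They only differ when a line's first word repeats the previous line's last word. In the friend's version prev=$NF carries over into the i=1 test on the next line, while the shorter version starts each line with prev = $1.

```sh
#!/bin/sh
# The friend's version from the thread.
friend_version() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if (i > 1)
                prev = $(i - 1)
            if ($i == prev)
                print "Line No.", NR, "has duplicated word", prev
        }
        prev = $NF   # still set when the next line is tested at i == 1
    }'
}
# The shorter rewrite: each line is checked on its own.
rewrite_version() {
    awk '{
        prev = $1
        for (i = 2; i <= NF; i++) {
            if ($i == prev)
                print "Line No.", NR, "has duplicated word", prev
            prev = $i
        }
    }'
}

txt='buy the stock
stock prices rose rose'
printf '%s\n' "$txt" | friend_version    # reports "stock" and "rose" on line 2
printf '%s\n' "$txt" | rewrite_version   # reports only "rose" on line 2
```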
Regards,
Ed.
> Anyway, thanks for the help.
> ^_^
>
> From Cityhunters
Ed Morton 2004-06-07, 8:55 am
Ed Morton wrote:
<snip>
> It doesn't have to be that complicated. This is easier to understand,
> faster, and would produce the same output:
>
> {
> prev = $1
> for (i=2;i<=NF;i++)
> {
> if ($i == prev)
> print "Line No.",NR,"has duplicated word",prev
> prev = $i
> }
> }
By the way, except for the formatting, the above produces the same
output as the very first response you got from Chuck Demas about a month
ago:
awk '{for(i=1;i<NF;i++){if($i==$(i+1)){print NR, $i}}}' infile
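This sketch checks the "same output except for the formatting" claim. It runs Chuck Demas's one-liner and the loop version with its print changed to NR, prev, so both print "lineno word". A POSIX shell and awk are assumed, the function names are hypothetical, and input comes from stdin instead of "infile". Both test the same adjacent pairs on each line, so a run of three identical words is reported twice by each.

```sh
#!/bin/sh
# Chuck Demas's one-liner: compare each field with the one after it.
pairs_chuck() {
    awk '{for(i=1;i<NF;i++){if($i==$(i+1)){print NR, $i}}}'
}
# The loop version: compare each field with the one before it.
pairs_loop() {
    awk '{
        prev = $1
        for (i = 2; i <= NF; i++) {
            if ($i == prev) print NR, prev
            prev = $i
        }
    }'
}

sample='a a b b b
no repeats here
x y x'
printf '%s\n' "$sample" | pairs_chuck
printf '%s\n' "$sample" | pairs_loop
# each prints:
# 1 a
# 1 b
# 1 b
```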
Regards,
Ed.