











 

Author word counting
Andrew Kennard

2006-09-05, 7:57 am

Hi all

I'm looking for a good word-counting module or subroutine.

I've found this so far
http://www.planet-source-code.com/v...Id=562&lngWId=6

but it counts things like "Item1,Item2,Item3" as one word.

I searched CPAN but found nothing at all. Did I miss something?

I need a word counter to count the number of words in a scientific paper. I
know it won't be 100% accurate because of formulas etc., but has anyone got a
better solution than the one above? It's to check that the paper is under a
maximum word count.

Thanks in advance

Andrew


Dr.Ruud

2006-09-05, 7:57 am

"Andrew Kennard" schreef:

> I need a word counter to count the number of words in a scientific
> paper. I know it won't be 100% accurate due to formulas etc


echo 'I,Item1,Item2,Item3,a' |
sed 's/[^A-Za-z0-9]/ /g' |
wc -w

If you want to count only strings with a specific minimum length, use
`strings` as well:

echo 'I,Item1,Item2,Item3,a' |
sed 's/[^A-Za-z0-9]/\n/g' |
strings -2 |
wc -l
(for some seds you need to express the \n differently)


Or use perl:

echo 'I,Item1,Item2,Item3,a' |
perl -aF'/[^[:alnum:]]+/' -nle'
print 0+grep length >= 2, @F
'

Alternative:

echo 'I,Item1,Item2,Item3,a' |
perl -aF'/[\W_]+/' -ple'
$_=grep length>1,@F
'

See also `perldoc -q count`.
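
If you would rather have this in a script than in a one-liner, the same split-and-filter idea might look roughly like the sketch below. The sub name count_tokens and its $min_len parameter are made up for illustration; they do not come from any CPAN module.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): count the runs of
# alphanumeric characters that are at least $min_len characters long.
sub count_tokens {
    my ($text, $min_len) = @_;
    $min_len = 1 unless defined $min_len;

    # Split on runs of non-alphanumerics, like -F does in the one-liners.
    my @fields = split /[^[:alnum:]]+/, $text;

    # grep in scalar context returns the number of matching elements.
    # An empty leading field (text starting with punctuation) has
    # length 0 and is therefore never counted.
    return scalar grep { length($_) >= $min_len } @fields;
}

print count_tokens('I,Item1,Item2,Item3,a', 2), "\n";   # prints 3
print count_tokens('I,Item1,Item2,Item3,a'), "\n";      # prints 5
```

With a minimum length of 2 the single-letter tokens "I" and "a" are skipped, which is what the `strings -2` and `length >= 2` variants do as well.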

--
Affijn, Ruud

"Gewoon is een tijger."


Mumia W.

2006-09-05, 7:57 am

On 09/05/2006 03:47 AM, Andrew Kennard wrote:
> Hi all
>
> I'm looking for a good word counting module/sub routine
>
> I've found this so far
> http://www.planet-source-code.com/v...Id=562&lngWId=6
>
> but it counts things like the "Item1,Item2,Item3" as one word
>
> I've had a search on CPAN but that did not result in any at all ? did I miss
> something ?
>
> I need a word counter to count the number of words in a scientific paper. I
> know it wont be 100% accurate due to formulas etc but has anyone got a
> better solution than the one above ? It's to check it is under a max word
> count
>
> Thanks in advance
>
> Andrew
>
>
>


This is untested ($data should contain the text with the words):

my $count = () = $data =~ m/[[:alpha:]]+/g;
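
To check a paper against a maximum, that line could be wrapped up roughly as follows. This is only a sketch: count_words and $max_words are names invented here, and the limit of 10 is a placeholder for whatever the journal specifies.

```perl
use strict;
use warnings;

# Hypothetical helper built on the "= () =" counting idiom above.
sub count_words {
    my ($data) = @_;

    # The empty list forces list context on the match, so m//g returns
    # every match; assigning that to a scalar yields the match count.
    my $count = () = $data =~ m/[[:alpha:]]+/g;
    return $count;
}

my $max_words = 10;   # assumed limit, for illustration only
my $data      = 'Item1,Item2,Item3 are three items in a list.';
my $count     = count_words($data);

print "$count words, ",
      ($count <= $max_words ? 'within' : 'over'),
      " the limit of $max_words\n";
# prints: 9 words, within the limit of 10
```

Note that [[:alpha:]] matches letters only, so "Item1" is counted as the word "Item", a bare number such as "42" is not counted at all, and "won't" counts as two words.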









