PERL Beginners, September 2006: word counting
Andrew Kennard, 2006-09-05, 7:57 am
Hi all

I'm looking for a good word-counting module or subroutine.

I've found this so far:
http://www.planet-source-code.com/v...Id=562&lngWId=6
but it counts things like "Item1,Item2,Item3" as one word.

I've searched CPAN, but that didn't turn up anything at all. Did I miss something?

I need a word counter to count the number of words in a scientific paper. I know it won't be 100% accurate because of formulas etc., but has anyone got a better solution than the one above? It's to check that the paper is under a maximum word count.

Thanks in advance
Andrew
Dr.Ruud, 2006-09-05, 7:57 am
"Andrew Kennard" wrote:
> I need a word counter to count the number of words in a scientific
> paper. I know it won't be 100% accurate due to formulas etc.
echo 'I,Item1,Item2,Item3,a' |
sed 's/[^A-Za-z0-9]/ /g' |
wc
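The same idea (treat every run of characters other than A-Z, a-z and 0-9 as a word separator, then count what is left) can be sketched as a Perl subroutine. The name `count_words` is hypothetical, and like the sed version it only knows about ASCII letters and digits, so accented letters will split a word.

```perl
use strict;
use warnings;

# Count words the way the sed | wc pipeline does: every run of
# characters other than ASCII letters and digits acts as a separator.
sub count_words {
    my ($text) = @_;
    # split can yield a leading empty field when the text starts with
    # a separator, so keep only the non-empty pieces.
    my @words = grep { length } split /[^A-Za-z0-9]+/, $text;
    return scalar @words;
}

print count_words('I,Item1,Item2,Item3,a'), "\n";    # prints 5
```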
If you want to count only strings with a specific minimum length, use
`strings` as well:
echo 'I,Item1,Item2,Item3,a' |
sed 's/[^A-Za-z0-9]/\n/g' |
strings -2 |
wc -l
(Some sed implementations don't understand \n in the replacement; with those, write the newline as a backslash followed by an actual line break.)
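If you would rather not depend on `strings`, the minimum-length filter is just a `grep` in Perl. A sketch, assuming a hypothetical `count_long_words` helper whose `$min_len` argument plays the role of the `-2` passed to `strings` (it is assumed to be at least 1):

```perl
use strict;
use warnings;

# Count alphanumeric runs that are at least $min_len characters long,
# roughly the Perl counterpart of
#   sed 's/[^A-Za-z0-9]/\n/g' | strings -$min_len | wc -l
# $min_len is assumed to be >= 1, so empty fields never count.
sub count_long_words {
    my ($text, $min_len) = @_;
    my @words = grep { length($_) >= $min_len } split /[^A-Za-z0-9]+/, $text;
    return scalar @words;
}

print count_long_words('I,Item1,Item2,Item3,a', 2), "\n";    # prints 3
```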
Or use perl:
echo 'I,Item1,Item2,Item3,a' |
perl -aF'/[^[:alnum:]]+/' -nle'
print 0+grep length >= 2, @F
'
Alternative:
echo 'I,Item1,Item2,Item3,a' |
perl -aF'/[\W_]+/' -ple'
$_=grep length>1,@F
'
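Both one-liners print one count per input line, which is fine for the single echoed line but not for a whole paper, where you want one total. A sketch of a small script that keeps a running total instead; `total_words` is a hypothetical name, and it reads from any filehandle, so it can be pointed at a real file or at STDIN.

```perl
use strict;
use warnings;

# Sum the number of words (alphanumeric runs of at least $min_len
# characters) over every line read from $fh, instead of printing a
# separate count per line like the -n/-p one-liners do.
sub total_words {
    my ($fh, $min_len) = @_;
    my $total = 0;
    while (my $line = <$fh>) {
        # Same separator as the -F'/[\W_]+/' one-liner: anything that
        # is not a letter or digit (\W, plus the underscore).
        # += gives grep scalar context, so it returns the match count.
        $total += grep { length($_) >= $min_len } split /[\W_]+/, $line;
    }
    return $total;
}

# Demo on an in-memory file; for a real paper, open the file instead.
my $paper = "I,Item1,Item2,Item3,a\nSecond line, with five words\n";
open my $fh, '<', \$paper or die "Cannot open in-memory file: $!";
print total_words($fh, 2), "\n";    # prints 8
close $fh;
```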
See also `perldoc -q count`.
--
Anyway, Ruud
"Ordinary is a tiger."
Mumia W., 2006-09-05, 7:57 am
On 09/05/2006 03:47 AM, Andrew Kennard wrote:
> I need a word counter to count the number of words in a scientific paper.
This is untested ($data should contain the text with the words):
my $count = () = $data =~ m/[[:alpha:]]+/g;
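The `() =` in that line forces the match into list context, and a list assignment evaluated in scalar context returns the number of elements on its right-hand side, i.e. the number of matches. Applied to the original goal of checking a paper against a maximum word count, a sketch might look like this; `word_count`, `$max_words` and its value are hypothetical. Note that `[[:alpha:]]+` stops at digits and apostrophes, so "Item1" counts as one word but "don't" counts as two.

```perl
use strict;
use warnings;

# Count words with the idiom from the post: the empty list () = ...
# puts the m//g match in list context, and the list assignment,
# evaluated in scalar context, returns how many matches there were.
sub word_count {
    my ($data) = @_;
    my $count = () = $data =~ m/[[:alpha:]]+/g;
    return $count;
}

# Hypothetical limit; substitute the journal's actual maximum.
my $max_words = 5000;

my $data  = 'Item1,Item2,Item3 are three words here';
my $count = word_count($data);
# prints "7 words (limit 5000): OK"
printf "%d words (limit %d): %s\n", $count, $max_words,
    $count <= $max_words ? 'OK' : 'too long';
```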