

Home > Archive > PERL Beginners > March 2005 > how would you write a spell checker?










 

Author how would you write a spell checker?
mark McWilliams

2005-03-20, 3:55 am

Hello
I need to write a spell checker.

I have tried to check the words in the file to be
checked against the words in the dictionary, but I
have not been able to make progress.

If I use
unless($word =~ /^$single_word$/i)
{ print $single_word; do other stuff ...}
# above prints every word of dictionary because most
words in dictionary will not match.

if ($word !~ /^$single_word$/i)
{ print $single_word; do other stuff ...}
# same prints every word of dictionary

if($word =~ /^$single_word$/i)
{ print $single_word; do other stuff ...}
# prints only when $word (dictionary) and $single_word
# (document) match, which is not what I want

if($word =~ /Zzz/) # end of dictionary
{
print $single_word; ... do other stuff
}
# the above prints out all words in the document


I have been thinking about making an array of letters
from the $single_word to be matched against the letters of
the $word in the dictionary, but I do not know how to
strip a single character from a word to be assigned to
a $variable.


Maybe I am going about this all wrong altogether and
that it is a mistake to try to match words in the
document to those in the dictionary. I do not know.

I can reprint out the document, print out the words
that are in the dictionary, but what I need to do is
print out the words that are not in the dictionary.

One of the problems, in my view, is that
only one word in the dictionary will match the
word in the document and the rest will not.

A fresh Idea from outside would be very helpful





Robin

2005-03-20, 8:55 am

Dale Hagglund

2005-03-20, 8:55 am

>>>>> "mark" == mark McWilliams <hpv_of_earth@yahoo.com> writes:

mark> I need to write a spell checker. . . . A fresh Idea from
mark> outside would be very helpful

I can't quite tell what your approach is right now, but it looks too
complicated to me. In particular, I suggest you consider using a hash
to represent the dictionary and checking whether each input word is a
key in that hash. If it isn't, it's a spelling error. An overly
simplified version of this program would look like this:

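#!/usr/local/bin/perl
use strict;
use warnings;

# Load the dictionary (one word per line) into a hash for fast lookup.
my %dict;
open my $dict_fh, '<', "/path/to/dictionary" or die "Cannot open dictionary: $!";
while (<$dict_fh>) {
    chomp;
    $dict{$_} = 1;
}
close $dict_fh;

# Record every input word that is not a key in the dictionary.
my %errors;
while (<>) {
    chomp;
    for my $w (split) {    # bare split breaks $_ on whitespace
        $errors{$w} = 1 unless exists $dict{$w};
    }
}

# Print the unknown words in sorted order.
for my $w (sort keys %errors) {
    print $w, "\n";
}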

(Note: this code is completely untested. It hasn't been anywhere near
the perl interpreter.)

This program needs refinement in many ways:

1. It doesn't show you where the word was misspelled. This
behaviour is similar to the classic Unix spell
implementation.

2. Only exact matches in the dictionary are considered to be
spelled correctly. A fancy spell checker might know about
English suffix and prefix rules, etc.

3. It doesn't support custom dictionaries and the like.

4. The use of split to break up words on a line probably isn't
flexible enough.

5. It assumes the document to spell check is on standard
input.

However, even as it stands (modulo any syntax errors) it should be
functional as a simple spell checker.
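
As a rough illustration of refinements 1 and 4 above, the lookup can record the line number of each unknown word and pull words out with a regular expression instead of a whitespace split. This is only a sketch under assumptions: the helper name find_misspellings, the tiny in-memory dictionary, and the definition of a "word" as letters with optional inner apostrophes are all made up for the example, and the dictionary is assumed to be keyed in lower case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [line_number, word] pairs for
# every word in @$lines that is not a key of %$dict.
sub find_misspellings {
    my ($dict, $lines) = @_;
    my @found;
    my $line_no = 0;
    for my $line (@$lines) {
        $line_no++;
        # Assumption: a word is a run of letters, optionally joined by
        # inner apostrophes (so "don't" counts as one word).
        while ($line =~ /([A-Za-z]+(?:'[A-Za-z]+)*)/g) {
            my $word = $1;
            # Look the word up in lower case so "The" matches "the".
            push @found, [$line_no, $word] unless exists $dict->{lc $word};
        }
    }
    return @found;
}

# Tiny in-memory dictionary, keyed in lower case, purely for illustration.
my %dict  = map { $_ => 1 } qw(the cat sat on mat don't);
my @lines = ("The cat szat on", "the matt. Don't!");

for my $miss (find_misspellings(\%dict, \@lines)) {
    printf "line %d: %s\n", @$miss;
}
```

With the sample data this reports "szat" on line 1 and "matt" on line 2. In a real program the dictionary hash would be filled from the word file (lower-casing each entry) and the lines read from the document being checked.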

Dale Hagglund.

Offer Kaye

2005-03-20, 8:55 am

On Sat, 19 Mar 2005 22:52:21 -0800 (PST), mark McWilliams wrote:
>
> A fresh Idea from outside would be very helpful
>


Check out the Perl implementation of the unix "spell" command, from
the PPT project:
http://search.cpan.org/~cwest/ppt-0.14/bin/spell

There are other related modules on CPAN that you may find helpful.
Just do a CPAN search for the word "spell".

Hope this helps,
--
Offer Kaye








Copyright 2009 codecomments.com