Code Comments


Text database?
I am trying to implement a text recognition module. But I need some
characters to train the algorithms with. Does anyone know of a free
online database that contains characters?

saneman
03-17-08 09:41 AM


Re: Text database?
> But I need some
> characters to train the algorithms with. Does anyone know of a free
> online database that contains characters?

Wouldn't the Internet itself serve as such a database?
Or perhaps a subset, like Usenet, or something even narrower:
talk.religion.newage, for example, is full of long, winding
texts.

wget + sed|perl|(g)awk|python|... should get you a _lot_
of training data.
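
A rough sketch of the "python" end of that pipeline, assuming the pages have already been fetched (with wget, say) and are available as HTML strings. The names TextExtractor, html_to_text and char_histogram are made up for this illustration; only the standard library is used.

```python
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []       # pieces of text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Strip markup from an HTML string and collapse runs of whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.chunks).split())


def char_histogram(text):
    """Count how often each non-whitespace character occurs."""
    return Counter(ch for ch in text if not ch.isspace())
```

The histogram is only there to show which characters the harvested text actually covers; whether plain text is useful as training data depends on what the recognizer needs.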

HTH and TTFN,
Tarkin

Tarkin
03-18-08 12:22 AM


Re: Text database?
saneman wrote:
> I am trying to implement a text recognition module. But I need some
> characters to train the algorithms with. Does anyone know of a free
> online database that contains characters?

You're using an online resource right now that contains characters.

If you'd like a larger, more standardized corpus of text, Project
Gutenberg could probably help.

I suppose there's a chance you actually want bitmaps of fonts, though.
That could be accomplished by downloading some fonts.  Or by using
some bitmaps containing rasterized fonts; one could even create such
bitmaps by loading some text into a text editor or word processor and
typing or pasting the desired characters.

- Logan

Logan Shaw
03-18-08 09:37 AM


Re: Text database?
>I am trying to implement a text recognition module. But I need some
>characters to train the algorithms with. Does anyone know of a free
>online database that contains characters?

Post a message on USENET using your real email address, and you'll
have an unending supply of fresh SPAM.  Does that meet your requirements?




Gordon Burditt
03-18-08 09:37 AM


Re: Text database?
On Mar 17, 1:54 am, saneman <asdf...@asd.com> wrote:
> I am trying to implement a text recognition module. But I need some
> characters to train the algorithms with. Does anyone know of a free
> online database that contains characters?

You'd probably be better off asking on sci.image.processing, where you
were posting in the first place. That said, this is a reasonable place
for the following point:

You are presumably after a database of images of characters. You could
synthesize one by rasterizing a number of fonts (automatically) and
then adding various kinds of noise or various distortions.

I have a program for generating rasterizations from here:

http://linuxfromscratch.org/piperma...ary/004748.html

look at links-2.1pre32-italic.patch.gz

You can run this patch on an empty directory to extract the relevant
files.

To add distortions, you may wish to experiment with pnmscale,
pnmrotate, pgmnoise and pnmshear. To be honest,
comp.unix.shell is also a good place for this kind of command-line
stuff, so I've cross-posted there as well. Maybe some ImageMagick
expert can weigh in on adding errors automatically.
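
To make the noise-and-distortion idea concrete, here is a toy sketch in pure Python. It is not a substitute for the netpbm tools: it works on a hand-written 5x7 glyph (GLYPH_T, a made-up stand-in for a rasterized font character) rather than on real image files, and the function names are hypothetical.

```python
import random

# Hypothetical 5x7 bitmap of the letter "T"; a real pipeline would take
# its glyphs from rasterized fonts instead.
GLYPH_T = [
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]


def to_pixels(rows):
    """Convert '#'/'.' strings into a list of rows of 1/0 pixels."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def add_noise(pixels, flip_prob, rng):
    """Flip each pixel independently with probability flip_prob."""
    return [
        [1 - p if rng.random() < flip_prob else p for p in row]
        for row in pixels
    ]


def shear(pixels, max_shift):
    """Horizontal shear: shift the top row right by max_shift pixels,
    the bottom row by 0, and the rows in between proportionally.
    Every output row is max_shift pixels wider than the input."""
    height = len(pixels)
    out = []
    for y, row in enumerate(pixels):
        if height > 1:
            shift = round(max_shift * (height - 1 - y) / (height - 1))
        else:
            shift = 0
        out.append([0] * shift + row + [0] * (max_shift - shift))
    return out


def show(pixels):
    """Render a pixel grid back into '#'/'.' text for eyeballing."""
    return "\n".join("".join("#" if p else "." for p in row) for row in pixels)
```

For example, print(show(add_noise(shear(to_pixels(GLYPH_T), 2), 0.05, random.Random(0)))) prints one slanted, slightly noisy "T". Looping over glyphs, shifts and seeds would give many variants of each character.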

-Ed
--
(You can't go wrong with psycho-rats.)(http://mi.eng.cam.ac.uk/~er258)

/d{def}def/f{/Times s selectfont}d/s{11}d/r{roll}d f 2/m{moveto}d -1
r 230 350 m 0 1 179{ 1 index show 88 rotate 4 mul 0 rmoveto}for/s 12
d f pop 235 420 translate 0 0 moveto 1 2 scale show showpage


Edward Rosten
03-19-08 12:20 AM


Re: Text database?
Edward Rosten wrote:
> On Mar 17, 1:54 am, saneman <asdf...@asd.com> wrote: 
>
> You'd probably be better off asking on sci.image.processing, where you
> were posting in the first place. That said, this is a reasonable place
> for the following point:
>
> You are presumably after a database of images of characters. You could
> synthesize one by rasterizing a number of fonts (automatically) and
> then adding various kinds of noise or various distortions.
>
> I have a program for generating rasterizations from here:
>
> http://linuxfromscratch.org/piperma...ary/004748.html
>
> look at links-2.1pre32-italic.patch.gz
>
> You can run this patch on an empty directory to extract the relevant
> files.
>
> To add distortions, you may wish to experiment with pnmscale,
> pnmrotate, pgmnoise and pnmshear. To be honest,
> comp.unix.shell is also a good place for this kind of command-line
> stuff, so I've cross-posted there as well. Maybe some ImageMagick
> expert can weigh in on adding errors automatically.
>
> -Ed
> --
> (You can't go wrong with psycho-rats.)(http://mi.eng.cam.ac.uk/~er258)
>
> /d{def}def/f{/Times s selectfont}d/s{11}d/r{roll}d f 2/m{moveto}d -1
> r 230 350 m 0 1 179{ 1 index show 88 rotate 4 mul 0 rmoveto}for/s 12
>     d f pop 235 420 translate 0 0 moveto 1 2 scale show showpage
>

This here was just what I needed:

http://yann.lecun.com/exdb/mnist/

which is also used on the pages below:

http://www.bcl.hamilton.ie/~barak/t.../hw1/index.html
http://www.iro.umontreal.ca/~lisa/t...MnistVariations
http://www.int.tu-darmstadt.de/mlu/index.html
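
For anyone else landing here: the MNIST files are in the IDX format, which is a short big-endian header followed by raw unsigned bytes. Below is a minimal reading sketch, assuming the file has already been downloaded and decompressed into a bytes object; the function names are made up for this example.

```python
import struct

IMAGE_MAGIC = 2051  # 0x00000803: unsigned bytes, 3 dimensions
LABEL_MAGIC = 2049  # 0x00000801: unsigned bytes, 1 dimension


def parse_idx_images(data):
    """Parse an IDX image file held in memory.

    Returns (rows, cols, images), where each image is a bytes object of
    rows * cols grey levels in row-major order.
    """
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != IMAGE_MAGIC:
        raise ValueError("not an IDX image file (magic %d)" % magic)
    size = rows * cols
    if len(data) < 16 + count * size:
        raise ValueError("image data is truncated")
    images = [data[16 + i * size:16 + (i + 1) * size] for i in range(count)]
    return rows, cols, images


def parse_idx_labels(data):
    """Parse an IDX label file held in memory into a list of ints."""
    magic, count = struct.unpack(">II", data[:8])
    if magic != LABEL_MAGIC:
        raise ValueError("not an IDX label file (magic %d)" % magic)
    labels = list(data[8:8 + count])
    if len(labels) != count:
        raise ValueError("label data is truncated")
    return labels
```

The downloads are gzip-compressed, so something like gzip.open(path, "rb").read() should give the bytes to pass in, where path is wherever the file was saved.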

saneman
03-21-08 12:15 AM


All times are GMT.

 