Code Comments
I am trying to implement a text recognition module, but I need some characters to train the algorithms with. Does anyone know of a free online database that contains characters?
> But I need some character to train the algorithms with. Does anyone
> know of a free online database that contains characters?

Wouldn't the Internet itself serve as such a database? Or perhaps a subset, like Usenet, or something narrower still: talk.religion.newage, for example, is full of long, winding texts. wget + sed|perl|(g)awk|python|... should get you a _lot_ of training data.

HTH and TTFN,
Tarkin
saneman wrote:
> I am trying to implement a text recognition module. But I need some
> character to train the algorithms with. Does anyone know of a free
> online database that contains characters?

You're using an online resource right now that contains characters. If you'd like a larger, more standardized corpus of text, Project Gutenberg could probably help.

I suppose there's a chance you actually want bitmaps of fonts, though. That could be accomplished by downloading some fonts, or by using some bitmaps containing rasterized fonts; one could even create such bitmaps by loading some text into a text editor or word processor and typing or pasting the desired characters.

- Logan
> I am trying to implement a text recognition module. But I need some
> character to train the algorithms with. Does anyone know of a free
> online database that contains characters?

Post a message on USENET using your real email address, and you'll have an unending supply of fresh SPAM. Does that meet your requirements?
On Mar 17, 1:54 am, saneman <asdf...@asd.com> wrote:
> I am trying to implement a text recognition module. But I need some
> character to train the algorithms with. Does anyone know of a free
> online database that contains characters?

You'd probably be better off asking on sci.image.processing, where you were posting in the first place. That said, this is a reasonable place for the following point:

You are presumably after a database of images of characters. You could synthesize one by rasterizing a number of fonts (automatically) and then adding various kinds of noise or distortion.

I have a program for generating rasterizations, available from here:

http://linuxfromscratch.org/piperma...ary/004748.html

Look at links-2.1pre32-italic.patch.gz

You can apply this patch in an empty directory to extract the relevant files.

To add distortions, you may wish to experiment with pnmscale, pnmrotate, pgmnoise and pnmshear. To be honest, comp.unix.shell is also a good place for this kind of command-line stuff, so I've cross-posted there as well. Maybe some ImageMagick expert can weigh in on adding errors automatically.

-Ed
--
(You can't go wrong with psycho-rats.)(http://mi.eng.cam.ac.uk/~er258)

/d{def}def/f{/Times s selectfont}d/s{11}d/r{roll}d f 2/m{moveto}d -1
r 230 350 m 0 1 179{ 1 index show 88 rotate 4 mul 0 rmoveto}for/s 12
d f pop 235 420 translate 0 0 moveto 1 2 scale show showpage
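For anyone who wants to try the "rasterize, then distort" idea without the netpbm tools, here is a minimal Python sketch of the distortion half only. It is not Ed's program and does not rasterize real fonts: the 5x7 bitmap `GLYPH_A` is hand-drawn stand-in data, and the function names (`to_pixels`, `add_noise`, `shear`, `make_variants`) are made up for illustration. The noise step flips pixels at random and the shear step slants the glyph, roughly the kind of thing pgmnoise and pnmshear are suggested for above.

```python
import random

# Hand-drawn 5x7 bitmap of "A", standing in for a rasterized glyph.
# (Hypothetical sample data; a real pipeline would rasterize actual fonts.)
GLYPH_A = [
    "..#..",
    ".#.#.",
    "#...#",
    "#...#",
    "#####",
    "#...#",
    "#...#",
]


def to_pixels(rows):
    """Convert rows of '#'/'.' characters into a list of lists of 0/1 pixels."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def add_noise(pixels, flip_prob, rng):
    """Flip each pixel independently with probability flip_prob."""
    return [[1 - p if rng.random() < flip_prob else p for p in row]
            for row in pixels]


def shear(pixels, shift, pad):
    """Slant the glyph: the top row moves right by `shift` pixels, the
    bottom row by 0. Every row is padded so the width grows by `pad`
    (requires shift <= pad)."""
    height = len(pixels)
    out = []
    for y, row in enumerate(pixels):
        dx = round(shift * (height - 1 - y) / max(height - 1, 1))
        out.append([0] * dx + row + [0] * (pad - dx))
    return out


def make_variants(rows, count, flip_prob=0.05, max_shift=2, seed=0):
    """Generate `count` distorted copies of one glyph, all the same size."""
    rng = random.Random(seed)
    base = to_pixels(rows)
    variants = []
    for _ in range(count):
        slanted = shear(base, rng.randint(0, max_shift), max_shift)
        variants.append(add_noise(slanted, flip_prob, rng))
    return variants


if __name__ == "__main__":
    for variant in make_variants(GLYPH_A, 3):
        for row in variant:
            print("".join("#" if p else "." for p in row))
        print()
```

Seeding the generator keeps the synthetic set reproducible, which helps when comparing training runs. The flip probability and shear range here are arbitrary starting values, not recommendations.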
Edward Rosten wrote:
> You are presumable after a database of images of characters. You could
> synthesize one by rasterizing a number of fonts (automatically) and
> then adding various kinds of noise or various distortions.

This was just what I needed:

http://yann.lecun.com/exdb/mnist/

which is also used on the pages below:

http://www.bcl.hamilton.ie/~barak/t.../hw1/index.html
http://www.iro.umontreal.ca/~lisa/t...MnistVariations
http://www.int.tu-darmstadt.de/mlu/index.html
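The MNIST files come in the simple IDX binary format described on the MNIST page: two zero bytes, one byte for the data type (0x08 means unsigned byte), one byte for the number of dimensions, then one big-endian 32-bit size per dimension, followed by the raw data. A minimal Python sketch of a reader follows. The function names `parse_idx` and `split_images` are made up for illustration, and the sketch only handles the unsigned-byte type that MNIST uses.

```python
import struct


def parse_idx(data):
    """Parse an unsigned-byte IDX byte string into (dims, payload).

    dims is a tuple of dimension sizes; payload is the raw data bytes.
    """
    # Header: two zero bytes, a type code, and the number of dimensions.
    zero, dtype, ndim = struct.unpack_from(">HBB", data, 0)
    if zero != 0 or dtype != 0x08:
        raise ValueError("expected an unsigned-byte IDX file")
    # One big-endian 32-bit unsigned size per dimension.
    dims = struct.unpack_from(">" + "I" * ndim, data, 4)
    offset = 4 + 4 * ndim
    count = 1
    for d in dims:
        count *= d
    payload = data[offset:offset + count]
    if len(payload) != count:
        raise ValueError("file is truncated")
    return dims, payload


def split_images(dims, payload):
    """Split a 3-D (count, rows, cols) payload into one bytes object per image."""
    n, rows, cols = dims
    size = rows * cols
    return [payload[i * size:(i + 1) * size] for i in range(n)]
```

The files are distributed gzip-compressed, so something like `gzip.open("train-images-idx3-ubyte.gz", "rb").read()` would produce the byte string to pass to `parse_idx`, assuming the file has already been downloaded into the working directory. Label files have a single dimension, so their payload is simply one byte per label.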