Code Comments

fastest way to do very very many character substitutions
hi,

I am wondering what approaches you all might suggest for the following
scenario. I'm not looking for code, but just general thoughts about
the best way to approach this problem.

I need to substitute hundreds, possibly thousands, of character
sequences (English words or phrases) in texts that are up to about
100KB or so in size, and I need to do this as fast as humanly possible
(well, actually, faster). Some of the substitution terms require
regular expressions, but some can be handled by a regular replace.

Any advice at all about Java resources that might be available would
be very much appreciated. I can think of a few naive ways of doing
this, but perhaps there are some lesser known classes than String and
StringBuffer that would be useful, or perhaps there is some
open-source utility class that offers a mutable character-array type
object with powerful search-and-replace/regex abilities, or maybe
something else altogether.

Thanks in advance for any pointers...

anon
09-24-04 09:00 AM


Re: fastest way to do very very many character substitutions
anon wrote:

> hi,
>
> I am wondering what approaches you all might suggest for the following
> scenario. I'm not looking for code, but just general thoughts about
> the best way to approach this problem.
>
> I need to substitute hundreds, possibly thousands, of character
> sequences (English words or phrases) in texts that are up to about
> 100KB or so in size, and I need to do this as fast as humanly possible
> (well, actually, faster). Some of the substitution terms require
> regular expressions, but some can be handled by a regular replace.

Clearly defined terms (whole words) can be handled by a hashtable that maps
each term to its replacement. That leaves the problem of delimiting the terms
in the text, which is probably best handled by a conventional stream scanner.
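
A minimal sketch of the hashtable-plus-scanner idea, for illustration only. The
class name WordSubstituter and its methods are hypothetical, and the sketch
assumes a "term" is a single run of letters or digits, so multi-word phrases
and regex terms are not covered. It uses HashMap and StringBuilder, the
unsynchronized counterparts of Hashtable and StringBuffer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: replaces whole words in one pass over the text.
class WordSubstituter {
    private final Map<String, String> replacements = new HashMap<>();

    void add(String term, String replacement) {
        replacements.put(term, replacement);
    }

    String substitute(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int n = text.length();
        int i = 0;
        while (i < n) {
            int start = i;
            if (Character.isLetterOrDigit(text.charAt(i))) {
                // Scan to the end of the current word, then look it up once.
                while (i < n && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                String word = text.substring(start, i);
                String replacement = replacements.get(word);
                out.append(replacement != null ? replacement : word);
            } else {
                // Copy the run of delimiters through unchanged.
                while (i < n && !Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                out.append(text, start, i);
            }
        }
        return out.toString();
    }
}
```

Each word costs one hash lookup, so the running time grows with the length of
the text rather than with the number of terms.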

Ambiguous terms might be handled by a closest-match lookup, using a binary
search over a sorted list of terms to find the nearest candidates.

But frankly, if raw speed is what you are after and you insist on maximum
performance, remember that with Java you are trading some performance for
ease of development and a rich library. Weigh the relative importance of
these factors carefully.

> Any advice at all about Java resources that might be available would
> be very much appreciated. I can think of a few naive ways of doing
> this, but perhaps there are some lesser known classes than String and
> StringBuffer that would be useful, or perhaps there is some
> open-source utility class that offers a mutable character-array type
> object with powerful search-and-replace/regex abilities, or maybe
> something else altogether.

Realize that regex in general is not the fastest approach to matching,
although there are ways to speed it up here too, such as precompiling the
pattern.
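
As a rough illustration of precompiling, the sketch below builds one
java.util.regex.Pattern for all literal terms and reuses it for every text.
The class name RegexSubstituter is hypothetical. The sketch assumes every term
starts and ends with a word character (so that \b behaves as expected) and
that at least one term is supplied. Longer terms are listed first because Java
tries alternatives left to right and takes the first one that matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one precompiled alternation for all literal terms.
class RegexSubstituter {
    private final Pattern pattern;
    private final Map<String, String> replacements;

    RegexSubstituter(Map<String, String> terms) {
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("at least one term is required");
        }
        this.replacements = new HashMap<>(terms);
        List<String> sorted = new ArrayList<>(terms.keySet());
        // Longest first, so "cat food" is tried before "cat".
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder alternation = new StringBuilder();
        for (String term : sorted) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        // Compiled once here and reused by every call to substitute().
        this.pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    String substitute(CharSequence text) {
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = replacements.get(m.group());
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The text is scanned once per call regardless of the number of terms, but the
cost of a large alternation is up to the regex engine, so it is worth
measuring against a plain hashtable lookup before relying on it.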

--
Paul Lutus
http://www.arachnoid.com


Paul Lutus
09-24-04 09:00 AM


Re: fastest way to do very very many character substitutions
On 23 Sep 2004 22:07:12 -0700, anon <tolchocked@gmail.com> wrote:

> hi,
>
> I am wondering what approaches you all might suggest for the following
> scenario. I'm not looking for code, but just general thoughts about
> the best way to approach this problem.
>
> I need to substitute hundreds, possibly thousands, of character
> sequences (English words or phrases) in texts that are up to about
> 100KB or so in size, and I need to do this as fast as humanly possible
> (well, actually, faster). Some of the substitution terms require
> regular expressions, but some can be handled by a regular replace.
>
> Any advice at all about Java resources that might be available would
> be very much appreciated. I can think of a few naive ways of doing
> this, but perhaps there are some lesser known classes than String and
> StringBuffer that would be useful, or perhaps there is some
> open-source utility class that offers a mutable character-array type
> object with powerful search-and-replace/regex abilities, or maybe
> something else altogether.
>
> Thanks in advance for any pointers...


I suggest two things:
1. Look at the algorithms used in spelling checkers.
2. Avoid chars and stay in bytes if you are sure
your text can be represented that way.
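
One possible reading of these two suggestions, sketched under stated
assumptions: a trie is a dictionary structure commonly used for word lookup,
and here it is built over raw bytes. The class name ByteTrieReplacer is
hypothetical, the text is assumed to fit a single-byte encoding (ISO-8859-1
in this sketch), and no word-boundary check is made, so a term is also
replaced when it occurs inside a longer word.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: longest-match replacement over a byte trie.
class ByteTrieReplacer {
    private static final class Node {
        // One slot per byte value: fast lookup, but memory-hungry per node.
        final Node[] next = new Node[256];
        byte[] replacement; // non-null when a term ends at this node
    }

    private final Node root = new Node();

    void add(String term, String replacement) {
        Node node = root;
        for (byte b : term.getBytes(StandardCharsets.ISO_8859_1)) {
            int index = b & 0xFF;
            if (node.next[index] == null) {
                node.next[index] = new Node();
            }
            node = node.next[index];
        }
        node.replacement = replacement.getBytes(StandardCharsets.ISO_8859_1);
    }

    byte[] replace(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        int i = 0;
        while (i < input.length) {
            Node node = root;
            byte[] best = null;
            int bestEnd = i;
            // Walk the trie from position i, remembering the longest term seen.
            for (int j = i; j < input.length; j++) {
                node = node.next[input[j] & 0xFF];
                if (node == null) {
                    break;
                }
                if (node.replacement != null) {
                    best = node.replacement;
                    bestEnd = j + 1;
                }
            }
            if (best != null) {
                out.write(best, 0, best.length);
                i = bestEnd;
            } else {
                out.write(input[i]); // no term starts here; copy one byte
                i++;
            }
        }
        return out.toByteArray();
    }
}
```

Working on byte[] avoids decoding the text to chars and back, which only pays
off if the input arrives as bytes and is written out as bytes again.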

Bill

William Brogden
09-24-04 01:59 PM

