Code Comments

Programming Forum and web based access to our favorite programming groups.
For Programmers: Free Programming Magazines | New: Database administration forum
Registration is free! Edit your profileCalendarFind other membersFrequently Asked QuestionsSearch -> 
Post New Thread











Thread
Author

Efficient way of testing for substring being one of a set?
What's the neatest and/or most efficient way of testing if one of a
set of strings (contained in a dictionary, list or similar) is a
sub-string of a given string?

I.e. I have a string delivered into my program and I want to see if
any of a set of strings is a substring of the string I have been
given.  It's quite OK to stop at the first one found.  Ideally the
strings being searched through will be the keys of a dictionary but
this isn't a necessity, they can just be in a list if it could be done
more efficiently using a list.


Is this the best one can do (ignoring the likelihood that I've got
some syntax wrong) :-

# l is the list
# str is the incoming string
answer = ""
for x in l:
if str.find(x) < 0:
continue
answer = x


--
Chris Green

Report this thread to moderator Post Follow-up to this message
Old Post
tinnews@isbd.co.uk
04-03-08 01:44 PM


Re: Efficient way of testing for substring being one of a set?
On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
> What's the neatest and/or most efficient way of testing if one of a
> set of strings (contained in a dictionary, list or similar) is a
> sub-string of a given string?
>
> I.e. I have a string delivered into my program and I want to see if
> any of a set of strings is a substring of the string I have been
> given.  It's quite OK to stop at the first one found.  Ideally the
> strings being searched through will be the keys of a dictionary but
> this isn't a necessity, they can just be in a list if it could be done
> more efficiently using a list.
>
> Is this the best one can do (ignoring the likelihood that I've got
> some syntax wrong) :-
>
>     # l is the list
>     # str is the incoming string
>     answer = ""
>     for x in l:
>         if str.find(x) < 0:
>             continue
>         answer = x

I'd not use 'l' ( with '1') or 'str' (a standard module) as
variable names. Your code checks every string in the list even when
it's found one... you can reverse the test and break when the first
one is found. Using 'in' rather than testing the return value of find
is nicer as a substring test. Finally, using the 'else' clause lets
you make it clear that answer is set to the empty string when no match
is found.

for answer in l:
if str in answer: break
else:
answer = ''

--
Paul Hankin

Report this thread to moderator Post Follow-up to this message
Old Post
Paul Hankin
04-03-08 01:44 PM


Re: Efficient way of testing for substring being one of a set?
def foo(sample, strings):
for s in strings:
if sample in s:
return True
return False

This was an order of magnitude faster for me than using str.find or
str.index.  That was finding rare words in the entire word-list (w/
duplicates) of War and Peace.

Report this thread to moderator Post Follow-up to this message
Old Post
Jeff
04-03-08 01:44 PM


Re: Efficient way of testing for substring being one of a set?
Paul Hankin <paul.hankin@gmail.com> wrote:
> On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote: 
>
> I'd not use 'l' ( with '1') or 'str' (a standard module) as
> variable names.

Neither would I, it was just thrown together to show how I was
thinking.

>                 Your code checks every string in the list even when
> it's found one... you can reverse the test and break when the first
> one is found. Using 'in' rather than testing the return value of find
> is nicer as a substring test. Finally, using the 'else' clause lets
> you make it clear that answer is set to the empty string when no match
> is found.
>
> for answer in l:
>     if str in answer: break
> else:
>     answer = ''
>
OK, that does improve things somewhat, thanks.

--
Chris Green

Report this thread to moderator Post Follow-up to this message
Old Post
tinnews@isbd.co.uk
04-03-08 01:44 PM


Re: Efficient way of testing for substring being one of a set?
Jeff <jeffober@gmail.com> wrote:
> def foo(sample, strings):
>         for s in strings:
>                 if sample in s:
>                         return True
>         return False
>
> This was an order of magnitude faster for me than using str.find or
> str.index.  That was finding rare words in the entire word-list (w/
> duplicates) of War and Peace.

However it's the wrong way around, in my case 'sample' is the longer
string and I want to know if s is in it.  It's simple enough to do it
the other way around though:-

def foo(sample, strings):
for s in strings:
if s in sample:
return True
return False

Using in rather than find() and making it a function would seem to be
the way to go, thanks.

--
Chris Green

Report this thread to moderator Post Follow-up to this message
Old Post
tinnews@isbd.co.uk
04-03-08 01:44 PM


Re: Efficient way of testing for substring being one of a set?
On Apr 3, 8:03 am, Jeff <jeffo...@gmail.com> wrote:
> def foo(sample, strings):
>         for s in strings:
>                 if sample in s:
>                         return True
>         return False
>
> This was an order of magnitude faster for me than using str.find or
> str.index.  That was finding rare words in the entire word-list (w/
> duplicates) of War and Peace.

If you test against the same substrings over and over again, an
alternative would be to build a regular expression:

import re
search = re.compile('|'.join(re.escape(x)
for x in substrings)).search
p = search(somestring)
if p is not None:
print 'Found', p.group()


George

Report this thread to moderator Post Follow-up to this message
Old Post
George Sakkis
04-03-08 01:44 PM


Re: Efficient way of testing for substring being one of a set?
On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
> What's the neatest and/or most efficient way of testing if one of a

A different approach:
 
True
 
False

Perhaps not as obvious or readable as Jeff's example, but is
essentially doing the same thing using generator syntax.

Report this thread to moderator Post Follow-up to this message
Old Post
Ant
04-03-08 01:44 PM


Re: Efficient way of testing for substring being one of a set?
On Apr 3, 8:19=A0am, George Sakkis <george.sak...@gmail.com> wrote:
> On Apr 3, 8:03 am, Jeff <jeffo...@gmail.com> wrote:
> 
> 
>
> If you test against the same substrings over and over again, an
> alternative would be to build a regular expression:
>
> import re
> search =3D re.compile('|'.join(re.escape(x)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0for x in substr=[/color
]
ings)).search
> p =3D search(somestring)
> if p is not None:
> =A0 print 'Found', p.group()
>
> George

That would be an enormous regular expression and eat a lot of memory.
But over an enormous number of substrings, it would be O(log n),
rather than O(n).

Report this thread to moderator Post Follow-up to this message
Old Post
Jeff
04-03-08 01:44 PM


Re: Efficient way of testing for substring being one of a set?
On Apr 3, 8:44=A0am, Ant <ant...@gmail.com> wrote:
> On Apr 3, 12:37 pm, tinn...@isbd.co.uk wrote:
> 
>
> A different approach:
> 
>
> True
> 
>
> False
>
> Perhaps not as obvious or readable as Jeff's example, but is
> essentially doing the same thing using generator syntax.

That's pretty :)

Report this thread to moderator Post Follow-up to this message
Old Post
Jeff
04-03-08 01:44 PM


Re: Efficient way of testing for substring being one of a set?
On Apr 3, 1:37 pm, tinn...@isbd.co.uk wrote:
> What's the neatest and/or most efficient way of testing if one of a
> set of strings (contained in a dictionary, list or similar) is a
> sub-string of a given string?
> [...]

You could use the Aho-Corasick algorithm <http://en.wikipedia.org/wiki/
Aho-Corasick_algorithm>.
I don't know if there's a Python implementation yet.

Dennis Benzinger

Report this thread to moderator Post Follow-up to this message
Old Post
Dennis.Benzinger@gmx.net
04-03-08 01:44 PM


Sponsored Links




Last Thread Next Thread Next
Pages (2): [1] 2 »
Search this forum -> 
Post New Thread

Python archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT. The time now is 06:17 PM.

 
Free MCSE Braindumps | Real Estate Topics

Programming forum archive

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.