Code Comments

Re: regex of the month (decade?)
Stoll, Steven R. wrote:
> After solving the case sensitivity issue, separating the alternations, and
> solving the un-escaped /, here is what we are left with.
>
> (p(ost)?[.\s]*o(ffice)?[.\s]*box)
> po(b|x|drawer|stoffice|[ ]bx|box)
> p[\/]o
> b(x|ox|uzon)
> a(partado|ptdo)

If we unroll that to

post[.\s]*o(ffice)?[.\s]*box
p[.\s]*o(ffice)?[.\s]*box
pob
pox
podrawer
postoffice
po[ ]bx
pobox
p[\/]o
bx
box
buzon
apartado
aptdo

reassembling it, we obtain

(?:p(?:o(?:st(?:[.\s]*o(ffice)?[.\s]*box|office)|(?: b)?x|b(?:ox)?|drawer)|[.\s]*o(ffice)?[.\s]*box|\/o)|ap(?:arta|t)do|b(?:uzon|o?x))

I'm happy that they thought to check for 'pox' as shorthand for
postoffice box. I'll remember to use that next time I need to address
such a letter.

David

David Landgren
01-09-08 12:40 AM
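
One way to sanity-check the reassembly in David Landgren's post is to run the unrolled list and the reassembled pattern over a handful of strings and confirm that they accept and reject the same ones. The following is a minimal sketch in Perl, not part of the original thread. The sample strings and the names @unrolled, $flat_re, $assembled_re and @disagree are invented for illustration, and the single space inside (?: b)?x is assumed from the po[ ]bx alternative.

```perl
use strict;
use warnings;

# The unrolled alternatives, copied from the post.
my @unrolled = (
    'post[.\s]*o(ffice)?[.\s]*box',
    'p[.\s]*o(ffice)?[.\s]*box',
    'pob', 'pox', 'podrawer', 'postoffice', 'po[ ]bx', 'pobox',
    'p[\/]o', 'bx', 'box', 'buzon', 'apartado', 'aptdo',
);

# Anchor both patterns so that each must match the whole sample string.
my $flat    = join '|', @unrolled;
my $flat_re = qr{\A(?:$flat)\z};

# The reassembled pattern, joined onto one line.
my $assembled_re = qr{\A(?:p(?:o(?:st(?:[.\s]*o(ffice)?[.\s]*box|office)|(?: b)?x|b(?:ox)?|drawer)|[.\s]*o(ffice)?[.\s]*box|\/o)|ap(?:arta|t)do|b(?:uzon|o?x))\z};

# Hypothetical sample inputs: the first three rows should be accepted,
# the last row rejected, by both patterns.
my @samples = (
    'pob', 'pox', 'po bx', 'pobox', 'podrawer', 'postoffice',
    'p/o', 'p.o. box', 'post office box', 'p o box', 'po.box',
    'bx', 'box', 'buzon', 'apartado', 'aptdo',
    'pobx', 'postbox', 'apartment', 'boxes',
);

# Collect every sample on which the two patterns give different answers.
my @disagree = grep {
    my $in_flat      = /$flat_re/      ? 1 : 0;
    my $in_assembled = /$assembled_re/ ? 1 : 0;
    $in_flat != $in_assembled;
} @samples;

print @disagree
    ? "patterns disagree on: @disagree\n"
    : "both patterns agree on all samples\n";
```

Agreement on a small sample is only evidence, not proof, that the two forms are equivalent; adding more strings to @samples widens the check.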


Re: regex of the month (decade?)
Dan Collins wrote:

> On Jan 7, 2008 5:06 PM, Uri Guttman <uri@stemsystems.com> wrote:
> 
>
> I don't even want to know what that's supposed to do.
> First, and most obviously, that should use /i.

You can also find code like this in HTML::Template. Sam Tregar
explained the reason in the FAQ:

Q: Why do you use /[Tt]/ instead of /t/i? It's so ugly!

A: Simple - the case-insensitive match switch is very
inefficient. According to "Mastering Regular Expressions"
from O'Reilly Press, /[Tt]/ is faster and more space
efficient than /t/i - by as much as double against long
strings. //i essentially does a lc() on the string and
keeps a temporary copy in memory.

When this changes, and it is in the 5.6 development series,
I will gladly use //i. Believe me, I realize [Tt] is hideously
ugly.

» http://search.cpan.org/dist/HTML-Template/Template.pm#FREQUENTLY_ASKED_QUESTIONS


--
Sébastien Aperghis-Tramoni

Close the world, txEn eht nepO.


Sébastien Aperghis-Tramoni
01-11-08 03:34 AM


Re: regex of the month (decade?)
Sébastien Aperghis-Tramoni wrote:
>     Q: Why do you use /[Tt]/ instead of /t/i? It's so ugly!
>
>     A: Simple - the case-insensitive match switch is very
>     inefficient. According to "Mastering Regular Expressions"
>     from O'Reilly Press, /[Tt]/ is faster and more space
>     efficient than /t/i - by as much as double against long
>     strings. //i essentially does a lc() on the string and
>     keeps a temporary copy in memory.
>
>     When this changes, and it is in the 5.6 development series,
>     I will gladly use //i. Believe me, I realize [Tt] is hideously
>     ugly.

Looks like it was painfully true for 5.5...

$ time perl5.5.5 -wle '$foo = "x" x 10000; $foo .= "T";  $foo =~ /[Tt]/ for 1..100000'

real    0m4.882s
user    0m4.761s
sys     0m0.026s

$ time perl5.5.5 -wle '$foo = "x" x 10000; $foo .= "T";  $foo =~ /t/i for 1..100000'

real    0m40.656s
user    0m39.587s
sys     0m0.149s


And the reverse is now true in this highly inaccurate test...

$ time perl5.8.8 -wle '$foo = "x" x 10000; $foo .= "T";  $foo =~ /[Tt]/ for 1..100000'

real    0m5.732s
user    0m5.565s
sys     0m0.027s

$ time perl5.8.8 -wle '$foo = "x" x 10000; $foo .= "T";  $foo =~ /t/i for 1..100000'

real    0m2.589s
user    0m2.544s
sys     0m0.015s



--
<Schwern> What we learned was if you get , grab someone and swing
them around a few times
-- Life's lessons from square dancing

Michael G Schwern
01-11-08 03:34 AM
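
Schwern describes the one-liners above as a highly inaccurate test. A somewhat more controlled comparison can be sketched with the core Benchmark module, which runs both variants in the same process and prints their relative rates. This is an illustrative sketch rather than anything posted in the thread: the labels and variable names are made up, and the subject string simply mirrors the one in the one-liners.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Same subject string as the one-liners: 10,000 "x" characters, then one "T".
my $foo = ('x' x 10_000) . 'T';

# Precompile both patterns so that only the matching itself is timed.
my $class_re  = qr/[Tt]/;
my $nocase_re = qr/t/i;

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds instead of a fixed number of iterations.
cmpthese(-1, {
    'char class [Tt]' => sub { my $hit = $foo =~ $class_re },
    'modifier /i'     => sub { my $hit = $foo =~ $nocase_re },
});
```

Which variant comes out ahead, and by how much, depends on the perl version and build, as the 5.5 and 5.8.8 timings in the post already suggest.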


Re: regex of the month (decade?)
Michael G Schwern wrote:

> And the reverse is now true in this highly inaccurate test...
>
> $ time perl5.8.8 -wle '$foo = "x" x 10000; $foo .= "T";  $foo =~ /[Tt]/ for 1..100000'
>
> real    0m5.732s
> user    0m5.565s
> sys     0m0.027s
>
> $ time perl5.8.8 -wle '$foo = "x" x 10000; $foo .= "T";  $foo =~ /t/i for 1..100000'
>
> real    0m2.589s
> user    0m2.544s
> sys     0m0.015s

And if I recall my perl510delta correctly, /i should be even faster on
5.10.0. No, hang on, it's when UTF-8 strings are involved.

% time perl5.8.8 -Mutf8 -Mcharnames=:full -wle '$foo = "e\N{GREEK SMALL LETTER BETA}" x 5000; $foo .= "T";  $foo =~ /t/i for 1..1000'

real    0m22.855s
user    0m22.827s
sys     0m0.016s

% ./perl -v
This is perl, v5.10.0 DEVEL32604 built for i386-freebsd-thread-multi

% time ./perl -Ilib -Mutf8 -Mcharnames=:full -wle '$foo = "e\N{GREEK SMALL LETTER BETA}" x 5000; $foo .= "T";  $foo =~ /t/i for 1..1000'

real    0m22.957s
user    0m22.948s
sys     0m0.001s

Well, look on the bright side. It's no worse.

The benchmark may be flawed, since my appreciation of Unicode is little
more than "things went downhill after 7-bit ASCII".

David

David Landgren
01-11-08 01:38 PM


Re: regex of the month (decade?)
On Jan 11, 2008, at 8:01 AM, David Landgren wrote:

> The benchmark may be flawed, since my appreciation of Unicode is
> little more than "things went downhill after 7-bit ASCII".

Haven't I read that you live in Paris?  I figured that anyone who
lives in a country whose dominant language was not fully expressible
in ASCII would love Unicode.

On a major tangent, have others noticed the resurgence of the umlaut
in printed English?  I keep seeing things like coöperation or
coördinates -- particularly in Technology Review, but in other
publications on occasion too.  Is that because it's *supposed* to be
spelled that way, but ASCII and the typewriter have suppressed that
spelling for my lifetime?

Chris


Chris Dolan
01-13-08 12:34 AM


Re: regex of the month (decade?)
Chris Dolan wrote:
> On a major tangent, have others noticed the resurgence of the umlaut
> in printed English?  I keep seeing things like coöperation or
> coördinates -- particularly in Technology Review, but in other
> publications on occasion too.  Is that because it's *supposed* to be
> spelled that way, but ASCII and the typewriter have suppressed that
> spelling for my lifetime?
>

A quick use of Google-fu unearthed a blog entry
http://www.dwelle.org/archives/2007...l-the-umlauts/,
which in turn pointed to the page
http://ourworld.compuserve.com/homepages/profirst/d.htm that says:

*dieresis* or *diæresis*: A diacritical mark ( ¨ ) optionally used in
English, oftentimes replaced by a hyphen. In English, the dieresis is
used on a second identical vowel to indicate a change in pronunciation
of that vowel or indicate it is pronounced in a separate syllable. It is
sometimes referred to as an « umlaut » when used with a single character
or in a « diphthong. » Examples: reëlecting, reëncoding, coöperation,
coördination.



Well I, for one, never knew that such a thing existed.  Neato!  Too
bad about the name of the mark, though, which is definitely unfortunate.


Joÿ,
`/anick

Yanick Champoux
01-13-08 12:34 AM


Re: regex of the month (decade?)

Georg Moritz
01-13-08 12:34 AM


Re: regex of the month (decade?)
On Sun, Jan 13, 2008 at 12:23:35AM +0100, Georg Moritz wrote:
> Well, that's sort of quotemeta for the double o - differentiating e.g.
> double-o usage in  vs. cooperation. I haven't seen that usage in
> english yet, but it's used in spanish to mark a vowel as literal, e.g. in
> "Parque Güell".

The only English word I think it's commonly seen with is naïve,
to indicate that the ai isn't a digraph.


--
"But Sidley Park is already a picture, and a most amiable picture too.
The slopes are green and gentle. The trees are companionably grouped at
intervals that show them to advantage. The rill is a serpentine ribbon
unwound from the lake peaceably contained by meadows on which the right
amount of sheep are tastefully arranged." -- Lady Croom, "Arcadia"

Dave Mitchell
01-13-08 12:34 AM


Re: regex of the month (decade?)
Chris Dolan wrote:
> Haven't I read that you live in Paris?  I figured that anyone who lives
> in a country whose dominant language was not fully expressible in ASCII
> would love Unicode.

"Not fully expressible" seems mild to apply to writing French in ASCII
(which after all has no diacritics).  The phrase seems more appropriate
for writing French in ISO-8859-1 (because of the lack of "oe" ligature).

--
Keith C. Ivey <keith@iveys.org>
Washington, DC

Keith Ivey
01-13-08 12:34 AM


Re: regex of the month (decade?)
* Chris Dolan <chris@chrisdolan.net> [2008-01-12 23:55]:
> I figured that anyone who lives in a country whose dominant
> language was not fully expressible in ASCII would love Unicode.

For bonus points, try writing, say, German (fully expressible
with an ISO-8859 charset) and Greek (fully expressible[1] with
an ISO-8859 charset) in the same document.

[1]: Well, Modern Greek anyway.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

Aristotle Pagaltzis
01-13-08 09:49 AM
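
The charset points made by Keith Ivey and Aristotle Pagaltzis can be tried out with the core Encode module: ask it to encode a string strictly and see whether it refuses. The sketch below is an illustration under stated assumptions, not code from the thread. The helper fits() and the sample words are hypothetical, and ISO-8859-15 is included only as a Latin charset that does carry the "oe" ligature.

```perl
use strict;
use warnings;
use utf8;        # the string literals below contain non-ASCII characters
use Encode ();

# Hypothetical helper: true if every character of $text has a
# code point in $charset, false if strict encoding fails.
sub fits {
    my ($charset, $text) = @_;
    my $copy = $text;   # encode() may modify its argument when a CHECK value is given
    return eval { Encode::encode($charset, $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

my $french = 'cœur';     # uses the "oe" ligature
my $german = 'Straße';
my $greek  = 'λόγος';
my $mixed  = "$german $greek";

my @cases = (
    [ 'French',       'iso-8859-1',  $french ],
    [ 'French',       'iso-8859-15', $french ],
    [ 'German',       'iso-8859-1',  $german ],
    [ 'Greek',        'iso-8859-7',  $greek  ],
    [ 'German+Greek', 'iso-8859-1',  $mixed  ],
    [ 'German+Greek', 'iso-8859-7',  $mixed  ],
    [ 'German+Greek', 'UTF-8',       $mixed  ],
);

# Only ASCII labels are printed, so no output encoding layer is needed.
for my $case (@cases) {
    my ($label, $charset, $text) = @$case;
    printf "%-13s in %-11s: %s\n",
        $label, $charset, fits($charset, $text) ? 'ok' : 'not representable';
}
```

Each language fits its own ISO-8859 part, but the mixed string needs characters from two different parts at once, which is the situation a single Unicode encoding is meant to handle.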


Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.