Home > Archive > PERL Beginners > September 2007 > regex help









Author regex help
Jonathan Lang

2007-09-25, 4:00 am

I'm trying to devise a regex that matches from the first double-quote
character found to the next double-quote character that isn't part of
a pair; but for some reason, I'm having no luck. Here's what I tried:

/"(.*?)"(?!")/

Sample text:

author: "Jonathan ""Dataweaver"" Lang" key=val

What I'm getting for $1 in the first match:

Jonathan "

What I'm looking for:

Jonathan ""Dataweaver"" Lang

What did I miss, and how can I most efficiently perform the desired match?

--
Jonathan "Dataweaver" Lang

Rob Dixon

2007-09-25, 7:59 am

Jonathan Lang wrote:
>
> I'm trying to devise a regex that matches from the first double-quote
> character found to the next double-quote character that isn't part of
> a pair; but for some reason, I'm having no luck. Here's what I tried:
>
> /"(.*?)"(?!")/
>
> Sample text:
>
> author: "Jonathan ""Dataweaver"" Lang" key=val
>
> What I'm getting for $1 in the first match:
>
> Jonathan "
>
> What I'm looking for:
>
> Jonathan ""Dataweaver"" Lang
>
> What did I miss, and how can I most efficiently perform the desired match?


Your regex looks for the first double-quote and then captures everything after
that up to the first subsequent double-quote that isn't followed immediately by
another one. The second quote of the pair before 'Dataweaver' matches this
criterion so your regex captures up to the character before it.
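A minimal sketch of that behaviour, using the sample text from the original post. The variable names ($text, $captured) are illustrative and do not come from the thread.

```perl
use strict;
use warnings;

# Sample text from the original post.
my $text = q(author: "Jonathan ""Dataweaver"" Lang" key=val);

# Original attempt: lazy .*? followed by a quote that is not
# followed by another quote.
my ($captured) = $text =~ /"(.*?)"(?!")/;

# The lazy .*? grows one character at a time. The first quote of the
# pair before Dataweaver is rejected by (?!"), so .*? swallows it, and
# the second quote of that pair (followed by 'D') is accepted as the end.
print "[$captured]\n";    # prints [Jonathan "]
```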

This:

$str =~ /"((?:.*?"")*.*?)"/;

should do what you want. After finding the first double-quote it captures all
following sequences ending in a pair of double quotes, plus anything after
those up to the closing quote.
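As a rough illustration, the same pattern applied to the sample text might look like this. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Sample text from the original post.
my $text = q(author: "Jonathan ""Dataweaver"" Lang" key=val);

# (?:.*?"")*  consumes every run of text that ends in a doubled quote;
# .*?         then takes the remainder up to the next single quote.
my ($captured) = $text =~ /"((?:.*?"")*.*?)"/;

print "$captured\n";    # prints Jonathan ""Dataweaver"" Lang
```

Because . also matches a quote character, this sketch assumes that no further doubled quotes appear later on the same line.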

HTH,

Rob

Jonathan Lang

2007-09-25, 7:00 pm

Rob Dixon wrote:
> Jonathan Lang wrote:
>
> Your regex looks for the first double-quote and then captures everything after
> that up to the first subsequent double-quote that isn't followed immediately by
> another one. The second quote of the pair before 'Dataweaver' matches this
> criterion so your regex captures up to the character before it.
>
> This:
>
> $str =~ /"((?:.*?"")*.*?)"/;
>
> should do what you want. After finding the first double-quote it captures all
> following sequences ending in a pair of double quotes, plus anything after
> those up to the closing quote.


Ah. I had tried /"((.*?"")*.*?)"/ and hadn't gotten it to work; it
never occurred to me to try the non-capturing group instead.

Thank you.

--
Jonathan "Dataweaver" Lang

Rob Dixon

2007-09-25, 7:00 pm

Jonathan Lang wrote:
> Rob Dixon wrote:
>
> Ah. I had tried /"((.*?"")*.*?)"/ and hadn't gotten it to work; it
> never occurred to me to try the non-capturing group instead.


That also works! (But is performing unnecessary and wasteful captures.)

Rob



use strict;
use warnings;

my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val);

$str =~ /"((.*?"")*.*?)"/;
print $1, "\n";

**OUTPUT**

Jonathan ""Dataweaver"" Lang
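To make the "unnecessary captures" point concrete, the following sketch compares what the two variants leave in $1 and $2. It assumes the same sample text; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = q(author: "Jonathan ""Dataweaver"" Lang" key=val);

# Capturing inner group: $2 holds the last repetition of (.*?"").
$text =~ /"((.*?"")*.*?)"/ or die "no match";
my ($outer_cap, $inner_cap) = ($1, $2);

# Non-capturing inner group: only $1 is set.
$text =~ /"((?:.*?"")*.*?)"/ or die "no match";
my ($outer_noncap, $inner_noncap) = ($1, $2);

print "capturing:     \$1=[$outer_cap] \$2=[$inner_cap]\n";
print "non-capturing: \$1=[$outer_noncap] \$2=",
      defined $inner_noncap ? "[$inner_noncap]" : "undef", "\n";
```

Both variants produce the same $1; the capturing form additionally records the last inner repetition in $2, which nothing here uses.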

Nobull67@Gmail.Com

2007-09-26, 10:00 pm

On Sep 25, 4:33 pm, rob.di...@350.com (Rob Dixon) wrote:
> Jonathan Lang wrote:
>
> That also works! (But is performing unnecessary and wasteful captures.)
>
> Rob
>
> use strict;
> use warnings;
>
> my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val);
>
> $str =~ /"((.*?"")*.*?)"/;
> print $1, "\n";
>
> **OUTPUT**
>
> Jonathan ""Dataweaver"" Lang


use strict;
use warnings;

my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val fly-in-ointment: "Brian ""Nobull"" McCauley");

$str =~ /"((.*?"")*.*?)"/;
print $1, "\n";

__END__

**OUTPUT**

Jonathan ""Dataweaver"" Lang" key=val fly-in-ointment: "Brian ""Nobull"" McCauley

An alternative pattern would be /"((?:[^"]*"")*.*?)"/ although the
behaviour of that may be counter-intuitive if presented with bad input
in which there's no closing quote.
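A small sketch of both cases, for illustration. The two-field string is the one used above; the unterminated string ($unclosed) is a hypothetical bad input made up for this example.

```perl
use strict;
use warnings;

my $pattern = qr/"((?:[^"]*"")*.*?)"/;

# Two quoted fields on one line: [^"]* cannot run past the closing quote
# of the first field, so only the first field is captured.
my $two_fields = q(author: "Jonathan ""Dataweaver"" Lang" key=val fly-in-ointment: "Brian ""Nobull"" McCauley");
my ($first) = $two_fields =~ $pattern;
print "$first\n";      # prints Jonathan ""Dataweaver"" Lang

# Hypothetical bad input with the closing quote missing: the regex
# backtracks and settles on a quote from a doubled pair instead.
my $unclosed = q(author: "Jonathan ""Dataweaver"" Lang key=val);
my ($partial) = $unclosed =~ $pattern;
print "$partial\n";    # prints Jonathan ""Dataweaver
```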


My preferred pattern would be much closer to Jonathan's original:

/"((?:[^"]|"")*)"(?!")/

This has the advantage of failing to match if presented with input
that lacks a closing quote.
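As a hedged sketch, this pattern can also pull every quoted field from a line when used with /g in list context. The unterminated string ($unclosed) is a hypothetical input made up for this example, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $pattern = qr/"((?:[^"]|"")*)"(?!")/;

my $text = q(author: "Jonathan ""Dataweaver"" Lang" key=val fly-in-ointment: "Brian ""Nobull"" McCauley");

# In list context with /g, each quoted field is returned in turn.
my @fields = $text =~ /$pattern/g;
print "$_\n" for @fields;
# prints:
#   Jonathan ""Dataweaver"" Lang
#   Brian ""Nobull"" McCauley

# Hypothetical input with no closing quote and no doubled quotes: no match.
my $unclosed = q(author: "Jonathan Lang key=val);
my $matched = $unclosed =~ $pattern ? 1 : 0;
print "no match\n" unless $matched;
```

One caveat worth checking against real data: if an unterminated field still contains a doubled quote, the unanchored pattern can restart at that pair and match it as an empty string.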
