PERL Beginners mailing list, September 2007: regex help
| Jonathan Lang 2007-09-25, 4:00 am |
| I'm trying to devise a regex that matches from the first double-quote
character found to the next double-quote character that isn't part of
a pair; but for some reason, I'm having no luck. Here's what I tried:
/"(.*?)"(?!")/
Sample text:
author: "Jonathan ""Dataweaver"" Lang" key=val
What I'm getting for $1 in the first match:
Jonathan "
What I'm looking for:
Jonathan ""Dataweaver"" Lang
What did I miss, and how can I most efficiently perform the desired match?
--
Jonathan "Dataweaver" Lang
| |
| Rob Dixon 2007-09-25, 7:59 am |
| Jonathan Lang wrote:
> What did I miss, and how can I most efficiently perform the desired match?
Your regex looks for the first double-quote and then captures everything after
that up to the first subsequent double-quote that isn't followed immediately by
another one. The second quote of the pair before 'Dataweaver' matches this
criterion so your regex captures up to the character before it.
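The behaviour Rob describes can be reproduced with a short script. This is an illustrative sketch run against the sample line from the question. The variable name $got is hypothetical and does not come from the thread.

```perl
use strict;
use warnings;

my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val);

# Original pattern: lazily capture up to the first quote not followed by another quote.
my ($got) = $str =~ /"(.*?)"(?!")/;

# The lazy .*? first stops at the first quote of the "" pair, but (?!") rejects
# that spot because the next character is also a quote. It then stops at the
# second quote of the pair, which is followed by 'D', so the match succeeds there.
print "[$got]\n";    # prints [Jonathan "]
```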
This:
$str =~ /"((?:.*?"")*.*?)"/;
should do what you want. After finding the first double-quote it captures all
following sequences ending in a pair of double quotes, plus anything after
those up to the closing quote.
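A sketch checking Rob's suggested pattern against the sample line from the question. The comments describe how each part of the pattern consumes this particular text; $field is an illustrative name.

```perl
use strict;
use warnings;

my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val);

# (?:.*?"")*  consumes each chunk ending in a doubled quote: 'Jonathan ""', 'Dataweaver""'
# .*?"        then takes ' Lang' up to the next (closing) quote
my ($field) = $str =~ /"((?:.*?"")*.*?)"/;

print "$field\n";    # prints Jonathan ""Dataweaver"" Lang
```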
HTH,
Rob
| |
| Jonathan Lang 2007-09-25, 7:00 pm |
| Rob Dixon wrote:
> $str =~ /"((?:.*?"")*.*?)"/;
Ah. I had tried /"((.*?"")*.*?)"/ and hadn't gotten it to work; it
never occurred to me to try the non-capturing group instead.
Thank you.
--
Jonathan "Dataweaver" Lang
| |
| Rob Dixon 2007-09-25, 7:00 pm |
| Jonathan Lang wrote:
> Rob Dixon wrote:
>
> Ah. I had tried /"((.*?"")*.*?)"/ and hadn't gotten it to work; it
> never occurred to me to try the non-capturing group instead.
That also works! (But is performing unnecessary and wasteful captures.)
Rob
use strict;
use warnings;
my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val);
$str =~ /"((.*?"")*.*?)"/;
print $1, "\n";
**OUTPUT**
Jonathan ""Dataweaver"" Lang
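The "wasteful captures" Rob mentions can be seen directly. In list context the match returns both groups, and the inner repeated group keeps only its last successful iteration. A sketch on the same sample line; $outer and $inner are illustrative names.

```perl
use strict;
use warnings;

my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val);

# Group 1 is the outer group (the whole quoted field).
# Group 2 is the inner, repeated group; each iteration overwrites it.
my ($outer, $inner) = $str =~ /"((.*?"")*.*?)"/;

print "\$1 = $outer\n";    # Jonathan ""Dataweaver"" Lang
print "\$2 = $inner\n";    # Dataweaver""
```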
| |
| Nobull67@Gmail.Com 2007-09-26, 10:00 pm |
| On Sep 25, 4:33 pm, rob.di...@350.com (Rob Dixon) wrote:
> That also works! (But is performing unnecessary and wasteful captures.)
use strict;
use warnings;
my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val fly-in-ointment: "Brian ""Nobull"" McCauley");
$str =~ /"((.*?"")*.*?)"/;
print $1, "\n";
__END__
**OUTPUT**
Jonathan ""Dataweaver"" Lang" key=val fly-in-ointment: "Brian ""Nobull"" McCauley
An alternative pattern would be /"((?:[^"]*"")*.*?)"/, although the
behaviour of that may be counter-intuitive if presented with bad input
in which there's no closing quote.
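A sketch comparing the alternative pattern on the two-field string from the example above, plus a hypothetical bad input with no closing quote to show the counter-intuitive case. Because [^"]* cannot step over a lone quote, each repetition has to end at a real "" pair. The names $alt, $two, $good, $bad and $odd are illustrative.

```perl
use strict;
use warnings;

# Each repetition is non-quote text followed by a doubled quote.
my $alt = qr/"((?:[^"]*"")*.*?)"/;

my $two = q(author: "Jonathan ""Dataweaver"" Lang" key=val fly-in-ointment: "Brian ""Nobull"" McCauley");
my ($good) = $two =~ $alt;
print "$good\n";    # Jonathan ""Dataweaver"" Lang

# Hypothetical bad input with no closing quote: the pattern still matches,
# treating the first quote of the second pair as the closing quote.
my $bad = q("abc ""x"" def);
my ($odd) = $bad =~ $alt;
print "$odd\n";     # abc ""x
```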
My preferred pattern would be much closer to Jonathan's original:
/"((?:[^"]|"")*)"(?!")/
This has the advantage of failing to match if presented with input
that lacks a closing quote.
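A sketch exercising this pattern, with one caveat worth checking. Since the pattern is not anchored, input without a closing quote makes the match that starts at the first quote fail, but the engine then retries from later quote positions. It can succeed at the first quote of a "" pair, capturing an empty field. Prefixing ^[^"]* (a hypothetical addition, not from the thread) pins the match to the first double quote in the string, as the original question asked, so it fails outright on such input. The variable names are illustrative.

```perl
use strict;
use warnings;

# Pattern from the post: non-quote characters or doubled quotes, then a lone closing quote.
my $field    = qr/"((?:[^"]|"")*)"(?!")/;
# Hypothetical anchored variant: may only start at the first double quote in the string.
my $anchored = qr/^[^"]*"((?:[^"]|"")*)"(?!")/;

my $two = q(author: "Jonathan ""Dataweaver"" Lang" key=val fly-in-ointment: "Brian ""Nobull"" McCauley");
my ($first) = $two =~ $field;
print "$first\n";    # Jonathan ""Dataweaver"" Lang

my $bad = q(author: "Jonathan ""Dataweaver);    # no closing quote

# Unanchored: the scan restarts at the first quote of the "" pair and
# matches an empty field between the two quotes of that pair.
my @loose  = $bad =~ $field;
# Anchored: no match at all.
my @strict = $bad =~ $anchored;

printf "unanchored: %s\n", @loose  ? "matched [$loose[0]]"  : "no match";
printf "anchored:   %s\n", @strict ? "matched [$strict[0]]" : "no match";
```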