Home > Archive > PERL Beginners > July 2005 > search and replace
| Author |
search and replace
|
|
| Brent Clark 2005-07-24, 8:29 pm |
| Hi all
I think I have a better understanding of the use of () for my regex search,
but this morning I have a new set of problems, whereby I need to perform
a search and replace and then pass on to the new variable.
My current code is as so, and works:
$_ =~ s/(^\w+\.dat\|)//; my $lineExtract = $_;
But I would like to have it a one liner.
I tried:
my ($lineExtract) = ($_ =~ s/(^\w+\.dat\|)//);
and
$lineExtract = ($_ =~ s/(^\w+\.dat\|)//);
If I print $lineExtract, I don't get the output I require.
Any tips and advice would be gratefully appreciated.
I tried googling and perldoc, but I haven't found a decent answer.
Kind Regards
Brent Clark
| |
| Jeff 'japhy' Pinyan 2005-07-24, 8:29 pm |
| On Jul 22, Brent Clark said:
> Think I have a better understanding of the use of () for my regex search, but
> this morning I have a new set of problems, whereby I need to perform a search
> and replace and then pass on to the new variable.
>
> My current code is as so, and works:
>
> $_ =~ s/(^\w+\.dat\|)//;
> my $lineExtract = $_;
I'd ask first why you've got something in $_. You might be able to save
time by putting the content directly into $lineExtract.
Also, you don't need to say '$_ =~' when doing a pattern match or
substitution against $_; it's implied:
s/foo/bar/; # works on $_
This one-liner stores $_'s contents in $lineExtract, and THEN runs the
substitution on $lineExtract (keeping $_ intact):
(my $lineExtract = $_) =~ s/.../.../;
But again, I'm curious how you got stuff into $_ that you want to copy to
another location anyway.
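
To make the difference concrete, here is a minimal sketch. The sample line 'report.dat|alpha|beta' is a made-up assumption, since the thread never shows the real data. It shows that s/// returns the number of substitutions rather than the modified string (which is why the attempted one-liners did not give the expected output), and then the copy-then-substitute idiom. The last line uses the /r modifier, which only exists in Perl 5.14 and later, so treat it as an optional alternative.

```perl
use strict;
use warnings;

# Hypothetical sample line; the real input is not shown in the thread.
my $sample = 'report.dat|alpha|beta';

# s/// returns the number of substitutions made, not the new string.
my $copy  = $sample;
my $count = ($copy =~ s/^\w+\.dat\|//);

# Copy $_ into $lineExtract first, then substitute on the copy only.
local $_ = $sample;
(my $lineExtract = $_) =~ s/^\w+\.dat\|//;

# Perl 5.14+ only: /r returns the modified copy and leaves $sample alone.
my $viaR = $sample =~ s/^\w+\.dat\|//r;

print "count=$count extract=$lineExtract original=$_ viaR=$viaR\n";
```

The capturing parentheses from the original pattern are dropped here because nothing reads $1 afterwards.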
--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://japhy.perlmonk.org/ % have long ago been overpaid?
http://www.perlmonks.org/ % -- Meister Eckhart
| |
| Tom Allison 2005-07-24, 8:29 pm |
| Brent Clark wrote:
> Hi all
>
> Think I have a better understanding of the use of () for my regex search,
> but this morning I have a new set of problems, whereby I need to perform
> a search and replace and then pass on to the new variable.
>
> My current code is as so, and works:
>
> $_ =~ s/(^\w+\.dat\|)//; my $lineExtract = $_;
>
> But I would like to have it a one liner.
>
> I tried:
>
> my ($lineExtract) = ($_ =~ s/(^\w+\.dat\|)//);
> and
> $lineExtract = ($_ =~ s/(^\w+\.dat\|)//);
>
($lineExtract) = /^\w+\.dat\|(.+)$/o;
s/(^\w+\.dat\|)//o says you want to remove all the \w+.dat| stuff
from the string in $_ and store it in $1;
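If you want $lineExtract to be what is left after it's stripped of the
\w+\.dat\| then you only need to use the line above.
If you want $lineExtract to contain the material that has been removed
then you want (note the parentheses, which put the match in list context
so that it returns the capture instead of a true/false result):
($lineExtract) = /(^\w+\.dat\|)/o;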
If you want both... could you do this:
($lineExtract, $_) = /(^\w+\.dat\|)(.+)$/o;
I think it would work, but you'll want to test it.
(I like to add the regex option 'o' at the end to improve performance.)
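
Since these are worth testing, here is a small sketch that runs them against a made-up line ('report.dat|alpha|beta' is an assumption, not data from the thread). It also shows why context matters: a match assigned in list context returns its captures, while the same match in scalar context only returns true or false. The /o is left off here because the patterns contain no variables.

```perl
use strict;
use warnings;

# Hypothetical sample line; the real input is not shown in the thread.
local $_ = 'report.dat|alpha|beta';

# List context: the match returns what the parentheses captured.
my ($rest)   = /^\w+\.dat\|(.+)$/;
my ($prefix) = /(^\w+\.dat\|)/;

# Scalar context: the match only reports success (1) or failure.
my $matched = /(^\w+\.dat\|)/;

# Both at once; the match runs on the old $_ before $_ is overwritten.
my $removed;
($removed, $_) = /(^\w+\.dat\|)(.+)$/;

print "rest=$rest prefix=$prefix matched=$matched removed=$removed left=$_\n";
```

One caveat: (.+) needs at least one character after the pipe, so a line that ends right after "name.dat|" would not match at all.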
| |
| Jeff 'japhy' Pinyan 2005-07-24, 8:29 pm |
| On Jul 23, Tom Allison said:
> ($lineExtract, $_) = /(^\w+\.dat\|)(.+)$/o;
>
> (I like to add the regex option 'o' at the end to improve performance.)
This is a common misconception. The purpose and effects of the /o
modifier (as well as the /s and /m modifiers) are unclear to a lot of Perl
programmers out there. I'll make a PSA now and try to clear things up a
bit.
The /o modifier tells the internal regex compiler that, after this regex
has been compiled 'o'nce, it is never to be compiled again. "Well, what
good is that?" you ask. For your average regex, there is absolutely no
difference, no change in performance: /foo/ and /foo/o are identical.
The place where the /o modifier matters is when there are variables inside
the regex, e.g. /^$field: (.*)/. The /o modifier says that, after the
regex has been compiled for the first time -- which means that all the
variables in it have been interpolated -- *that* compiled regex will take
the place of the regex with the variables in it. Any changes to your
variables will be ignored, for the rest of the program. There's no way to
reverse the effect of /o.
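
A short sketch of that freezing effect, using made-up field names and a made-up input line (both are assumptions for illustration). The same loop runs the pattern with and without /o; the /o version keeps using the value $field had the first time through.

```perl
use strict;
use warnings;

# Hypothetical data: one line and two candidate field names.
my $line = 'Age: 30';
my (@with_o, @without_o);

for my $field (qw(Name Age)) {
    # Compiled once with $field eq 'Name'; later values are ignored.
    push @with_o, $field if $line =~ /^$field: (.*)/o;

    # Re-interpolated each time, so 'Age' is tried on the second pass.
    push @without_o, $field if $line =~ /^$field: (.*)/;
}

print "with /o: @with_o\n";
print "without /o: @without_o\n";
```

With /o the pattern stays /^Name: (.*)/ for both iterations, so nothing matches; without it, the second iteration matches 'Age'.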
The /s modifier often carries the mnemonic "so you can match your string
as though it were a single line" and that all of a sudden makes people
think all sorts of crazy things. The /m modifier is often said to "make
your regex match over multiple lines" and this too makes people do weird
things. I've seen code like this:
while (<FILE>) {
    print $1 if /Start(.*)End/ms;
}
The person has a file with "Start" on one line and "End" on another, and
they're not sure why their regex doesn't match the stuff in between. The
reason is that the /m and /s modifiers change the *regex* and nothing
else. In that code above, you're still only reading one physical line of
text from FILE at a time.
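
The following sketch reproduces that situation with an in-memory filehandle standing in for FILE (the three-line text is an assumption for illustration). Reading line by line never matches, whatever the modifiers; reading the whole text into one string and then applying /s does.

```perl
use strict;
use warnings;

# Hypothetical file contents, held in a string instead of a real file.
my $text = "Start\nmiddle\nEnd\n";

# Line at a time: no single line contains both "Start" and "End".
open(my $fh, '<', \$text) or die "open: $!";
my $line_hits = 0;
while (my $line = <$fh>) {
    $line_hits++ if $line =~ /Start(.*)End/ms;
}
close $fh;

# Whole text at once: undefining $/ makes <$fh> return everything.
open($fh, '<', \$text) or die "open: $!";
my $whole = do { local $/; <$fh> };
close $fh;
my ($between) = $whole =~ /Start(.*)End/s;

print "line-by-line matches: $line_hits\n";
print "slurped capture: [$between]\n";
```

Slurping is only one option; it trades memory for simplicity and may not suit very large files.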
So what do /m and /s do? It's very simple. The only thing the /s
modifier does is make the . metacharacter match newlines. That's all.
If there's no . in your regex, there's no need for you to add an /s to
your regex. The /m modifier makes ^ and $ match the beginning and end of
"lines" -- that is, it makes ^ match after any newline in your string (as
well as at the absolute beginning of your string), and it makes $ match
before any newline in your string (as well as at the absolute end of the
string).
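
A minimal sketch of both modifiers on a made-up three-line string (the string is an assumption for illustration). Each test changes exactly one modifier so the effect is isolated.

```perl
use strict;
use warnings;

# Hypothetical multi-line string held in a single scalar.
my $text = "one\ntwo\nthree";

# Without /s, . refuses to match the newline between "one" and "two".
my $dot_plain = $text =~ /one.two/  ? 1 : 0;
my $dot_s     = $text =~ /one.two/s ? 1 : 0;

# Without /m, ^ and $ only anchor at the ends of the whole string.
my $anchor_plain = $text =~ /^two$/  ? 1 : 0;
my $anchor_m     = $text =~ /^two$/m ? 1 : 0;

print "dot: plain=$dot_plain /s=$dot_s\n";
print "anchors: plain=$anchor_plain /m=$anchor_m\n";
```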
So there you have it. I can go into more detail about /o and regex
compilation (and have before, probably on this list), but for now, what
I've told you is all you need to know.
| |
| Tom Allison 2005-07-24, 8:29 pm |
| Jeff 'japhy' Pinyan wrote:
> On Jul 23, Tom Allison said:
>
>
>
> This is a common misconception. The purpose and effects of the /o
> modifier (as well as the /s and /m modifiers) are unclear to a lot of
> Perl programmers out there. I'll make a PSA now and try to clear things
> up a bit.
>
> The /o modifier tells the internal regex compiler that, after this regex
> has been compiled 'o'nce, it is never to be compiled again. "Well, what
> good is that?" you ask. For your average regex, there is absolutely no
> difference, no change in performance: /foo/ and /foo/o are identical.
> The place where the /o modifier matters is when there are variables
> inside the regex, e.g. /^$field: (.*)/. The /o modifier says that,
> after the regex has been compiled for the first time -- which means that
> all the variables in it have been interpolated -- *that* compiled regex
> will take the place of the regex with the variables in it. Any changes
> to your variables will be ignored, for the rest of the program. There's
> no way to reverse the effect of /o.
>
If you are running regex without variables, like in this case, it does
improve things if you are doing this through a loop within your code.
Otherwise they are identical.
| |
| Jeff 'japhy' Pinyan 2005-07-24, 8:29 pm |
| On Jul 23, Tom Allison said:
> Jeff 'japhy' Pinyan wrote:
>
>
> If you are running regex without variables, like in this case, it does improve
> things if you are doing this through a loop within your code. Otherwise they
> are identical.
I disagree. I see no consistent data showing that /rx/o is faster than
/rx/, or vice versa (and I just ran a few benchmarks). Any difference in
speed is purely an artifact of the measurement.
I am confident of this because I know how the regex compiler works
inside Perl. When Perl compiles your program and sees a constant regex
(a regex with no variables in it), it does something special that it
doesn't do when the regex does have variables in it.
A regex with variables in it goes through three primary steps when
encountered in the running of your code:
1. interpolate all the variables in the regex
2. compare THIS version of the regex with the last version of the regex
we saw at this op-code
2a. if they are the same, use the already compiled version of the
regex (from last time)
2b. if they are different, recompile the regex
3. match the target string using this compiled regex
A regex without variables in it at all goes through a similar set of
steps, but these happen at *compile*-time, not *run*-time:
1. compile this regex
2. create an op-code to match against this compiled regex
Now, when you place an /o modifier on a regex, all that means is that WHEN
THAT REGEX GETS COMPILED, it replaces the op-code it's at with an op-code
that says "match against this compiled regex". That means a regex with
variables and the /o modifier only gets interpolated once, and from then
on, Perl doesn't even recognize it as a regex with variables, but instead
sees it as just another constant regex. Placing an /o modifier on
a constant regex doesn't produce any meaningful loss or gain.
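
Anyone who wants to check this on their own machine could use something like the sketch below. It relies on the core Benchmark module; the input lines are made up, and the numbers will vary from run to run, so no particular result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input lines; the thread never shows the real data.
my @lines = map { "file$_.dat|payload $_" } 1 .. 1000;

# Constant pattern without /o.
my $plain = sub {
    my $n = 0;
    for my $l (@lines) { $n++ if $l =~ /^\w+\.dat\|/ }
    return $n;
};

# The same constant pattern with /o.
my $with_o = sub {
    my $n = 0;
    for my $l (@lines) { $n++ if $l =~ /^\w+\.dat\|/o }
    return $n;
};

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { plain => $plain, with_o => $with_o });
```

Running it several times gives a feel for how much the two rates move around relative to each other.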
|