Quotes around words
Pat

2008-01-25, 4:28 am

Hi,

I have a big input file full of words, whitespace, newlines, punctuation,
and various other symbols. I want to surround every word with quotes,
UNLESS it already has quotes around it.

After some trial and error, I was seeing some unexpected results. The
closest I came to getting it right was this:

my $str = ' "these" "have" "quotes" these do not. ';
$str =~ s/([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/$1"$2"$3/gs;

And the result is this:
"these" "have" "quotes" "these" do "not".

The only problem is that "do" is skipped. Is this expected? So how do I
get around this?

Thanks.
Gunnar Hjalmarsson

2008-01-25, 4:28 am

Pat wrote:
> I have a big input file full of words, whitespace, newlines, punctuation,
> and various other symbols. I want to surround every word with quotes,
> UNLESS it already has quotes around it.
>
> After some trial and error, I was seeing some unexpected results. The
> closest I came to getting it right was this:
>
> my $str = ' "these" "have" "quotes" these do not. ';
> $str =~ s/([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/$1"$2"$3/gs;
>
> And the result is this:
> "these" "have" "quotes" "these" do "not".
>
> The only problem is that "do" is skipped. Is this expected?


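Yes. The problem is that the match includes the non-word characters before
and after each word. The space after "these" is consumed by that match, so
it is no longer available as the leading character for "do".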
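
To make the consumed separator visible, here is a minimal sketch that runs Pat's original pattern as a plain global match and reports where each new search starts. The variable @words is only an illustration and is not part of the original post.

```perl
use strict;
use warnings;

my $str = ' "these" "have" "quotes" these do not. ';

# Run the original pattern as a plain global match and note, for every
# hit, the word found and the offset where the next search begins.
my @words;
while ($str =~ /([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/g) {
    push @words, $2;
    printf "matched [%s%s%s], next search starts at offset %d\n",
        $1, $2, $3, pos($str);
}
print "words matched: @words\n";
```

On the sample string this should list only "these" and "not": the search after "these" resumes on the "d" of "do", which has no unconsumed non-word character in front of it.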

> So how do I get around this?


Please read the section "Extended Patterns" in "perldoc perlre". Example:

$str =~ s/(?<!")\b(\w+)\b(?!")/"$1"/g;
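
For readers new to lookaround, the same substitution can be spelled out with the /x flag. This is only an annotated sketch of the one-liner above; the variable $quoted is an assumption added for illustration.

```perl
use strict;
use warnings;

my $str = ' "these" "have" "quotes" these do not. ';

# Copy the string, then quote every word that has no quote on either side.
(my $quoted = $str) =~ s{
    (?<!")   # zero-width check: no double quote just before the word
    \b       # word boundary at the start of the word
    (\w+)    # the word itself, captured as group 1
    \b       # word boundary at the end of the word
    (?!")    # zero-width check: no double quote just after the word
}{"$1"}xg;

print "$quoted\n";
```

Because the two checks consume nothing, the space between "these" and "do" stays available for the next match.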

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Abigail

2008-01-25, 4:28 am

_
Pat (none@none.none) wrote on VCCLX September MCMXCIII in
<URL:news:Xns9A30132861D3Bnone@140.99.99.130>:
}} Hi,
}}
}} I have a big input file full of words, whitespace, newlines, punctuation,
}} and various other symbols. I want to surround every word with quotes,
}} UNLESS it already has quotes around it.
}}
}} After some trial and error, I was seeing some unexpected results. The
}} closest I came to getting it right was this:
}}
}} my $str = ' "these" "have" "quotes" these do not. ';
}} $str =~ s/([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/$1"$2"$3/gs;
}}
}} And the result is this:
}} "these" "have" "quotes" "these" do "not".
}}
}} The only problem is that "do" is skipped. Is this expected? So how do I
}} get around this?


I'd first skip things I want to leave alone, then match a word I want
to quote, and repeat this.

What do I want to skip? Two things: quoted substrings, and substrings
consisting of non-word, non-quote characters. I can match those with
a standard unrolling technique:


$str =~ s {[^"\w]*           # Non-word, non-quote sequence
           (?:
               "[^"]*"       # Quoted
               [^"\w]*       # Non-word, non-quote sequence
           )*                # Repeat
           \K                # Cut
           (\w+)             # Capture an unquoted word.
          }
          {"$1"}xg;          # Replace.
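
As a small aside on what the cut does: text matched before \K has to be present for the match to succeed, but it is left out of the part that gets replaced. A minimal sketch on a made-up string ($demo is hypothetical, and \K needs Perl 5.10 or later):

```perl
use strict;
use warnings;
use 5.010;   # \K is not available before Perl 5.10

my $demo = 'skip:word';

# "skip:" must match, but only "word" is replaced.
$demo =~ s/skip:\Kword/"word"/;

print "$demo\n";   # prints: skip:"word"
```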


Abigail
--
print v74.117.115.116.32, v97.110.111.116.104.101.114.32,
v80.101.114.108.32, v72.97.99.107.101.114.10;
John W. Krahn

2008-01-25, 8:14 am

Pat wrote:
>
> I have a big input file full of words, whitespace, newlines, punctuation,
> and various other symbols. I want to surround every word with quotes,
> UNLESS it already has quotes around it.
>
> After some trial and error, I was seeing some unexpected results. The
> closest I came to getting it right was this:
>
> my $str = ' "these" "have" "quotes" these do not. ';
> $str =~ s/([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/$1"$2"$3/gs;
>
> And the result is this:
> "these" "have" "quotes" "these" do "not".
>
> The only problem is that "do" is skipped. Is this expected?


Yes.

> So how do I get around this?


$ perl -le'
my $str = q[ "these" "have" "quotes" these do not. ];
print $str;
$str =~ s/(?<!")\b(\w+)\b(?!")/"$1"/g;
print $str;
'
"these" "have" "quotes" these do not.
"these" "have" "quotes" "these" "do" "not".



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
Petr Vileta

2008-01-25, 8:14 am

Abigail wrote:
> _
> Pat (none@none.none) wrote on VCCLX September MCMXCIII in
> <URL:news:Xns9A30132861D3Bnone@140.99.99.130>:
> }} my $str = ' "these" "have" "quotes" these do not. ';
> }} $str =~ s/([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/$1"$2"$3/gs;
> }}
> }} And the result is this:
> }} "these" "have" "quotes" "these" do "not".
> }}
> }} The only problem is that "do" is skipped. Is this expected? So how do I
> }} get around this?
>
>
> $str =~ s {[^"\w]* # Non-word, non-quote sequence
> (?:
> "[^"]*" # Quoted
> [^"\w]* # Non-word, non-quote sequence
> )* # Repeat
> \K # Cut
> (\w+) # Capture an unquoted word.
> }
> {"$1"}xg; # Replace.
>
>
> Abigail


How can I do the same in Perl 5.6.1?
I tested the script below and got the warning "Unrecognized escape \K passed
through at L:\temp\test.pl line 4."

use strict;
use warnings;
my $str = ' "these" "have" "quotes" these do not. ';
$str =~ s {[^"\w]*(?:"[^"]*"[^"\w]*)*\K(\w+)}{"$1"}xg;
print $str;

--
Petr Vileta, Czech Republic
(My server rejects all messages from Yahoo and Hotmail. Send me your
mail from another non-spammer site please.)

Please reply to <petr AT practisoft DOT cz>

Abigail

2008-01-25, 8:14 am

_
Petr Vileta (stoupa@practisoft.cz) wrote on VCCLX September MCMXCIII in
<URL:news:fncohc$22sr$1@ns.felk.cvut.cz>:
,, Abigail wrote:
,, > _
,, > Pat (none@none.none) wrote on VCCLX September MCMXCIII in
,, > <URL:news:Xns9A30132861D3Bnone@140.99.99.130>:
,, > }} my $str = ' "these" "have" "quotes" these do not. ';
,, > }} $str =~ s/([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/$1"$2"$3/gs;
,, > }}
,, > }} And the result is this:
,, > }} "these" "have" "quotes" "these" do "not".
,, > }}
,, > }} The only problem is that "do" is skipped. Is this expected? So how do I
,, > }} get around this?
,, >
,, >
,, > $str =~ s {[^"\w]* # Non-word, non-quote sequence
,, > (?:
,, > "[^"]*" # Quoted
,, > [^"\w]* # Non-word, non-quote sequence
,, > )* # Repeat
,, > \K # Cut
,, > (\w+) # Capture an unquoted word.
,, > }
,, > {"$1"}xg; # Replace.
,, >
,, >
,, > Abigail
,,
,, How can I do the same in Perl 5.6.1?

Capture the part before the \K, put it back at the start of the replacement,
and remove the \K.

,, I tested the script below and got the warning "Unrecognized escape \K passed
,, through at L:\temp\test.pl line 4."


\K is only available from Perl 5.10 onwards.
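
A minimal sketch of that rewrite, assuming the same sample string as earlier in the thread: the skipped prefix becomes capture group 1, the word becomes group 2, and the replacement re-emits the prefix unchanged. It uses no construct newer than plain capturing groups, so it should not need \K support.

```perl
use strict;
use warnings;

my $str = ' "these" "have" "quotes" these do not. ';

$str =~ s{
    (                  # group 1: everything to skip, kept as-is
        [^"\w]*        #   non-word, non-quote sequence
        (?:
            "[^"]*"    #   quoted substring
            [^"\w]*    #   non-word, non-quote sequence
        )*
    )
    (\w+)              # group 2: the unquoted word
}{$1"$2"}xg;

print "$str\n";
```

The trade-off is that the prefix is copied into the result on every substitution, which \K avoids.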



Abigail
--
perl -Mstrict='}); print "Just another Perl Hacker"; ({' -le1