FAQ 4.32 How do I strip blank space from the beginning/end of a string?
| PerlFAQ Server 2007-04-23, 7:03 pm |
| This is an excerpt from the latest version of perlfaq4.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .
--------------------------------------------------------------------
4.32: How do I strip blank space from the beginning/end of a string?
(contributed by brian d foy)
A substitution can do this for you. For a single line, you want to
replace all the leading or trailing whitespace with nothing. You can do
that with a pair of substitutions.
s/^\s+//;
s/\s+$//;
You can also write that as a single substitution, although it turns out
the combined statement is slower than the separate ones. That might not
matter to you, though.
s/^\s+|\s+$//g;
In this regular expression, the alternation matches either at the
beginning or the end of the string, since alternation has the lowest
precedence and each anchor binds only to its own branch. With the "/g"
flag, the substitution
makes all possible matches, so it gets both. Remember, the trailing
newline matches the "\s+", and the "$" anchor can match to the physical
end of the string, so the newline disappears too. Just add the newline
to the output, which has the added benefit of preserving "blank"
(consisting entirely of whitespace) lines which the "^\s+" would remove
all by itself.
while( <> )
{
    s/^\s+|\s+$//g;
    print "$_\n";
}
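As a small sketch of the same idea without reading from a file: the helper name strip_lines and the sample lines are made up for illustration. It strips each line, puts the newline back, and shows that a whitespace-only line comes out as an empty line instead of vanishing.

```perl
use strict;
use warnings;

# Hypothetical helper: strip each line and re-add the newline afterwards.
sub strip_lines {
    my @out;
    for my $line (@_) {
        # Work on a copy so the caller's data is left untouched.
        (my $copy = $line) =~ s/^\s+|\s+$//g;
        push @out, "$copy\n";
    }
    return @out;
}

# Made-up input: a padded line, a whitespace-only line, a tab-indented line.
my @out = strip_lines("  alpha  \n", "   \n", "\tbeta\n");
print @out;
```

The middle line is reduced to the empty string by the substitution, and the explicit "\n" in the output is what keeps it as a blank line.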
For a multi-line string, you can apply the regular expression to each
logical line in the string by adding the "/m" flag (for "multi-line").
With the "/m" flag, the "$" matches *before* an embedded newline, so it
doesn't remove it. It still removes the newline at the end of the
string.
$string =~ s/^\s+|\s+$//gm;
Remember that lines consisting entirely of whitespace will disappear,
since the first part of the alternation can match the entire string and
replace it with nothing. If you need to keep embedded blank lines, you have
to do a little more work. Instead of matching any whitespace (since that
includes a newline), just match the other whitespace.
$string =~ s/^[\t\f ]+|[\t\f ]+$//mg;
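To see the difference between the two multi-line substitutions, here is a minimal sketch. The sample text and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample: two padded lines with an empty line between them.
my $text = "  foo  \n\n  bar  \n";

# \s also matches "\n", so the empty line does not survive.
(my $with_s = $text) =~ s/^\s+|\s+$//gm;

# Only tab, form feed and space are stripped; every newline stays.
(my $with_class = $text) =~ s/^[\t\f ]+|[\t\f ]+$//mg;

print $with_s     =~ /\n\n/ ? "blank line kept\n" : "blank line lost\n";
print $with_class =~ /\n\n/ ? "blank line kept\n" : "blank line lost\n";
```

The first print reports the blank line as lost and the second as kept, because only the explicit character class leaves the newlines alone.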
--------------------------------------------------------------------
The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.
If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.
--
Posted via a free Usenet account from http://www.teranews.com
| |
| Petr Vileta 2007-04-23, 10:01 pm |
| > 4.32: How do I strip blank space from the beginning/end of a string?
>
> s/^\s+//;
> s/\s+$//;
>
> You can also write that as a single substitution, although it turns out
> the combined statement is slower than the separate ones. That might not
> matter to you, though.
>
> s/^\s+|\s+$//g;
>
I'm using this:
s/^\s*(.*?)\s*$/$1/;
Is this better or worse than the examples above?
--
Petr Vileta, Czech republic
(My server rejects all messages from Yahoo and Hotmail. Send me your mail
from another non-spammer site please.)
| |
| brian d foy 2007-04-24, 4:02 am |
| In article <x74pn6ftfx.fsf@mail.sysarch.com>, Uri Guttman
<uri@stemsystems.com> wrote:
> so the final result is: trust the FAQ and don't try to beat it, as it
> has been tested over many years
Well, the FAQ has often been wrong or outdated. Don't trust the FAQ
necessarily: take what you think might be a different solution and test
it against the FAQ. If your new solution comes out better, post it here
or send it to perlfaq-workers as noted in perlfaq. You might just make
the FAQ even better! :)
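One way to run such a test is the core Benchmark module. The following is only a sketch: the sample string, the iteration count and the sub names are assumptions, and the timings will vary with input and Perl version, so no result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample input: padding on both sides of some text.
my $sample = "   \t some text with padding \t  \n";

# Candidate 1: the FAQ's pair of substitutions.
sub trim_pair {
    my $s = shift;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Candidate 2: the FAQ's single alternation.
sub trim_alternation {
    my $s = shift;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Candidate 3: the capture-based version posted in this thread.
sub trim_capture {
    my $s = shift;
    $s =~ s/^\s*(.*?)\s*$/$1/;
    return $s;
}

# Check that the candidates agree on the sample before timing them.
for my $trim (\&trim_alternation, \&trim_capture) {
    die "candidates disagree" unless $trim->($sample) eq trim_pair($sample);
}

# Run each candidate a fixed number of times and print a comparison table.
cmpthese(200_000, {
    pair        => sub { trim_pair($sample) },
    alternation => sub { trim_alternation($sample) },
    capture     => sub { trim_capture($sample) },
});
```

It is worth repeating the run with other inputs, such as a string with no whitespace to strip or a very long string, since the ranking may change.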
| |
| Petr Vileta 2007-04-25, 10:01 pm |
| "Uri Guttman" <uri@stemsystems.com> wrote in message
news:x74pn6ftfx.fsf@mail.sysarch.com...
>
> PV> I'm using this
> PV> s/^\s*(.*?)\s*$/$1/;
> PV> Is this better or worse than the examples above?
>
> so there are plenty of differences and they point out that yours will
> likely be slower than either of the FAQ answers (and definitely slower
> if no whitespace is found).
>
Thank you for the detailed explanation. I'm using my own function for removing
leading/trailing spaces and replacing multiple spaces with a single one. I use
this function to normalize a string before I put it into a database, to
avoid duplicates. I mean duplicates from a human point of view, because
"John   Doe"
"John Doe"
are the same for a human but different for a machine. Following this FAQ and
your answer, I rewrote my function as follows (maybe someone else can use it too):
sub normalize_string {
    my $string = shift;
    $string =~ s/^\s+//s;
    $string =~ s/\s+$//s;
    $string =~ s/\s+/ /sg;
    return $string;
}
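A short usage sketch for the duplicate check described above. The input list and the %seen hash are made up for illustration, and the /s flag is left out because it only changes what "." matches.

```perl
use strict;
use warnings;

# Same three substitutions as in the post, minus the /s flag.
sub normalize_string {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    $string =~ s/\s+/ /g;
    return $string;
}

# Hypothetical input rows: the same name typed with different spacing.
my @names = ("John  Doe", " John Doe ", "John\tDoe", "Jane Doe");

# Keep only the first occurrence of each normalized name.
my %seen;
my @unique = grep { !$seen{$_}++ } map { normalize_string($_) } @names;

print "$_\n" for @unique;
```

With this input only two names remain, "John Doe" and "Jane Doe".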
| |
| anno4000@radom.zrz.tu-berlin.de 2007-04-26, 7:03 pm |
| Petr Vileta <stoupa@practisoft.cz> wrote in comp.lang.perl.misc:
> "Uri Guttman" <uri@stemsystems.com> wrote in message
> news:x74pn6ftfx.fsf@mail.sysarch.com...
[...]
> is the same for human but different for machine. By this FAQ and your answer
> I rewrote my function to this (maybe some can use this too)
>
> sub normalize_string {
> my $string = shift;
> $string =~ s/^\s+//s;
> $string =~ s/\s+$//s;
> $string =~ s/\s+/ /sg;
> return $string;
> }
The squeezing of multiple blanks into one can also be done by tr///,
which is a little faster. Disadvantage: You can't specify "\s" but
must list the white space characters explicitly. If you do that first,
the regexes for trimming leading and trailing space have to deal with
at most one blank and can be simplified. So, alternatively:
sub normalize_string {
    my $string = shift;
    # Map space, tab and newline to a space, then squeeze each run to one.
    $string =~ tr/ \t\n/ /s;
    $string =~ s/^\s//s;
    $string =~ s/\s$//s;
    return $string;
}
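The replacement list matters when squeezing with tr///. A minimal sketch, with a made-up input string, of what the /s modifier does with and without a replacement list:

```perl
use strict;
use warnings;

# Hypothetical input: a space-tab-space run and a run of four spaces.
my $input = "a \t b    c";

# Empty replacement list: every character maps to itself, so /s only
# collapses runs of one repeated character. " \t " is left alone.
(my $same_char_only = $input) =~ tr/ \t\n//s;

# Replacement list of a single space: space, tab and newline all become
# a space first, and /s then collapses the whole run to one space.
(my $any_whitespace = $input) =~ tr/ \t\n/ /s;

print "[$same_char_only]\n";   # [a <space><tab><space> b c]
print "[$any_whitespace]\n";   # [a b c]
```

Only the second form guarantees at most one blank between words, which is what the simplified trimming regexes rely on.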
Anno
| |
| Bart Lateur 2007-04-26, 7:03 pm |
| PerlFAQ Server wrote:
>4.32: How do I strip blank space from the beginning/end of a string?
So in what way does a blank space differ from a normal space?
--
Bart.
| |
| Petr Vileta 2007-04-26, 10:01 pm |
| <anno4000@radom.zrz.tu-berlin.de> wrote in message
news:59b6adF2ir6nuU1@mid.dfncis.de...
> Petr Vileta <stoupa@practisoft.cz> wrote in comp.lang.perl.misc:
> The squeezing of multiple blanks into one can also be done by tr///,
> which is a little faster. Disadvantage: You can't specify "\s" but
> must list the white space characters explicitly. If you do that first,
> the regexes for trimming leading and trailing space have to deal with
> at most one blank and can be simplified. So, alternatively:
>
> sub normalize_string {
> my $string = shift;
> $string =~ tr/ \t\n/ /s;
> $string =~ s/^\s//s;
> $string =~ s/\s$//s;
> return $string;
> }
>
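Thanks for improving my function. The latest version could be:
sub normalize_string {
    my $string = shift;
    # "\xa0" is the hex escape for byte 0xA0; "\0xa0" would mean NUL, "x", "a", "0".
    # The single-space replacement list turns every listed character into a space.
    $string =~ tr/ \t\r\n\f\xa0/ /s;
    $string =~ s/^\s//;
    $string =~ s/\s$//;
    return $string;
}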
The annoying 0xA0 character is the non-breaking space in the Windows world,
and it sometimes causes real problems in string comparisons.
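A note on spelling that byte inside tr///: "\0xa0" is read as a NUL character followed by the literal letters "x", "a" and "0", while "\xa0" is the hex escape for the single byte 0xA0. The sketch below uses made-up strings to show the difference.

```perl
use strict;
use warnings;

# With no replacement list and no modifiers, tr/// just counts matches.
my $ascii = "xa0";
my $hits_typo = ($ascii =~ tr/\0xa0//);   # counts "x", "a" and "0"
my $hits_hex  = ($ascii =~ tr/\xa0//);    # no 0xA0 byte in this string

# A string that really contains a no-break space between two words.
my $nbsp = "John\xa0Doe";
(my $fixed = $nbsp) =~ tr/ \t\r\n\f\xa0/ /s;

printf "typo class hits: %d, hex class hits: %d\n", $hits_typo, $hits_hex;
print $fixed eq "John Doe" ? "nbsp replaced\n" : "nbsp kept\n";
```

With the "\0xa0" spelling and a space as the replacement, ordinary letters "x" and "a" and the digit "0" would be turned into spaces as well, so the hex form is the one to use.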
| |
| brian d foy 2007-04-27, 4:02 am |
| In article <4u2233lrud945grkbe47d6svrq1afngeud@4ax.com>, Bart Lateur
<bart.lateur@pandora.be> wrote:
> PerlFAQ Server wrote:
>
>
> So in what way does a blank space differ from a normal space?
I think normal space is semantic and has a use, such as separating
tokens or lining up columns. Blank space is an artefact of parsing or
some other process and doesn't have a use.
That doesn't mean that the question is well-worded, however. :)
| |
| Dr.Ruud 2007-04-27, 8:02 am |
| anno4000@radom.zrz.tu-berlin.de wrote:
> sub normalize_string {
>     my $string = shift;
>     $string =~ tr/ \t\r\n\f\xa0/ /s;
>     $string =~ s/^ //;
>     $string =~ s/ $//;
>     return $string;
> }
Variant:
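sub normalize_string {
    my $string = shift;
    # "for" aliases $_ to $string, so all three operations act on it.
    # "\xa0" is the hex escape for byte 0xA0 (not "\0xa0").
    tr/ \t\r\n\f\xa0/ /s,
    s/^ //,
    s/ $// for $string;
    return $string;
}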
--
Affijn, Ruud
"Gewoon is een tijger."