Author regex
Seb

2004-03-29, 9:33 am

Hi,

Has anyone an idea how I can replace every character in a string if it is not alphanumeric?

Something like eregi_replace, but I don't know how to say NOT in regex.

Tnx

Allan Rydberg

2004-03-29, 9:33 am




http://ch.php.net/manual/en/function.preg-replace.php

$out = preg_replace("/^\W+/, "", $in);





Seb wrote:

> Hi,
>
> Has anyone an idea how I can replace every character in a string if it
> is not alphanumeric?
>
> Something like eregi_replace, but I don't know how to say NOT in regex.
>
> Tnx


Ian.H

2004-03-29, 9:33 am

On Mon, 29 Mar 2004 13:10:42 +0000, Seb wrote:


DO NOT POST IN HTML FORMAT!!


> Hi,
>
> Has anyone an idea how I can replace every character in a string if it
> is not alphanumeric?
>
> Something like eregi_replace, but I don't know how to say NOT in regex.



Use [^...] (a negated character class) in your regex for "not" =)
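
A minimal sketch of that advice, assuming "alphanumeric" means ASCII letters and digits only. The function name strip_non_alnum is made up for illustration and is not part of PHP:

```php
<?php
// Hypothetical helper: remove every character that is not an ASCII
// letter or digit. Inside [...], a leading ^ negates the class.
function strip_non_alnum(string $in): string
{
    return preg_replace('/[^a-zA-Z0-9]/', '', $in);
}

echo strip_non_alnum('Hello, World_2004!'), "\n"; // HelloWorld2004
```

Note that the ^ has to sit inside the brackets; outside them it anchors the match to the start of the string instead.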



Regards,

Ian


[ way too many groups trimmed from x-post ]

--
Ian.H
digiServ Network
London, UK
http://digiserv.net/

Seb

2004-03-29, 10:33 am

Tnxx, that works for me.

Here is a reference for other people with same probs :

Wildcard  Description
\d        Matches a digit (character class [0-9])
\D        Matches a non digit ([^0-9])
\w        Matches a word character ([a-zA-Z0-9_])
\W        Matches a non-word character ([^a-zA-Z0-9_])
\s        Matches a space character ([\t\n ])
\S        Matches a non-space character ([^\t\n ])
.         Matches any character
$         Matches "end of line" if placed at the end of a regular expression

"Allan Rydberg" <alrdbg@southtech.net> wrote in message
news:c497iu$u8e$1@newshispeed.ch...



http://ch.php.net/manual/en/function.preg-replace.php

$out = preg_replace("/^\W+/, "", $in);





Seb wrote:

> Hi,
>
> Has anyone an idea how I can replace every character in a string if it
> is not alphanumeric?
>
> Something like eregi_replace, but I don't know how to say NOT in regex.
>
> Tnx



John Dunlop

2004-03-30, 12:32 pm

Followup-to c.l.p. This is off-topic in two groups.

Seb wrote upsidedown:

> [Allan Rydberg wrote upsidedown:]
>
>

What do you mean? Will you recast the question, please, Seb?
> $out = preg_replace("/^\W+/, "", $in);
(-----------------------------^
A typo there!)

I can't fit the above pattern into any of my interpretations of Seb's
question.

preg_replace('_^\W+_','',$foo)

returns $foo with one or more non-"word" characters at the beginning
stripped off. If $foo were "-_-", the first hyphen would match and
get replaced by an empty string, but the underscore and second hyphen
would remain.
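
The behaviour described above can be checked with a short script. This is only a sketch using the same pattern and the same "-_-" input:

```php
<?php
// '_' serves as the pattern delimiter here, as in the post above.
// ^ outside a character class anchors the match to the start of the string.
$foo = '-_-';
$result = preg_replace('_^\W+_', '', $foo);
var_dump($result); // string(2) "_-"
```

Only the leading hyphen is removed: the underscore is a "word" character, so \W+ stops there, and the trailing hyphen is never reached because of the ^ anchor.
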
> Tnxx, that works for me.


Really? You used an atypical definition of "alphanumeric" then.
Despite Merriam-Webster Online's definition allowing punctuation
marks -- the inclusion of underscores is described as perverse by
FOLDOC -- alphanumerics are usually represented by the character
class [a-zA-Z0-9]. M-W gives the etymology of "alphanumeric" as
"/alpha/bet/ic/ + /numeric/", i.e., it derived from "alphabet" and
"numeric". The Manual's pattern syntax guide, however, doesn't
include underscores in its implicit definition of "alphanumeric".
(Is there an explicit definition, anywhere in the Manual?) Cf. the
character type functions,

http://www.php.net/manual/en/ref.ctype.php
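
For comparison, a small sketch of how one of those character type functions, ctype_alnum(), treats the underscore. It assumes the ctype extension is available and the default "C" locale is in effect:

```php
<?php
// ctype_alnum() returns true only if every character is a letter or digit
// in the current locale (ASCII letters and digits in the default "C" locale).
var_dump(ctype_alnum('abc123'));  // bool(true)
var_dump(ctype_alnum('abc_123')); // bool(false): underscore is not alphanumeric
var_dump(ctype_alnum(''));        // bool(false): empty string
```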

> Here is a reference for other people with same probs :


I reckon a better reference is the Manual, don't you?

http://www.php.net/manual/en/pcre.pattern.syntax.php

> \d  Matches a digit (character class [0-9])
> \D  Matches a non digit ([^0-9])


Although your character classes are correct and clarify your
definition, it'd be less ambiguous to state that \d matches *decimal*
digits, not just digits, and that \D matches any character that isn't
a *decimal* digit. \d does not match all hexadecimal digits, for
example.
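
A quick sketch of the decimal-only point, using made-up sample strings:

```php
<?php
// \d matches decimal digits only, so a hexadecimal string fails the test.
var_dump(preg_match('/^\d+$/', '2004')); // int(1)
var_dump(preg_match('/^\d+$/', 'ff'));   // int(0)

// For hexadecimal digits, spell the class out.
var_dump(preg_match('/^[0-9a-fA-F]+$/', 'ff')); // int(1)
```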

> \w  Matches a word character ([a-zA-Z0-9_])
> \W  Matches a non-word character ([^a-zA-Z0-9_])


Your character classes are misleading.

| A "word" character is any letter or digit or the underscore
| character, that is, any character which can be part of a Perl
| "word". The definition of letters and digits is controlled by
| PCRE's character tables, and may vary if locale-specific matching
| is taking place. [ ... ]

http://www.php.net/manual/en/pcre.pattern.syntax.php
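
To make the underscore issue concrete, here is a sketch comparing \W with an explicit negated class. The sample input is invented, and locale-specific matching is assumed not to be in play:

```php
<?php
$in = 'foo_bar-baz 42';

// \W leaves the underscore alone, because "_" counts as a "word" character.
$viaW = preg_replace('/\W/', '', $in);
echo $viaW, "\n";     // foo_barbaz42

// The explicit class removes the underscore as well.
$viaClass = preg_replace('/[^a-zA-Z0-9]/', '', $in);
echo $viaClass, "\n"; // foobarbaz42
```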

> \s  Matches a space character ([\t\n ])
> \S  Matches a non-space character ([^\t\n ])


Your character classes are incorrect and out of sync with your
natural language descriptions, which are also incorrect. The generic
character type \s matches "whitespace" characters, not just the space
character; \S matches any non-"whitespace" character. According to
the Manual, the characters \s matches are, by default, normally:
"space, formfeed, newline, carriage return, horizontal tab, and
vertical tab". The "space" in the above definition covers non-
breaking spaces and spaces, I think.
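
A short sketch showing \s matching more than the space character. It only exercises tab, newline, carriage return, form feed and space; the vertical tab and non-breaking space cases are deliberately left out:

```php
<?php
// Collapse each run of whitespace into a single space. The subject contains
// a tab, a newline, a carriage return and a form feed, all of which \s matches.
$in = "one\t\ntwo\r\fthree  four";
$out = preg_replace('/\s+/', ' ', $in);
var_dump($out); // string(18) "one two three four"
```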

> .   Matches any character


... excluding newlines by default.

| Outside a character class, a dot in the pattern matches any one
| character in the subject, including a non-printing character, but
| not (by default) newline. If the PCRE_DOTALL option is set, then
| dots match newlines as well. [ ... ] Dot has no special meaning in
| a character class.

http://www.php.net/manual/en/pcre.pattern.syntax.php
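
In PHP the PCRE_DOTALL option is set with the s pattern modifier. A minimal sketch with an invented subject string:

```php
<?php
$subject = "a\nb";
var_dump(preg_match('/a.b/', $subject));  // int(0): dot does not match the newline
var_dump(preg_match('/a.b/s', $subject)); // int(1): the s modifier sets PCRE_DOTALL
```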

> $   Matches "end of line" if placed at the end of a regular expression


While that may sometimes be true, it doesn't tell the whole story.
The $ isn't a "wildcard" or generic character type metacharacter.

| A dollar character is an assertion which is TRUE only if the
| current matching point is at the end of the subject string, or
| immediately before a newline character that is the last character
| in the string (by default). Dollar need not be the last character
| of the pattern if a number of alternatives are involved, but it
| should be the last item in any branch in which it appears.
|
| [ ... ] The meaning of dollar can be changed so that it matches
| only at the very end of the string, by setting the
| PCRE_DOLLAR_ENDONLY option at compile or matching time.

http://www.php.net/manual/en/pcre.pattern.syntax.php
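
In PHP the PCRE_DOLLAR_ENDONLY option corresponds to the D pattern modifier. A small sketch of the difference, again with an invented subject:

```php
<?php
$subject = "foo\n";
var_dump(preg_match('/foo$/', $subject));  // int(1): $ matches before the final newline
var_dump(preg_match('/foo$/D', $subject)); // int(0): D sets PCRE_DOLLAR_ENDONLY
```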

HTH.

--
Jock

Copyright 2008 codecomments.com