regex (PHP Language, March 2004)
Hi,
Does anyone have an idea how I can replace every character in a string that is not alphanumeric?
Something like eregi_replace, but I don't know how to say NOT in a regex.
Tnx
----------
Allan Rydberg 2004-03-29, 9:33 am
http://ch.php.net/manual/en/function.preg-replace.php
$out = preg_replace("/^\W+/, "", $in);
Seb wrote:
> Hi,
>
> Does anyone have an idea how I can replace every character in a string
> that is not alphanumeric?
>
> Something like eregi_replace, but I don't know how to say NOT in a regex.
>
> Tnx
----------
On Mon, 29 Mar 2004 13:10:42 +0000, Seb wrote:
DO NOT POST IN HTML FORMAT!!
> Hi,
>
> Does anyone have an idea how I can replace every character in a string
> that is not alphanumeric?
>
> Something like eregi_replace, but I don't know how to say NOT in a regex.
Use [^...] (a negated character class) in your regex for "not" =)
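A minimal sketch of the negated-class approach, assuming "alphanumeric" means the ASCII letters and digits only, PHP 7 or later, and that each unwanted character should become an underscore (an arbitrary choice; pass '' to delete instead). replaceNonAlnum() is a hypothetical helper name. It uses preg_replace() because the POSIX ereg functions, eregi_replace() included, were deprecated in PHP 5.3 and removed in PHP 7.0.

```php
<?php
// Hypothetical helper: replace every character that is NOT an ASCII
// letter or digit. A ^ placed first inside [...] negates the class.
function replaceNonAlnum(string $in, string $with = '_'): string
{
    return preg_replace('/[^a-zA-Z0-9]/', $with, $in);
}

echo replaceNonAlnum('Hello, world! 2004'), "\n"; // Hello__world__2004
echo replaceNonAlnum('a-b_c', ''), "\n";          // abc
```

Note that the underscore is removed in the second call, because it is not in [a-zA-Z0-9].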
Regards,
Ian
[ way too many groups trimmed from x-post ]
--
Ian.H
digiServ Network
London, UK
http://digiserv.net/
----------
Tnxx, that works for me.
Here is a reference for other people with the same problem:
Wildcard  Description
\d        Matches a digit (character class [0-9])
\D        Matches a non digit ([^0-9])
\w        Matches a word character ([a-zA-Z0-9_])
\W        Matches a non-word character ([^a-zA-Z0-9_])
\s        Matches a space character ([\t\n ])
\S        Matches a non-space character ([^\t\n ])
.         Matches any character
$         Matches "end of line" if placed at the end of a regular expression
"Allan Rydberg" <alrdbg@southtech.net> wrote in message
news:c497iu$u8e$1@newshispeed.ch...
[ quoted text trimmed ]
----------
John Dunlop 2004-03-30, 12:32 pm
| Followup-to c.l.p. This is off-topic in two groups.
Seb wrote upsidedown:
> [Allan Rydberg wrote upsidedown:]
>
>
What do you mean? Will you recast the question, please, Seb?
> > $out = preg_replace("/^\W+/, "", $in);
(------------------------------^
A typo there: the closing double quote is missing!)
I can't fit the above pattern into any of my interpretations of Seb's
question.
preg_replace('_^\W+_','',$foo)
returns $foo with one or more non-"word" characters at the beginning
stripped off. If $foo were "-_-", the first hyphen would match and
get replaced by an empty string, but the underscore and second hyphen
would remain.
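A quick check of the behaviour just described, with Allan's missing closing quote restored. The sketch uses / as the pattern delimiter instead of _; the delimiter choice does not change what ^\W+ matches. The second call, without the ^ anchor, is added here only for comparison.

```php
<?php
// Allan's pattern, quote restored: strips non-word characters only
// at the START of the string.
$stripped = preg_replace('/^\W+/', '', '-_-');
var_dump($stripped); // string(2) "_-"

// Without ^, every run of non-word characters goes, but the
// underscore survives because \w includes "_".
$all = preg_replace('/\W+/', '', '-_-');
var_dump($all); // string(1) "_"
```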
> Tnxx, that works for me.
Really? You used an atypical definition of "alphanumeric" then.
Although Merriam-Webster Online's definition allows punctuation
marks (FOLDOC describes the inclusion of underscores as perverse),
alphanumerics are usually represented by the character class
[a-zA-Z0-9]. M-W gives the etymology of "alphanumeric" as
"/alpha/bet/ic/ + /numeric/", i.e., it is derived from "alphabet" and
"numeric". The Manual's pattern syntax guide, however, doesn't
include underscores in its implicit definition of "alphanumeric".
(Is there an explicit definition anywhere in the Manual?) Cf. the
character type functions,
http://www.php.net/manual/en/ref.ctype.php
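To make the difference concrete, a minimal comparison of ctype_alnum(), one of the character type functions linked above, with \w and with the explicit class [a-zA-Z0-9]. It assumes the ctype extension is available (it is enabled by default) and the default "C" locale.

```php
<?php
// ctype_alnum(): true only if every character is a letter or digit.
var_dump(ctype_alnum('abc123'));  // bool(true)
var_dump(ctype_alnum('abc_123')); // bool(false): "_" is not alphanumeric

// \w, by contrast, also accepts the underscore.
var_dump(preg_match('/^\w+$/', 'abc_123'));          // int(1)
var_dump(preg_match('/^[a-zA-Z0-9]+$/', 'abc_123')); // int(0)
```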
> Here is a reference for other people with the same problem:
I reckon a better reference is the Manual, don't you?
http://www.php.net/manual/en/pcre.pattern.syntax.php
> \d        Matches a digit (character class [0-9])
> \D        Matches a non digit ([^0-9])
Although your character classes are correct and clarify your
definition, it'd be less ambiguous to state that \d matches *decimal*
digits, not just digits, and that \D matches any character that isn't
a *decimal* digit. \d does not match all hexadecimal digits, for
example.
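A short illustration of the point about decimal digits: \d accepts only 0-9, so a hexadecimal string such as "7f" fails unless the letters a-f are listed explicitly. The test strings are arbitrary examples.

```php
<?php
// \d matches only the decimal digits 0-9.
var_dump(preg_match('/^\d+$/', '2004'));        // int(1)
var_dump(preg_match('/^\d+$/', '7f'));          // int(0)

// Hexadecimal digits need an explicit class (ctype_xdigit() also exists).
var_dump(preg_match('/^[0-9a-fA-F]+$/', '7f')); // int(1)
```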
> \w        Matches a word character ([a-zA-Z0-9_])
> \W        Matches a non-word character ([^a-zA-Z0-9_])
Your character classes are misleading.
| A "word" character is any letter or digit or the underscore
| character, that is, any character which can be part of a Perl
| "word". The definition of letters and digits is controlled by
| PCRE's character tables, and may vary if locale-specific matching
| is taking place. [ ... ]
http://www.php.net/manual/en/pcre.pattern.syntax.php
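A minimal check, assuming PCRE's default (non-locale-specific) character tables: \W strips punctuation and spaces but keeps the underscore. Under those default tables [^a-zA-Z0-9_] happens to describe \W; with locale-specific tables the set of "letters" may differ, as the quoted passage says.

```php
<?php
// Default tables: \W matches anything except ASCII letters, digits and "_".
$in = 'foo_bar-baz qux!';
echo preg_replace('/\W/', '', $in), "\n"; // foo_barbazqux
```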
> \s        Matches a space character ([\t\n ])
> \S        Matches a non-space character ([^\t\n ])
Your character classes are incorrect and out of sync with your
natural language descriptions, which are also incorrect. The generic
character type \s matches "whitespace" characters, not just the space
character; \S matches any non-"whitespace" character. According to
the Manual, the characters \s matches are, by default, normally:
"space, formfeed, newline, carriage return, horizontal tab, and
vertical tab". The "space" in the above definition covers non-
breaking spaces and spaces, I think.
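A short sketch showing that \s covers more than the space character. It checks only space, tab, newline, carriage return and form feed; vertical tab and non-breaking space are left out deliberately, since whether they match has depended on the PCRE version and character tables in use.

```php
<?php
// space, tab, newline, carriage return, form feed
$ws = " \t\n\r\f";
var_dump(preg_match_all('/\s/', $ws)); // int(5)
var_dump(preg_match_all('/\S/', $ws)); // int(0)

// Typical use: collapse any run of whitespace into a single space.
echo preg_replace('/\s+/', ' ', "one\ttwo\n\nthree"), "\n"; // one two three
```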
> .         Matches any character
... excluding newlines by default.
| Outside a character class, a dot in the pattern matches any one
| character in the subject, including a non-printing character, but
| not (by default) newline. If the PCRE_DOTALL option is set, then
| dots match newlines as well. [ ... ] Dot has no special meaning in
| a character class.
http://www.php.net/manual/en/pcre.pattern.syntax.php
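In PHP, PCRE_DOTALL is switched on with the s pattern modifier. A small sketch of both points in the quoted passage: the dot skipping a newline by default, and the dot being literal inside a character class.

```php
<?php
// By default "." does not match a newline ...
var_dump(preg_match('/^a.b$/', "a\nb"));  // int(0)
// ... but with the s modifier (PCRE_DOTALL) it does.
var_dump(preg_match('/^a.b$/s', "a\nb")); // int(1)

// Inside a character class the dot is just a literal dot.
var_dump(preg_match('/^[.]$/', 'x'));     // int(0)
var_dump(preg_match('/^[.]$/', '.'));     // int(1)
```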
> $         Matches "end of line" if placed at the end of a regular expression
While that may sometimes be true, it doesn't tell the whole story.
The $ isn't a "wildcard" or generic character type metacharacter.
| A dollar character is an assertion which is TRUE only if the
| current matching point is at the end of the subject string, or
| immediately before a newline character that is the last character
| in the string (by default). Dollar need not be the last character
| of the pattern if a number of alternatives are involved, but it
| should be the last item in any branch in which it appears.
|
| [ ... ] The meaning of dollar can be changed so that it matches
| only at the very end of the string, by setting the
| PCRE_DOLLAR_ENDONLY option at compile or matching time.
http://www.php.net/manual/en/pcre.pattern.syntax.php
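In PHP, PCRE_DOLLAR_ENDONLY corresponds to the D pattern modifier. A minimal sketch of the default behaviour, the D variant, and a $ that ends one branch of an alternation rather than the whole pattern:

```php
<?php
// By default $ also matches just before a final newline ...
var_dump(preg_match('/abc$/', "abc\n"));  // int(1)
// ... but with D (PCRE_DOLLAR_ENDONLY) only at the very end of the subject.
var_dump(preg_match('/abc$/D', "abc\n")); // int(0)

// $ as the last item of one branch, not of the whole pattern.
var_dump(preg_match('/^(?:abc$|xyz)/', 'abc')); // int(1)
```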
HTH.
--
Jock