Help me with a regular expression for PHP
|
|
| cendrizzi 2006-10-30, 7:04 pm |
| I have no idea where to get help on RE stuff. Since it's for a PHP app
I thought I would ask here to see if there were some RE pros. Basically
I'm doing some template stuff and I want to use
preg_replace_callback to call another function whenever the
regular expression matches, but I have no idea how to
accomplish it.
So I start with this:
/<(input|select|textarea)[^>]*name\s*\=\s*\"[_a-zA-Z0-9\s]*\"[^>]*>/
but I need to modify it so that it matches only if the name contains
'{' characters, and does not match otherwise.
So this would not match:
<input name="test">
But this would match:
<input name="test{0}">
Thanks much in advance.
| |
| Pedro Graca 2006-10-30, 7:04 pm |
| cendrizzi wrote:
> So I start with this:
> /<(input|select|textarea)[^>]*name\s*\=\s*\"[_a-zA-Z0-9\s]*\"[^>]*>/
You'd better not use regular expressions to validate HTML.
The following line is perfectly valid HTML (I think in any version)
<input type="text" name="x><y" id="xy">
> but need to modify it so it only matches if it has '{' characters in
> the name but to not match if it does not.
>
> So this would not match:
> <input name="test">
>
> But this would match:
> <input name="test{0}">
Get the name. Verify it has '{' and '}' (in that order and once only?)
<?php
$name = get_name('<input name="test{0}">'); // 'test{0}'
if (name_is_valid($name)) {
    // whatever
}

function get_name($html) {
    return 'test{0}'; // sorry!
}

function name_is_valid($name) {
    // exactly one '{' ...
    if (($p1 = strpos($name, '{')) === false) return false;
    if (strpos($name, '{', $p1+1) !== false) return false;
    // ... exactly one '}' ...
    if (($p2 = strpos($name, '}')) === false) return false;
    if (strpos($name, '}', $p2+1) !== false) return false;
    // ... and the '{' comes first
    return $p1 < $p2;
}
?>
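The get_name() stub above is left unimplemented. One possible way to fill it in without a regular expression is PHP's DOM extension. This is only a sketch under assumptions: get_name_sketch is a hypothetical name, it needs the dom extension and PHP 7.1+ (for the nullable return type), and it returns the first name it finds, checking input, then select, then textarea rather than strict document order.

```php
<?php
// Hypothetical helper (not from the thread): pull the name attribute of the
// first input/select/textarea element out of an HTML fragment.
function get_name_sketch(string $html): ?string {
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach (array('input', 'select', 'textarea') as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $element) {
            if ($element->hasAttribute('name')) {
                return $element->getAttribute('name');
            }
        }
    }
    return null; // no named form field found
}

var_dump(get_name_sketch('<input name="test{0}">'));
```

The result can then be passed to name_is_valid() as in the code above.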
--
I (almost) never check the dodgeit address.
If you *really* need to mail me, use the address in the Reply-To
header with a message in *plain* *text* *without* *attachments*.
| |
| cendrizzi 2006-10-30, 7:04 pm |
| It's not for validation. It's for some custom template stuff that
tells my stuff where to store the value of the form element in the
session. That may not make sense but it's what I need for my
application. So I use the ob_start, etc functions and use regular
expressions against the buffer to manipulate the html or change the
behavior of certain elements. I could just get the name of each
element and check them using strpos or strstr for the '{' character but
I hoped I could use RE to check from the start if it had that so it
wouldn't require the extra string searches.
Hope that makes sense, it's always a bit of a challenge to explain
things clearly, especially if the program is quite a big one.
| |
| Pedro Graca 2006-10-30, 7:04 pm |
| cendrizzi top-posted and totally messed it up:
> I hoped I could use RE to check from the start if it had that so it
> wouldn't require the extra string searches.
<?php
$data = array(
    '<input type="text" name="no!" id="test0"> ',
    '<input type="text" name="no{!}" id="test0"> ',
    '<input type="text" name="test0" id="test0"> ',
    '<input type="text" name="test 0" id="test0"> ',
    '<input type="text" name="test{0}" id="test0"> ',
    '<input type="text" name="test {0}" id="test0"> ',
    '<input type="text" name="test{0}test" id="test0"> ',
    '<input type="text" name="test {0} test" id="test0">',
);

$rx = '/<(input|select|textarea)[^>]*' .
#     'name\s*\=\s*\"[_a-zA-Z0-9\s]*\"' . // your original version
      'name\s*\=\s*\"[_a-zA-Z0-9\s]*{[_a-zA-Z0-9\s]*}[_a-zA-Z0-9\s]*\"' .
#     added to your version: {[_a-zA-Z0-9\s]*}[_a-zA-Z0-9\s]*
      '[^>]*>/';
### I think there's a few \ too many in there,
### I didn't look at it very attentively

foreach ($data as $val) {
    echo $val, ' :: ';
    if (preg_match($rx, $val)) {
        echo 'M';
    } else {
        echo 'No m';
    }
    echo "atch.\n";
}
?>
| |
| BKDotCom 2006-10-30, 7:04 pm |
|
Pedro Graca wrote:
> The following line is perfectly valid HTML (I think in any version)
>
> <input type="text" name="x><y" id="xy">
I would have to disagree
<input type="text" name="x> is invalid: no closing quote around
name value
<y" id="xy"> is invalid. y" isn't a valid cname (only
alphanumeric?)
if you want 'x><y' as a value you'd need to use name="x&gt;&lt;y"
| |
| BKDotCom 2006-10-30, 7:04 pm |
| I had a similar RE problem and never figured it out, or found an
answer. I basically ended up using two callbacks..or doing the 2nd
check (does it contain "x") in the first callback
Capture and send all name values to the first (whether or not they
contain the {)
check whether or not the name value contains "{" inside that
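A minimal sketch of that two-step approach, assuming a single callback that does the second check itself. The names rewrite_fields and handle_template_field are hypothetical, the pattern only handles double-quoted name values, and the `??` fallback needs PHP 7+:

```php
<?php
// Hypothetical handler: only reached for names that contain '{'.
function handle_template_field(string $tag, string $name): string {
    // Placeholder behaviour so the effect is visible: prefix a comment.
    return '<!-- field ' . $name . ' -->' . $tag;
}

function rewrite_fields(string $html): string {
    // Step 1: capture every double-quoted name value, with or without '{'.
    $rx = '/<(?:input|select|textarea)\b[^>]*\bname\s*=\s*"([^"]*)"[^>]*>/i';

    $result = preg_replace_callback($rx, function (array $m): string {
        // Step 2: the '{' check happens here, inside the callback.
        if (strpos($m[1], '{') === false) {
            return $m[0]; // leave the tag untouched
        }
        return handle_template_field($m[0], $m[1]);
    }, $html);

    return $result ?? $html; // preg_replace_callback returns null on error
}

echo rewrite_fields('<input name="test"> <input name="test{0}">'), "\n";
```

The extra strpos() call runs once per matched tag, so its cost is small next to the regular expression scan itself.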
| |
| Chung Leong 2006-10-30, 7:04 pm |
|
cendrizzi wrote:
> I have no idea where to get help on RE stuff. Since it's for a PHP app
> I thought I would ask here to see if there was some RE pros. Basically
> I'm doing some template stuff and I wanted to use a
> preg_replace_callback function to call another function when the
> criteria of the RE expression is matched but have no idea how to
> accomplish it.
>
> So I start with this:
> /<(input|select|textarea)[^>]*name\s*\=\s*\"[_a-zA-Z0-9\s]*\"[^>]*>/
>
> but need to modify it so it only matches if it has '{' characters in
> the name but to not match if it does not.
>
> So this would not match:
> <input name="test">
>
> But this would match:
> <input name="test{0}">
>
> Thanks much in advance.
Well, just change the [_a-zA-Z0-9\s]* part to [\w\s]*{[\w\s]*}. Of
course, you'll need to do proper capturing in order to form the
replacement string.
\w is equivalent to [_a-zA-Z0-9] by the way.
| |
| cendrizzi 2006-10-30, 7:04 pm |
| No, I didn't know that \w was the same. What do you mean by proper
capturing? I really am a 2 year old when it comes to RE stuff.
Thanks!
On Oct 29, 10:04 pm, "Chung Leong" <chernyshev...@hotmail.com> wrote:
> cendrizzi wrote:
>
>
>
>
>
> course, you'll need to do proper capturing in order to form the
> replacement string.
>
> \w is equivalent to [_a-zA-Z0-9] by the way.
| |
| John Dunlop 2006-10-30, 7:04 pm |
| BKDotCom:
> Pedro Graca wrote:
> > The following line is perfectly valid HTML (I think in any version)
Yes, yes it is. In any version.
> I would have to disagree
Run it through a validator. You'll find it's valid.
The 'name' attribute is defined as CDATA, so pretty much anything goes
if the attribute value is quoted, including literal less-than and
greater-than signs.
> <input type="text" name="x> is invalid: no closing quote around
> name value
Yes, as a start-tag _in itself_. That wasn't Pedro's example though;
his example was the whole
| <input type="text" name="x><y" id="xy">
> <y" id="xy"> is invalid. y" isn't a valid cname
As a tag in itself, it is invalid HTML, yes. It isn't invalid as part
of the example above.
> (only alphanumeric?)
Generic identifiers (aka, element type names) must begin with upper- or
lowercase letters.
> if you want 'x><y' as a value you'd need to use name="x&gt;&lt;y"
No. You only need to replace '<' and '>' with references where they
would be understood as something other than character data.
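For the regular expressions in this thread, the practical consequence is that `[^>]*` can stop too early on such a tag. The sketch below shows that, plus one quote-aware variant; the variant is an assumption of mine, not something proposed in the thread, and it only understands double-quoted attribute values.

```php
<?php
$html = '<input type="text" name="x><y" id="xy">';

// Naive pattern: stops at the first '>' even though it sits inside the quoted value.
preg_match('/<input[^>]*>/', $html, $naive);
echo $naive[0], "\n";   // <input type="text" name="x>

// Quote-aware variant (assumption: attribute values use double quotes only).
preg_match('/<input(?:[^>"]|"[^"]*")*>/', $html, $aware);
echo $aware[0], "\n";   // <input type="text" name="x><y" id="xy">
```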
--
Jock
| |
| Pedro Graca 2006-10-30, 7:04 pm |
| Chung Leong wrote:
> \w is equivalent to [_a-zA-Z0-9] by the way.
It is /almost/ equivalent:
~$ php -r 'echo (preg_match("/^\w+$/", "Graça"))?("yes"):("no"), "\n";'
yes
~$ php -r 'echo (preg_match("/^[_a-zA-Z0-9]+$/", "Graça"))?("yes"):("no"), "\n";'
no
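Without further flags, whether \w accepts 'ç' depends on the locale and on how the string is encoded. A sketch of a way to make the result predictable, assuming UTF-8 input and the `u` pattern modifier (the "\u{...}" string escape needs PHP 7+):

```php
<?php
// "Graça" written with an escape so the file's own encoding does not matter.
$name = "Gra\u{00E7}a";

// With the u modifier the subject is treated as UTF-8 and \w covers letters
// beyond ASCII, so the first test should match and the second should not.
$word_class  = preg_match('/^\w+$/u', $name) === 1;
$ascii_class = preg_match('/^[_a-zA-Z0-9]+$/u', $name) === 1;

var_dump($word_class, $ascii_class);
```

If form names are meant to be ASCII only, the explicit [_a-zA-Z0-9] class is the more predictable choice.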
| |
| Jerry Stuckle 2006-10-30, 7:04 pm |
| BKDotCom wrote:
> Pedro Graca wrote:
>
>
>
> I would have to disagree
> <input type="text" name="x> is invalid: no closing quote around
> name value
> <y" id="xy"> is invalid. y" isn't a valid cname (only
> alphanumeric?)
>
> if you want 'x><y' as a value you'd need to use name="x&gt;&lt;y"
>
Actually, it is legal. name="x><y" is a perfectly valid tag and value.
&lt; and &gt; aren't required here because they are within a quoted
string in a tag.
You do need &lt; and &gt; in plain text, however, when they may be
mistaken for the start/end of a tag.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
| |
| Chung Leong 2006-10-30, 7:05 pm |
|
cendrizzi wrote:
> No I didn't know that \w was the same. What do you mean by proper
> capturing. I really am a 2 year old when it comes to RE stuff.
>
> Thanks!
>
> On Oct 29, 10:04 pm, "Chung Leong" <chernyshev...@hotmail.com> wrote:
By that I mean you need to grab the substrings which precede and
follow the text inside the quotation marks. If the input is
<input name="test{0}" size="40">
you'd want
<input name="
and
" size="40">
so that you can form the replacement <input name=" + DATA + "
size="40">.
Presumably you'd want 'test' and '0' as well for looking up the data.
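The capturing described above might look like the following sketch. The $data array is a hypothetical stand-in for wherever the values really live (the session, in the original question), fill_field is an invented name, and the `??` fallback needs PHP 7+.

```php
<?php
// Hypothetical data store standing in for the session values.
$data = array('test' => array('0' => 'hello'));

// Four groups: text before the name value, the key, the index, text after.
$rx = '/(<(?:input|select|textarea)\b[^>]*\bname\s*=\s*")'
    . '([\w\s]*)\{([\w\s]*)\}'
    . '("[^>]*>)/';

function fill_field(string $html, string $rx, array $data): string {
    $result = preg_replace_callback($rx, function (array $m) use ($data): string {
        list(, $before, $key, $index, $after) = $m;
        // Look up the data; fall back to an empty string when it is missing.
        $value = isset($data[$key][$index]) ? $data[$key][$index] : '';
        // Rebuild the tag as: before + DATA + after.
        return $before . htmlspecialchars($value) . $after;
    }, $html);

    return $result ?? $html; // preg_replace_callback returns null on error
}

echo fill_field('<input name="test{0}" size="40">', $rx, $data), "\n";
// prints: <input name="hello" size="40">
```

Tags whose name has no braces do not match the pattern at all, so they pass through unchanged.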
| |
| BKDotCom 2006-10-30, 7:05 pm |
|
John Dunlop wrote:
> Run it through a validator. You'll find it's valid.
Will I?
http://validator.w3.org/check
Warning character "<" is the first character of a delimiter but
occurred as data
This message may appear in several cases:
* You tried to include the "<" character in your page: you should
escape it as "&lt;"
* You used an unescaped ampersand "&": this may be valid in some
contexts, but it is recommended to use "&amp;", which is always safe.
* Another possibility is that you forgot to close quotes in a
previous tag.
| |
| Andy Hassall 2006-10-30, 7:05 pm |
| On 30 Oct 2006 09:02:18 -0800, "BKDotCom" <bkfake-google@yahoo.com> wrote:
>John Dunlop wrote:
>
>Will I?
You certainly should. I've just tried it against the W3C validator, and it
agreed it's valid.
>http://validator.w3.org/check
>Warning character "<" is the first character of a delimiter but
>occurred as data
>This message may appear in several cases:
> * You tried to include the "<" character in your page: you should
>escape it as "&lt;"
> * You used an unescaped ampersand "&": this may be valid in some
>contexts, but it is recommended to use "&amp;", which is always safe.
> * Another possibility is that you forgot to close quotes in a
>previous tag.
Result: Passed validation
File: test.html
Encoding: iso-8859-1
Doctype: HTML 4.01 Transitional
This Page Is Valid HTML 4.01 Transitional!
Here's what I uploaded:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>test</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<form method="post" action="test.php">
<input type="text" name="x><y" id="xy">
</form>
</body>
</html>
(the <meta> being there because I validated it by upload rather than from a
real site that would have sent the relevant HTTP header instead)
What did you upload?
--
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool