colder@php.net 2006-06-17, 8:07 am
ID: 36112
Updated by: colder@php.net
Reported By: pornel at despammed dot com
-Status: Open
+Status: Closed
Bug Type: Documentation problem
PHP Version: Irrelevant
Assigned To: colder
New Comment:
This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.
Thank you for the report, and for helping us make our documentation
better.
I simply removed the example for now.
Previous Comments:
------------------------------------------------------------------------
[2006-03-12 17:06:18] colder@php.net
There are a lot of inconsistencies in this example:
1) About @<script[^>]*?>.*?</script>@si :
a) the first ? is useless.
2) About @<[\/\!]*?[^<>]*?>@si :
a) / and ! don't have to be escaped.
b) [\/\!]*? is useless, as it's already matched by [^<>]*?.
c) the ? of [^<>]*? is useless.
d) the PCRE_DOTALL modifier is useless, there is no dot.
e) the PCRE_CASELESS modifier is useless.
f) what is the point of avoiding "<" in a tag?
3) About @([\r\n])[\s]+@ :
a) no need to put \s in a char class.
b) every \r\n will be changed to \r, as \s matches \n.
I think the whole example has to be reconsidered, because there are
already functions to do some of the job, like strip_tags() and
html_entity_decode().
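
A few of the points above can be checked directly. This is only a minimal sketch: the '\1' replacement for the whitespace pattern and the sample input strings are assumptions made for illustration, not taken from the manual page.

```php
<?php
// Point 3b: \r is captured as group 1 and \n is consumed by [\s]+,
// so a "\r\n" pair collapses to a lone "\r" (replacement '\1' is assumed).
$collapsed = preg_replace('@([\r\n])[\s]+@', '\1', "a\r\nb");
var_dump($collapsed === "a\rb"); // bool(true)

// Points 1a and 2c: [^>] can never match ">", so the lazy "?" changes nothing.
$html = '<script type="text/javascript">x</script>';
preg_match('@<script[^>]*?>@si', $html, $lazy);
preg_match('@<script[^>]*>@si', $html, $greedy);
var_dump($lazy[0] === $greedy[0]); // bool(true)

// The built-in functions mentioned above, applied to a hypothetical input.
$text = html_entity_decode(strip_tags('<p>Hello &amp; <b>world</b></p>'));
echo $text, "\n"; // Hello & world
```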
------------------------------------------------------------------------
[2006-01-20 23:54:03] pornel at despammed dot com
Description:
------------
The code on http://uk.php.net/preg_replace:
$search = array ('@<script[^>]*?>.*?</script>@si', // Strip out javascript
                 '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags
doesn't work as advertised. For example it will leave
contents of:
<script>xxx</script >
and worse, it will output valid script tags if given:
<<>script>evil<<>/script>
If these patterns were used on some website (for stripping
markup from users' comments, for example), they'd allow an XSS
attack.
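
Both failures can be reproduced with a short script. This is a sketch that uses only the two patterns quoted above and assumes each match is replaced with an empty string; the variable names are made up for illustration.

```php
<?php
// The two patterns quoted in the report; '' as the replacement is an assumption.
$search = array(
    '@<script[^>]*?>.*?</script>@si', // Strip out javascript
    '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags
);

// "</script >" (with a space) is not matched by the first pattern,
// so only the tags are removed and the script contents remain.
$leftover = preg_replace($search, '', '<script>xxx</script >');
echo $leftover, "\n"; // xxx

// Each "<>" is removed as an empty tag, which reassembles working script tags.
$rebuilt = preg_replace($search, '', '<<>script>evil<<>/script>');
echo $rebuilt, "\n"; // <script>evil</script>
```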
Since it's nearly impossible to properly parse HTML with
regular expressions, I suggest:
* renaming example from 'Convert HTML to text' to 'Remove
HTML markup'
* adding replacement of '<' as '&lt;'
* suggesting use of more robust methods, like strip_tags,
nl2br, htmlspecialchars or DOM interface.
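
As a rough illustration of the escaping approach suggested above, the sketch below passes the problematic input through htmlspecialchars() instead of trying to strip tags. The variable names and the choice of ENT_QUOTES with UTF-8 are assumptions, not part of the original example.

```php
<?php
// Hypothetical user comment containing the input that defeats the regex patterns.
$comment = '<<>script>evil<<>/script>';

// Escape every special character rather than removing tags,
// so nothing in the input can be reassembled into markup.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
// &lt;&lt;&gt;script&gt;evil&lt;&lt;&gt;/script&gt;
```

Escaping keeps the user's text visible as typed, which may or may not be what a given site wants; it is one option among those listed.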
------------------------------------------------------------------------