Wednesday, April 4. 2007
During the last week I was performing some audits and, as so often, the audited code contained preg_match() filters that were not correct. Most PHP developers use ^ and $ within their regular expressions without actually reading the documentation about what they really achieve. You will find a lot of input filters like the following one.
<?php
$clean = array();
// Intended to accept only digits, a colon, and then one or more of X, Y, Z
if (preg_match("/^[0-9]+:[X-Z]+$/", $_GET['var'])) {
    $clean['var'] = $_GET['var'];
}
?>
A quite common way to filter incoming data, isn't it?
However, the problem is that the author of such a regular expression did not read the documentation carefully and mistakes the $ character for the definitive end of the subject. The real meaning, as documented in the PHP manual, is that $ matches at the end of the subject or immediately before a single '\n' linebreak that is the last character of the subject. This means that the following request will also pass the filter.
http://server.tld/index.php?var=012345:XYZ%0a
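The behaviour is easy to check from the command line. The following is only a minimal sketch, not part of the audited code: urldecode() stands in for the decoding PHP already applies to $_GET values, and the variable names are purely illustrative.

```php
<?php
// Same pattern as in the filter above.
$pattern = '/^[0-9]+:[X-Z]+$/';

// Stand-in for $_GET['var'] after PHP has decoded %0a to "\n".
$value = urldecode('012345:XYZ%0a');

// $ also matches just before a single trailing newline, so this passes.
$passes = preg_match($pattern, $value);

// With a second newline, the position after XYZ is no longer directly
// before the final newline, so this fails.
$fails = preg_match($pattern, $value . "\n");

var_dump($passes);                      // int(1)
var_dump($fails);                       // int(0)
var_dump(substr($value, -1) === "\n");  // bool(true): the newline is still in the value
```

Note that the filter does not strip anything: the value copied into $clean would still carry the newline.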
In several circumstances a newline character can be dangerous, for example when you want to stop HTTP Response Splitting or Email Injection attacks. To correct the above regular expression it is necessary to add the D modifier to it, which changes the meaning of the $ specifier so that it really matches only at the end of the subject. Here is the corrected example.
<?php
$clean = array();
// D modifier: $ now matches only at the very end of the subject
if (preg_match("/^[0-9]+:[X-Z]+$/D", $_GET['var'])) {
    $clean['var'] = $_GET['var'];
}
?>
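To compare the variants side by side, here is a small sketch run against the same newline-terminated subject. The \z assertion and the m modifier case are not part of the original filter; they are included only for comparison, assuming the PCRE behaviour that \z matches at the very end of the subject and that D is ignored when m is set.

```php
<?php
// Value with the trailing newline from the request shown earlier.
$subject = "012345:XYZ\n";

$results = array(
    'plain $'     => preg_match('/^[0-9]+:[X-Z]+$/',   $subject),
    '$ with D'    => preg_match('/^[0-9]+:[X-Z]+$/D',  $subject),
    '\z'          => preg_match('/^[0-9]+:[X-Z]+\z/',  $subject),
    '$ with m, D' => preg_match('/^[0-9]+:[X-Z]+$/mD', $subject),
);

// Prints 1 for variants that accept the newline, 0 for those that reject it.
foreach ($results as $label => $matched) {
    echo $label, ': ', $matched, "\n";
}
```

Only the D and \z variants reject the subject. Combining D with m brings the problem back, so a filter that needs multiline mode is probably better served by \z.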
I hope this tip helps to get rid of all these wrong filters once and for all. People using ext/filter should prepare for a recompile, too.
PS: The regular expression is now more complicated to make this post easier to understand.