Author apache regex
d0x

2004-07-06, 3:59 pm

can someone help me out with a regular expression that can match

*.htm?pkg=1&foo=bar but at the same time NOT match the following *.html

i need to be able to use this for mod_rewrite.


thanks
Gunnar Hjalmarsson

2004-07-06, 3:59 pm

d0x wrote:
> can someone help me out with a regular expression that can match
>
> *.htm?pkg=1&foo=bar but at the same time NOT match the following
> *.html
>
> i need to be able to use this for mod_rewrite.


Why did you ask that in a Perl newsgroup? Don't you think that a
newsgroup dealing with Apache would have been more appropriate?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
d0x

2004-07-06, 3:59 pm

On Tue, 06 Jul 2004 10:40:44 +0200, Gunnar Hjalmarsson wrote:

> d0x wrote:
>
> Why did you ask that in a Perl newsgroup? Don't you think that a
> newsgroup dealing with Apache would have been more appropriate?


because it was a regular expression question. I just happen to be using it
for apache. I find that outside of the perl users, other people aren't as
familiar with regex.
Gunnar Hjalmarsson

2004-07-06, 3:59 pm

d0x wrote:
> Gunnar Hjalmarsson wrote:
>
> because it was a regular expression question. I just happen to be
> using it for apache. I find that outside of the perl users, other
> people aren't as familiar with regex.


Regular expressions in Perl are not the same as regular expressions in
other programming languages. As regards Apache and mod_rewrite, I
played with it a couple of years ago, and as far as I can recall, the
syntax differs quite a bit from Perl regexes.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
gnari

2004-07-06, 3:59 pm

"d0x" <dan@no.spam> wrote in message
news:pan.2004.07.06.08.43.53.365790@no.spam...
> can someone help me out with a regular expression that can match
>
> *.htm?pkg=1&foo=bar but at the same time NOT match the following *.html


this question is particularly unclear. a real example might help.

>
> i need to be able to use this for mod_rewrite.


in that case you are probably barking up the wrong tree.
IIRC, the RewriteRule regex does not match the querystring part.
you have to use a combination of RewriteCond and RewriteRule.
in other words, not really a perl question, as Gunnar has pointed out.

gnari
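
The RewriteCond/RewriteRule combination mentioned above could look roughly like the following. This is only a sketch: the target /handler.cgi is a made-up name, and the exact query string is taken literally from the original question.

```apache
RewriteEngine On

# RewriteRule only sees the URL-path, never the query string,
# so the query string is tested in a separate RewriteCond.
RewriteCond %{QUERY_STRING} ^pkg=1&foo=bar$

# "\.htm$" is anchored at the end of the path, so "page.htm" matches
# and "page.html" does not. /handler.cgi is a hypothetical target.
RewriteRule \.htm$ /handler.cgi [L]
```

The "NOT match *.html" part of the question is handled by the $ anchor alone; no negative pattern is needed.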




Gunnar Hjalmarsson

2004-07-06, 3:59 pm

gnari wrote:
> "d0x" <dan@no.spam> wrote in message
> news:pan.2004.07.06.08.43.53.365790@no.spam...
>
> this question is particularly unclear. a real example might help.
>
>
> in that case you are probably barking up the wrong tree. IIRC, the
> RewriteRule regex does not match the querystring part. you have to
> use a combination of RewriteCond and RewriteRule. in other words,
> not really a perl question, as Gunnar has pointed out.


But, still trying to help, are you sure you need to *match* the
querystring? For rewriting a URL like:

http://www.olddomain.com/somedir/my...23&othervar=abc

, this is a working example:

RewriteRule ^(.+)\.cgi$ http://www.newdomain.com/otherdir/$1.pl?%{QUERY_STRING}

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
gnari

2004-07-06, 3:59 pm

"Gunnar Hjalmarsson" <noreply@gunnar.cc> wrote in message
news:2kvicaF6toaqU1@uni-berlin.de...
> gnari wrote:
>
> But, still trying to help, are you sure you need to *match* the
> querystring? For rewriting a URL like:
>
> http://www.olddomain.com/somedir/my...23&othervar=abc
>
> , this is a working example:
>
> RewriteRule ^(.+)\.cgi$
> http://www.newdomain.com/otherdir/$1.pl?%{QUERY_STRING}


more and more OT, but there is no need for the ?%{QUERY_STRING} in the
destination string if you are not going to change the QS.

i understood the OP as also wanting to match the pkg=1&foo=bar.
in that case he also needs one or more RewriteCond %{QUERY_STRING} lines.

gnari




Gunnar Hjalmarsson

2004-07-06, 3:59 pm

gnari wrote:
> "Gunnar Hjalmarsson" <noreply@gunnar.cc> wrote in message
> news:2kvicaF6toaqU1@uni-berlin.de...
>
> more and more OT,


Admittedly.

> but no need for the ?%{QUERY_STRING} in the destination string if
> you are not going to change the QS


Depends on what it is you are doing, I suppose. The above was a
redirect because a script had been moved to another server, which
required another file extension. In that case, the ?%{QUERY_STRING}
was the whole point, or else I could just have used the Redirect
directive.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
gnari

2004-07-06, 3:59 pm

"Gunnar Hjalmarsson" <noreply@gunnar.cc> wrote in message
news:2l0a9sF797ueU1@uni-berlin.de...
> gnari wrote:

[about using mod_rewrite to change extension/server]
>
>
> Depends on what it is you are doing, I suppose. The above was a
> redirect because a script had been moved to another server, which
> required another file extension. In that case, the ?%{QUERY_STRING}
> was the whole point, or else I could just have used the Redirect
> directive.


I am not following you. I am just saying that
RewriteRule x.foo y.bar
does the same thing as
RewriteRule x.foo y.bar?%{QUERY_STRING}

gnari
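
A sketch of the behaviour being described, assuming per-server context (hence the leading slash in the patterns) and the placeholder names x.foo and y.bar from the post. The two rules are alternatives, not meant to be used together.

```apache
# No "?" in the substitution: the original query string is carried
# over unchanged, so /x.foo?a=1 is rewritten to /y.bar?a=1.
RewriteRule ^/x\.foo$ /y.bar

# Alternative: a "?" in the substitution replaces the original query
# string. The QSA flag appends the original one instead of dropping
# it, so /x.foo?a=1 becomes /y.bar?extra=1&a=1.
RewriteRule ^/x\.foo$ /y.bar?extra=1 [QSA]
```

So an explicit ?%{QUERY_STRING} only matters when the substitution would otherwise introduce a query string of its own.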




Gunnar Hjalmarsson

2004-07-06, 3:59 pm

gnari wrote:
> Gunnar Hjalmarsson wrote:
>
> [about using mod_rewrite to change extension/server]
>
>
> I am not following you. I am just saying that
> RewriteRule x.foo y.bar
> does the same thing as
> RewriteRule x.foo y.bar?%{QUERY_STRING}


Hmm.. I tested it, and you are right. Don't know why I thought otherwise.

Thanks for this very OT piece of info. :)

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Purl Gurl

2004-07-06, 8:57 pm

d0x wrote:

> can someone help me out with a regular expression that can match


> *.htm?pkg=1&foo=bar but at the same time NOT match the following *.html



Are you sure a query string arrives with a request for an .htm file?

This indicates your .htm files are executables. Perhaps you are
passing along your query string to a server side include call?
Yours are most unusual parameters.


RewriteEngine On (if needed)

RewriteCond %{REQUEST_URI} ^.*specific_page\.htm$
RewriteCond %{QUERY_STRING} ^pkg=1&foo=bar$
RewriteRule ^.*$ http://some.server/path/specific_page.htm?%{QUERY_STRING}


Condition one - must match your specific page.

AND

Condition two - must match your specific query string.


Some notes on this. URL encoding might present periodic problems.
Consider adding a specific page title to narrow this down to only
a selected file or selected files. Efficiency will be better by
dropping this into an .htaccess at a directory level rather than
in your httpd.conf file.

You will find usage of verbose conditions and rules to be beneficial;
there is less chance of regex error.


Purl Gurl
Andrew Palmer

2004-08-01, 8:55 pm

Purl Gurl <purlgurl@purlgurl.net> wrote in message news:<40EB0794.67F9CF11@purlgurl.net>...
> d0x wrote:
>
>
>
>
> Are you sure a query string arrives with a request for an .htm file?


This actually isn't all that unusual. A lot of large sites use links
with query strings attached to URLs with .htm or .html extensions.

wired.com
about.com
cnn.com
nytimes.com
washingtonpost.com
cnet.com

I'm sure I could find others.

>
> This indicates your .htm files are executables. Perhaps you are
> passing along your query string to a server side include call?


Nothing about a URL necessarily indicates how the request is processed
by the web server.

> Yours are most unusual parameters.


I suppose that's somewhat true, but unusual query string parameters
are also not that unusual ;)

>
>
> RewriteEngine On (if needed)
>
> RewriteCond %{REQUEST_URI} ^.*specific_page\.htm$
> RewriteCond %{QUERY_STRING} ^pkg=1&foo=bar$
> RewriteRule ^.*$ http://some.server/path/specific_page.htm?%{QUERY_STRING}


Like others pointed out, ?%{QUERY_STRING} is unnecessary in the
RewriteRule

>
>
> Condition one - must match your specific page.
>
> AND
>
> Condition two - must match your specific query string.
>
>
> Some notes on this. URL encoding might present periodic problems.


Yes, it's true that %{QUERY_STRING} is never decoded by Apache, while
other pieces of the URL are. However, this is the desired behavior,
since a decoded query string cannot be parsed into parameters by a CGI
script. It's not that confusing if you think about it.

> Consider adding a specific page title to narrow this down to only
> a selected file or selected files.


"page title" ...???

> Efficiency will be better by
> dropping this into an .htaccess at a directory level rather than
> in your httpd.conf file.


No. This is wrong. The opposite is true.

>
> You will find usage of verbose conditions and rules to be beneficial;
> there is less chance of regex error.


Do whatever is clear.
Purl Gurl

2004-08-03, 9:02 am

Andrew Palmer wrote:

> Purl Gurl wrote:
> This actually isn't all that unusual. A lot of large sites use links
> with query strings attached to URLs with .htm or .html extensions.


An html page is a static page. Use of a query string directly
indicates processing. Mixing static pages with processed pages
is unusual, save for Server Side Includes, which do not usually
rely on a query string.

Use of a query string with a static page is unusual.


> Nothing about a URL necessarily indicates how the request is processed
> by the web server.


That is untrue. Use of a query string directly indicates
processing is taking place.


> Like others pointed out, ?%{QUERY_STRING} is unnecessary in the
> RewriteRule


That is untrue. The originating author specifically requests
how to rewrite parse for a query string.


> It's not that confusing if you think about it.


Reads to me you are very .


> No. This is wrong. The opposite is true.


That is untrue. Directives in an httpd.conf file are processed
for every transaction. Directives in a directory level htaccess
file are only processed when encountered by specific URL address.


Purl Gurl
Matt Garrish

2004-08-03, 9:02 am


"Purl Gurl" <purlgurl@purlgurl.net> wrote in message
news:410E449B.11C46A1B@purlgurl.net...
> Andrew Palmer wrote:
>
>
>
>
> That is untrue. Use of a query string directly indicates
> processing is taking place.
>


That is untrue. Client-side javascript can make use of the query string, so
there's no reason to assume that an .htm extension with a query string
indicates that any processing is being done on the server.

Matt


Andrew Palmer

2004-08-03, 9:02 am


"Purl Gurl" <purlgurl@purlgurl.net> wrote in message
news:410E449B.11C46A1B@purlgurl.net...
> Andrew Palmer wrote:
>
>
>
>
>
>
> An html page is a static page.


HTML pages can be generated dynamically even if there's a visible .html
extension. There's no way to tell by looking at a URL.

> Use of a query string directly
> indicates processing.


Every URL is "processed" in some way by the server. Query strings may be
handled in a variety of ways or not at all. They *usually* (not "directly")
indicate some additional processing by a CGI script, etc., but not
necessarily.

> Mixing static pages with processed pages
> is unusual, save for Server Side Includes, which do not usually
> rely on a query string.


I guess in some way, any dynamic page can be considered part dynamic and
part static, since there's almost always something that stays the same on
each run. I would consider SSI and PHP pages dynamic, or "processed".


> Use of a query string with a static page is unusual.


I agree this is unusual, but not unheard of. Some sites tack on a query
string that does nothing (like "?nav=top") so that when they analyze their
logs they know if visitors clicked the link on the top menu, side menu, etc.



>
>
> That is untrue. Use of a query string directly indicates
> processing is taking place.


We're arguing a very fine point here, but what I said is not wrong. What you
are saying is usually true, but not necessarily.


>
http://some.server/path/specific_page.htm?%{QUERY_STRING}
>
>
> That is untrue. The originating author specifically requests
> how to rewrite parse for a query string.


No. It doesn't matter. If there is a query string in the original request,
Apache will keep it after the redirect. You don't need to explicitly add it.
(Try it!)


>
>
>
> Reads to me you are very .


I humbly apologize. Thanks for keeping me straight.

>
>
>
> That is untrue. Directives in an httpd.conf file are processed
> for every transaction. Directives in a directory level htaccess
> file are only processed when encountered by specific URL address.


The main httpd.conf file is processed at server start-up, not with each
transaction. Efficiency will be best if you do not use per-directory config
files at all and turn them off. You can put directives inside
<Directory></Directory> if they must apply only within a specific directory.

For mod_rewrite, .htaccess is less efficient in any case, though. File name
translation takes place before per-directory .htaccess files are read. That
means that if you put RewriteRules in .htaccess files, Apache must reverse the
translation and go from file name back to URL before it can process this
contrived URL as if it were the original request. None of this convoluted
logic is necessary if you just put the RewriteRules in the main configuration
file. The mod_rewrite documentation explains this more thoroughly.
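
As a rough illustration of the setup described above, here is a sketch of a main configuration file with .htaccess lookups switched off and the rewrite rule kept in per-server context. The directory path and the redirect target are assumptions for the example, not taken from the thread.

```apache
# httpd.conf: read once at server start-up.

# Hypothetical document root; adjust to the real path.
<Directory "/var/www/html">
    # Stop Apache from looking for .htaccess files in this tree.
    AllowOverride None
</Directory>

RewriteEngine On

# In per-server context the pattern is matched against the full
# URL-path, which begins with a slash.
RewriteCond %{QUERY_STRING} ^pkg=1&foo=bar$
RewriteRule ^/(.+)\.htm$ http://some.server/path/$1.htm [R,L]
```

Because the substitution contains no "?", the original query string is passed along with the redirect.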




gnari

2004-08-03, 9:02 am

"Purl Gurl" <purlgurl@purlgurl.net> wrote in message
news:410E449B.11C46A1B@purlgurl.net...
> Andrew Palmer wrote:
>
>
>
>
> That is untrue. Directives in an httpd.conf file are processed
> for every transaction. Directives in a directory level htaccess
> file are only processed when encountered by specific URL address.
>


this may possibly be true for global directives (not inside
a <Directory> or <Location> or such) IFF .htaccess file usage
is already enabled for that particular directory level. in general
.htaccess files should be avoided (and disabled) if this level of
efficiency is needed.

here is a discussion:

http://httpd.apache.org/docs/howto/htaccess.html

on the other hand, mod_rewrite (the original topic of this thread)
is amazingly efficient, and its processing time is likely to
be dwarfed by other considerations in any CGI context.

gnari




Purl Gurl

2004-08-03, 9:02 am

Andrew Palmer wrote:

> Purl Gurl wrote:

(snipped)
> We're arguing a very fine point here,


In other words, you are nit-picking.


> No. It doesn't matter. If there is a query string in the original request,
> apache will keep it after the redirect. You don't need to explicitly add it.



"...a regular expression that can match *.htm?pkg=1&foo=bar
but at the same time NOT match the following *.html"


Read for comprehension.


Purl Gurl
Purl Gurl

2004-08-03, 9:02 am

gnari wrote:

> Purl Gurl wrote:
> this may possibly be true for global directives (not inside
> a <Directory> or <Location> or such)


It is not a matter of "may possibly be true...."

All directives contained in an httpd.conf file are
loaded upon apache boot and become a part of the
Apache operating environment.

Typically, Apache parses an httpd.conf directive
for each parent directory and each child directory
and possibly a destination file, as well. Apache
must compare a directive to data encountered
at each step of traversal.

Directives contained outside an httpd.conf file
are not loaded upon apache boot. Directives within
an htaccess are processed only if encountered while
Apache follows a directory tree path.


Purl Gurl
Andrew Palmer

2004-08-04, 3:55 am


"Purl Gurl" <purlgurl@purlgurl.net> wrote in message
news:410EA1D7.265CCEEF@purlgurl.net...
> Andrew Palmer wrote:
>
>
> (snipped)
>
>
>
> In other words, you are nit-picking.
>
>
>
>
>
>
> "...a regular expression that can match *.htm?pkg=1&foo=bar
> but at the same time NOT match the following *.html"
>
>
> Read for comprehension.


Adding "?%{QUERY_STRING}" to a RewriteRule is not necessary in any event.
This was already covered in this thread.


>
>
> Purl Gurl




Joe Smith

2004-08-04, 8:59 am

Purl Gurl wrote:

> Andrew Palmer wrote:
>
> An html page is a static page.


Not necessarily. I've had .htaccess settings that make "index.html" be
processed for server-side includes. Putting <!--#exec cgi="foo.pl" -->
inside this index.html page allows it to process any query string that
was included in the URL.
-Joe
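
A rough idea of the kind of .htaccess settings this refers to, assuming Apache 2.x with mod_include loaded and an AllowOverride setting that permits Options and FileInfo. The actual settings used are not given in the post; on Apache 1.3 the usual equivalent was "AddHandler server-parsed .html".

```apache
# Allow server-side includes, including "exec cgi", in this directory.
Options +Includes

# Run .html files through the SSI output filter (Apache 2.x syntax).
AddOutputFilter INCLUDES .html
```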
Joe Smith

2004-08-04, 8:59 am

Purl Gurl wrote:

> Directives contained outside an httpd.conf file
> are not loaded upon apache boot.


Correct. Directives in .htaccess files cause the server
to do more I/O than it would have if all the directives
were in the httpd.conf file. This additional I/O is one
reason why people say that using .htaccess is less efficient
than putting everything in the conf file.
-Joe
Copyright 2008 codecomments.com