allow filenames with ‘?’ in apache, or retiring a forum

After retiring the blog, I moved on to the forum. Unexpectedly, this step turned out to consist of two tasks:
1) convert the phpBB2 forum to static pages
2) keep the old URLs working

The first task was easy: all I needed was to carefully change the templates and the source code to remove all dynamic elements from the forum pages. Then I mirrored the site:

wget -r -np -m -p --wait=5 -o log http://mysite/forum2/

I checked the "log" file to make sure that all the pages had been downloaded correctly:

grep '^   ' log  | grep -v '     0K'

Then I inspected the downloaded files by hand. It took several edit-mirror-check cycles before the result was right.

After I uploaded the static copy, I got a nasty surprise: wget had saved the pages under names containing a question mark, such as "viewforum.php?f=1".

The web server (Apache) cannot serve such files because the question mark is reserved for separating the file path from the query string. How can one get around this behaviour, correct as it is?
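To see why the server never finds these files, one can split a request URL the way a URL parser does. This is a minimal Python sketch using the standard urllib.parse module; the host "mysite" is the placeholder from the wget command, and the page name is one of the example files.

```python
from urllib.parse import urlsplit

# A URL a visitor's browser requests for one of the mirrored forum pages.
url = "http://mysite/forum2/viewforum.php?f=1"

parts = urlsplit(url)
# Everything after the first '?' is the query string, so the server
# looks for a file named "viewforum.php", not "viewforum.php?f=1".
print(parts.path)   # /forum2/viewforum.php
print(parts.query)  # f=1
```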

The task turned out to be tricky. My solution:

* Use mod_rewrite to map the question mark in the request URL to an underscore
* Rename the files correspondingly: "viewforum.php?f=1" to "viewforum.php_f=1", "viewtopic.php?t=378&start=15" to "viewtopic.php_t=378&start=15", etc.
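The two steps only work together if they produce the same name for every page. The following Python sketch checks this on the two example files; the helper names file_for_request and renamed_file are hypothetical and simply model "path + '_' + query" on the server side and "first '?' becomes '_'" on the disk side.

```python
def file_for_request(path: str, query: str) -> str:
    """Name the rewrite looks up: the path, '_', then the query string (if any)."""
    return f"{path}_{query}" if query else path


def renamed_file(saved_name: str) -> str:
    """Name of a mirrored file after replacing its first '?' with '_'."""
    return saved_name.replace("?", "_", 1)


# Both directions must agree, otherwise the rewritten request misses the file.
for saved in ["viewforum.php?f=1", "viewtopic.php?t=378&start=15"]:
    path, _, query = saved.partition("?")
    assert file_for_request(path, query) == renamed_file(saved)
    print(saved, "->", renamed_file(saved))
```

Requests without a query string are left alone, which matches the QUERY_STRING condition in the configuration.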

The .htaccess is:

RewriteEngine On
RewriteCond %{ENV:REDIRECT_STATUS} =""
RewriteCond %{QUERY_STRING} !=""
RewriteRule ^(.*)$ $1_%{QUERY_STRING} [L]

ForceType "text/html; charset=iso-8859-1"

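Everything is obvious except line 2, the REDIRECT_STATUS condition. In a .htaccess file, a successful rewrite triggers an internal redirect, and the server runs the rules again on the new URL. The query string is still attached (the substitution contains no '?'), so without a guard the rule would fire again and again: "viewforum.php_f=1", "viewforum.php_f=1_f=1", and so on. On the original request REDIRECT_STATUS is empty; after the internal redirect Apache sets it, so the condition fails on the second pass and mod_rewrite stops looping. The ForceType line makes Apache serve the renamed files, which no longer have a recognisable extension, as HTML.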
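The loop can be reproduced with a small, hypothetical Python model of the rules. It is a simplification under stated assumptions: it ignores RewriteBase and directory prefixes, uses "200" as the value Apache puts into REDIRECT_STATUS after the internal redirect (any non-empty value behaves the same here), and caps the number of passes at an arbitrary 10 in place of Apache's own internal-redirect limit.

```python
from typing import Optional


def rewrite_pass(path: str, query: str, redirect_status: str, use_guard: bool) -> Optional[str]:
    """One pass over the .htaccess rules; the rewritten path, or None if the rule does not fire."""
    if use_guard and redirect_status != "":
        return None                  # RewriteCond %{ENV:REDIRECT_STATUS} =""
    if query == "":
        return None                  # RewriteCond %{QUERY_STRING} !=""
    return f"{path}_{query}"         # RewriteRule ^(.*)$ $1_%{QUERY_STRING} [L]


def resolve(path: str, query: str, use_guard: bool, max_redirects: int = 10) -> str:
    """Re-run the rules after every internal redirect, as in a .htaccess file."""
    redirect_status = ""             # empty on the original client request
    for _ in range(max_redirects + 1):
        new_path = rewrite_pass(path, query, redirect_status, use_guard)
        if new_path is None:
            return path              # no rewrite: this is the file that gets served
        # The substitution has no '?', so the query string stays attached;
        # the internal redirect gives the next pass a non-empty REDIRECT_STATUS
        # ("200" is an assumed value).
        path, redirect_status = new_path, "200"
    raise RuntimeError(f"too many internal redirects, last path: {path}")


print(resolve("viewforum.php", "f=1", use_guard=True))  # viewforum.php_f=1
try:
    resolve("viewforum.php", "f=1", use_guard=False)
except RuntimeError as err:
    print("loops without the guard:", err)
```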

It was hard to come up with the REDIRECT_STATUS condition: the mod_rewrite documentation does not describe "REDIRECT_STATUS". Thanks to the rewrite log at level 9 for exposing the problem, to Google for finding the relevant pages, and to the Internet for storing the knowledge.

As for renaming the files, several approaches are possible. I used Perl:

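# list the files whose names contain '?'
find forum2/ -name '*\?*' >flist
# rename each one, replacing the first '?' with '_' and reporting failures
perl -nle '$old = $_; s/\?/_/; rename $old, $_ or warn "cannot rename $old: $!\n"' flist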
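For comparison, here is a sketch of the same renaming in Python, using only os.walk and os.rename from the standard library. The function name rename_question_marks is hypothetical; like the Perl one-liner it replaces only the first '?', but it touches only the file-name part of each path and skips a file if the target name already exists.

```python
import os
import sys


def rename_question_marks(root: str) -> None:
    """Replace the first '?' with '_' in every file name under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if "?" not in name:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, name.replace("?", "_", 1))
            if os.path.exists(dst):
                # Do not overwrite a file that already has the target name.
                print(f"skipping {src}: {dst} exists", file=sys.stderr)
                continue
            os.rename(src, dst)
```

It would be called as rename_question_marks("forum2") from the directory that contains the mirror.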