Aliases and redirects are a daily request: they forward traffic from a short, friendly web address to a long, unfriendly one (e.g. http://domain.com/apply forwards traffic to https://domain.com/application/apply/2012/). We manage aliases and redirects through a plain-text file in which each line contains either a comment or an alias/redirect entry, and we're nearing the 3k-line mark.
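For readers picturing the plain-text approach, here is a minimal Python sketch of loading such a file. The line format (`<alias> <target>`, with `#` starting a comment line) and the name `parse_alias_file` are assumptions for illustration; the actual file format is not described in this thread.

```python
def parse_alias_file(text):
    """Parse an alias file into a dict of alias -> target.

    Assumed format (hypothetical): one '<alias> <target>' pair per line,
    blank lines ignored, lines starting with '#' treated as comments.
    """
    redirects = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(
                f"line {line_number}: expected '<alias> <target>', got {raw!r}"
            )
        alias, target = parts
        redirects[alias] = target  # a later duplicate silently wins
    return redirects


SAMPLE = """\
# Admissions
/apply https://domain.com/application/apply/2012/
"""

redirects = parse_alias_file(SAMPLE)
```

Note that a later entry for the same alias overwrites an earlier one here; with a file of roughly 3k lines, it may be preferable to report duplicates instead.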
I am looking for feedback on how other universities manage aliases/redirects to see if there is a better method, especially considering the adoption of university-branded URL shorteners (e.g. go.acu.edu). Please let me know how your university manages aliases and redirects.
Previous discussions on redirects:
I'd also be interested in knowing about this. Our alias file has become quite long over the years.
We use a simple MySQL table that holds the aliases and redirect URLs. On our 404 page we check whether the requested URL is in the database; if it is, we redirect and log each redirect so we can do simple analytics. There is also a minimal admin interface to add/modify/delete redirects.
I recently rewrote the application which handled our redirections. Our old version was a CodeIgniter app which relied on a MySQL database. It functioned similarly to the one that Tryon described. The problem was that our MySQL was hosted on a finicky machine. Whenever the machine or MySQL went down, so did all of our 3000+ redirects. In the process of finding a better solution, I did a decent amount of research and testing.
My first inclination was to use .htaccess to redirect requests. This works well if you only have several hundred redirects. However, once you go over about 500, it starts slowing down. I imported all 3200 redirects into an .htaccess file and did some testing. It took .htaccess 5 milliseconds to process a request. Our old CodeIgniter/MySQL solution took about 11 milliseconds, so initially I was excited.
However, after some additional research, I decided against using .htaccess. Apache reads the .htaccess file on every request made. Although our old application took twice as long to process a redirect, it only did so when handling a 404. The .htaccess method would run every time any page or file was requested. This would have tied up significantly more resources on our server, even if the 6-millisecond difference was not noticeable to the user.
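To make that trade-off concrete, here is a back-of-the-envelope Python sketch. The per-request costs (5 ms and 11 ms) come from the measurements above; the traffic figures in the example call are purely hypothetical, and the function name `compare` is made up for illustration.

```python
def compare(total_requests, not_found_requests, htaccess_ms=5.0, handler_ms=11.0):
    """Estimate total redirect-processing time for each approach.

    .htaccess rules are evaluated on every request, while the 404 handler
    runs only for requests that did not match a real page or file.
    """
    return {
        "htaccess": total_requests * htaccess_ms,
        "404_handler": not_found_requests * handler_ms,
    }


# Hypothetical traffic: 100,000 requests, of which 1,000 end in a 404.
estimate = compare(total_requests=100_000, not_found_requests=1_000)
```

Under those assumed numbers the .htaccess approach spends far more total time on redirect processing, even though it is faster per request.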
The method I landed on was a multidimensional PHP array, which stores each redirect, its target, and whether it is a permanent redirect in a sub-array. A PHP script searches the array for a key matching the requested URL. In effect, it replaced the MySQL database from the old application with an array. Matching an array key instead of running a MySQL query was significantly faster. The new version takes approximately .7 milliseconds. I set it up so that the array is built out from our CMS, so there's no need to edit the actual PHP file. It's very easy to manage.
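The same idea can be sketched in Python with a dictionary keyed by the requested path. This is only an analogue of the PHP array described above, not the actual script: the field names, the `resolve` function, the 301/302 mapping, and the trailing-slash normalization are all assumptions for illustration.

```python
# Hypothetical data; in the setup described above this would be generated
# from the CMS rather than edited by hand.
REDIRECTS = {
    "/apply": {
        "target": "https://domain.com/application/apply/2012/",
        "permanent": False,
    },
}


def resolve(path, redirects=REDIRECTS):
    """Return (status, target) for a known alias, or None if there is no match."""
    key = path.rstrip("/") or "/"  # assumed normalization: ignore trailing slash
    entry = redirects.get(key)     # single hash lookup, no database query
    if entry is None:
        return None
    status = 301 if entry["permanent"] else 302
    return status, entry["target"]
```

A keyed lookup like this does not slow down noticeably as the number of redirects grows, which fits with it being faster than a database query per request.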
So overall, here are my findings:
The one downside of my script is that it doesn't keep any analytics data. This could easily be remedied, I just haven't gotten to it yet. The upside is that I'm no longer dependent on MySQL.
Did you benchmark the case where a similar .htaccess-style file is included at startup instead of being read on each request? We use this method, so we have to restart the Apache service to load changes to the redirects file.
What was your process for capturing the results? I can do a Network Capture to watch the responses but I want to try to follow your same testing methodology.
I'm guessing at these numbers, but I believe an empty .htaccess file took about .5 milliseconds per request (compared to 5 milliseconds for 3000 redirects). The way ours is set up, we don't have to restart Apache.
My benchmarking was fairly rudimentary. I believe I repeated a curl_exec() request 1000 times, measuring the response time for each request. I averaged these values. I'm sure there are better ways to do this. I'm still learning PHP.
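For anyone wanting to repeat that kind of measurement, a minimal Python sketch of the repeat-and-average loop follows. It is not the original PHP script: `average_ms` is a made-up name, and the timed callable here is a local stand-in so the example runs without a server. In practice it would wrap an HTTP request to the URL under test.

```python
import time


def average_ms(fn, repeats=1000):
    """Call fn() `repeats` times and return the mean wall-clock time in ms."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / repeats * 1000.0


# Stand-in workload (hypothetical): a dictionary lookup instead of a request.
table = {"/apply": "https://domain.com/application/apply/2012/"}
mean_ms = average_ms(lambda: table.get("/apply"), repeats=1000)
```

An average hides outliers, so recording the minimum and maximum alongside it may give a fuller picture.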
I'm going to use Apache Benchmark to do some tests today. I'll report the results when I'm finished.
The attached image shows the results from performance testing. The tests used one system running Linux 2.6.18 and Apache 2.2.3 with 8GB RAM as the "host" and another system running Mac OS X 10.7.3 with 8GB RAM as the "tester."
The tests were completed using ApacheBench. Each test involved submitting 10 requests, one at a time to the designated URI. After each test the results were recorded, the environment updated for the next test case, and a restart was performed.
Let me know if you have any questions on how testing was done.
Nicely done. I'm confused about what the graph means by "PHP" vs "Include." Could you explain?
"PHP" tests followed this example: https://gist.github.com/0afb75ef9b50d00e690d
"Include" tests involved adding "Include /path/to/configuration/file.conf" to the Apache httpd.conf file. This file is essentially identical to the .htaccess file except how Apache interacts with the file. Included files are loaded once and "remembered forever."
Interesting. This seems to contradict my findings. With the include method, would you need to restart Apache every time you made an update to the conf file? We create new redirects every few days, so having to restart Apache each time would be a pain.
We had the same concern about restarting Apache being a pain; however, it really hasn't been that bad of an experience. The problem that we're facing is the management of all the redirects:
removing duplicates, expiring unused redirects, tracking redirects, etc.
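Two of those chores, finding duplicates and flagging unused redirects, can be sketched in Python as below. This is only an illustration under stated assumptions: the input shapes, the function names, and the 365-day cutoff are hypothetical, and flagging stale entries presumes that hits are being logged somewhere.

```python
from datetime import date, timedelta


def find_duplicates(entries):
    """entries: list of (alias, target) pairs. Return aliases defined more than once."""
    seen, dupes = set(), []
    for alias, _target in entries:
        if alias in seen and alias not in dupes:
            dupes.append(alias)
        seen.add(alias)
    return dupes


def find_stale(last_hit, today, max_age_days=365):
    """last_hit: dict of alias -> date of last use (None if never used).

    Return aliases never used, or last used before the cutoff, sorted by name.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        alias for alias, hit in last_hit.items() if hit is None or hit < cutoff
    )
```

Both functions only report candidates; whether to actually remove a redirect is probably best left to a person reviewing the list.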