During my Master of Science, I benchmarked a number of web servers. I learned a few dos and don'ts the hard way, and decided to share them to help whoever might need them.
The result of my experiments is summarized in a technical report.
There is a lot of documentation scattered over the Internet. You will find a lot of (hopefully helpful) links in the rest of this document. The Linux HTTP Benchmarking HOWTO is a good reference to start with.
First of all, it is very important to decide what you want to benchmark. Unless you do this very precisely, it is pointless to seek technical solutions.
In my case, I wanted to compare a number of concurrency implementations. Benchmarking web servers was only a means to have a realistic application using those libraries. Keep in mind that, although I benchmarked realistic applications, I did not reproduce realistic load conditions. All I was interested in was to see how well the concurrency was handled, so I wrote very naive web servers, spawning a thread (or forking) on every incoming connection, and used the number of concurrent requests as the sole parameter of my study.
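To make "spawning a thread on every incoming connection" concrete, here is a minimal sketch in Python. It is not the code used for the report: the names `handle`, `serve` and `start_server`, the fixed response and the choice of Python are all assumptions made for illustration.

```python
import socket
import threading

# Fixed response; the body "hello\n" is 6 bytes long.
RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 6\r\n"
    b"\r\n"
    b"hello\n"
)

def handle(conn):
    """Serve a single connection, then close it."""
    with conn:
        data = b""
        # Read until the end of the request headers (or until the peer closes).
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                return
            data += chunk
        conn.sendall(RESPONSE)

def serve(listener):
    """Accept forever, spawning one thread per incoming connection."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:
            break  # the listening socket was closed
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

def start_server(host="127.0.0.1", port=0):
    """Bind, listen, and run the accept loop in a background thread.

    port=0 lets the operating system pick a free port.
    """
    listener = socket.create_server((host, port))
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener
```

Calling `start_server()` and reading `listener.getsockname()[1]` gives the port to point the benchmarking client at. There is deliberately no thread pool and no limit on the number of threads, since the cost of handling concurrency this way is what is being measured.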
If you want to know whether a given web server is efficient, this is most certainly not what you want to do. A realistic load implies some number of requests per second, with potential bursts, not a constant number of concurrent connections. But you should consider reading the rest of this document anyway, since some tricks are common to both situations.
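The difference between the two kinds of load can be sketched as follows. With a fixed concurrency level, a new request is sent only when a previous one completes; with a rate-based load, requests are sent at times decided in advance, whether or not the server keeps up. The sketch below only generates such send times. Drawing exponentially distributed gaps between requests is an assumption chosen for illustration (it gives irregular spacing with occasional clusters), and `arrival_times` is a hypothetical helper, not part of any tool mentioned here.

```python
import random

def arrival_times(rate_per_second, duration_seconds, seed=None):
    """Return increasing send times, in seconds from the start of the run.

    Gaps between consecutive requests are drawn from an exponential
    distribution whose mean is 1 / rate_per_second.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_second)
    while t < duration_seconds:
        times.append(t)
        t += rng.expovariate(rate_per_second)
    return times

# About 100 requests per second on average, for 10 seconds.
schedule = arrival_times(rate_per_second=100, duration_seconds=10, seed=1)
```

A client following this schedule would send request number `i` at `schedule[i]`, regardless of how many earlier requests are still pending.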
You also need to decide whether you want to benchmark static or dynamic content.
If you want to produce a simple benchmark:
If you want a realistic load:
Some tips I wish I had known when I started.
Here is a checklist of things easily forgotten:
You also need to tune the client: the same advice applies. Do not forget to use a client faster than your server (or to use several clients simultaneously) and to link them through a dedicated switch to ensure the bottleneck does not lie in the network.
It very much depends on the tools you use. I used the -g option of Apache Bench to get a gnuplot file, which I analyzed with R. I made some scripts and graphics available; they might help you get started. If you are totally lost with R, you might find Vincent Zoonekynd’s site useful (I learned almost everything I know about R there). Since processing a lot of data with R can take a long time (especially when you do things as naively as I did), I split the work in two steps: a first step where I process all the information I need and dump it in .Rdata files, and a second step where I use those files to plot the results (that way, you can tweak the graphs without recomputing everything).
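For readers who would rather not use R, here is a minimal sketch of loading such a file in Python. It assumes the file is tab-separated with a header row naming the columns starttime, seconds, ctime, dtime, ttime and wait, which is what recent versions of ab write; check your own file first. The sample rows are made up for illustration and `load_ab_gnuplot` is a hypothetical helper.

```python
import csv
import io

# Illustrative sample only: these rows are invented, not measured results.
SAMPLE = (
    "starttime\tseconds\tctime\tdtime\tttime\twait\n"
    "Tue Jan 01 12:00:01 2013\t1357041601\t0\t9\t9\t8\n"
    "Tue Jan 01 12:00:00 2013\t1357041600\t1\t12\t13\t11\n"
    "Tue Jan 01 12:00:02 2013\t1357041602\t0\t7\t7\t6\n"
)

NUMERIC_COLUMNS = ("seconds", "ctime", "dtime", "ttime", "wait")

def load_ab_gnuplot(stream):
    """Return one dict of integers per request, sorted by start time."""
    reader = csv.DictReader(stream, delimiter="\t")
    rows = [{name: int(row[name]) for name in NUMERIC_COLUMNS} for row in reader]
    # Sort explicitly: do not rely on the file being in chronological order.
    rows.sort(key=lambda r: r["seconds"])
    return rows

rows = load_ab_gnuplot(io.StringIO(SAMPLE))
```

With a real file, pass `open("results.tsv", newline="")` instead of the `io.StringIO` object (the file name is, again, only an example).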
The first and last n requests (with n = your concurrency level) are often not significant and might be safely discarded. See the technical report and the graphs for more details.
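Discarding those requests amounts to a slice, provided the samples are ordered by start time. A minimal sketch, where `trim_warmup_and_cooldown` is a hypothetical helper name:

```python
def trim_warmup_and_cooldown(samples, concurrency):
    """Drop the first and last `concurrency` samples.

    `samples` must be ordered by request start time. An empty list is
    returned when there are not enough samples to keep anything.
    """
    if concurrency < 0:
        raise ValueError("concurrency must be non-negative")
    if len(samples) <= 2 * concurrency:
        return []
    # Explicit end index: samples[n:-n] would be empty for n == 0.
    return samples[concurrency:len(samples) - concurrency]
```

For example, with a concurrency level of 2, `trim_warmup_and_cooldown(list(range(10)), 2)` keeps the values 2 to 7.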
Dariusz Panasiuk suggests:
In my tests I have used "wget" with "-p" option to fetch all images. I
started multiple wgets from if loop and used my firefox browsing history.
Curl is better to report time, but I can't find how to tell it to fetch all
page, and not only index.html
curl -s -w "%{time_total}\n" -o /dev/null -m 5 --url "http://www.bbc.co.uk" \
-x proxy.local:8080
For more advanced tests I have used Polymix
http://www.web-polygraph.org/docs/workloads/polymix-3/