Posted 18 Nov 2004 by herman
By popular demand, a sixth homework, due 8 December in class.
This homework is fairly simple: the goal is to measure some network delays and report on the results.
Report Your Results on Paper
Turn in the homework in class on 8 December (or earlier, if you like). Your homework should show graphs of what you measured, an explanation of how you obtained your measurements, and your conclusions. The graphs can easily be drawn by hand, or by a program if you prefer.
How to Measure
There are any number of ways to measure network delay, including the ping command. For convenience, I've written a simple Python program that looks for the "robot exclusion" file on a website. The code is simple: robot.py. It uses the Python robotparser module, available as part of Python version 2.3 and many other versions (it works on department Linux machines). The program measures the number of seconds to send a request to a web site and get a response. You can use this program as a basis for your homework.
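For readers without robot.py at hand, the timing idea can be sketched as below. This is not the course's robot.py: it assumes a current Python 3 interpreter, where the module is named `urllib.robotparser` rather than `robotparser`, and the names `time_call` and `fetch_robots` are made up for illustration.

```python
import time
import urllib.robotparser


def time_call(fn):
    """Return (elapsed_seconds, result) for a single call to fn()."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


def fetch_robots(url):
    """Download and parse the robot exclusion file at url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(url)
    parser.read()  # sends the HTTP request and waits for the response
    return parser
```

With a placeholder address, `time_call(lambda: fetch_robots("http://www.example.com/robots.txt"))` would return the delay in seconds together with the parser object. Note that the delay measured this way includes parsing time as well as network time.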
What to Measure
Choose a website that other students probably won't use, preferably somewhere outside of Iowa, so you can observe significant delay. Please don't overload the website you test, as this could generate complaints. If you'd like to test more than one website, that's fine, but it isn't really necessary.
At any particular time of day, the delay will vary. The very first measurement can be misleading because it may include the DNS lookup; after one DNS lookup, the IP address will be in cache.
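One way to keep the DNS lookup out of the data is to throw away the first measurement. The sketch below assumes a hypothetical zero-argument function `measure` that returns one delay in seconds; the pause between requests is there so the tested site is not overloaded.

```python
import time


def collect_samples(measure, count, warmup=1, pause=1.0):
    """Call measure() warmup + count times and return the last count results.

    measure is any zero-argument function returning a delay in seconds.
    """
    samples = []
    total = warmup + count
    for i in range(total):
        delay = measure()
        if i >= warmup:        # skip warm-up calls (they may include DNS lookup)
            samples.append(delay)
        if i < total - 1:
            time.sleep(pause)  # be polite: do not hammer the site
    return samples
```

The default pause of one second is only a guess at a polite rate; a longer pause is safer for a small site.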
When you measure, make sure you collect enough samples for your result to be statistically sound (say, 99% confidence in your results).
Measuring at one time on one day, however, won't show how the delay actually varies. Yes, you should take multiple samples to get a solid "data point" for that measurement, but you should also repeat the measurement at different times of day and on different days.
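As a rough illustration of what 99% confidence can mean here, the interval for the mean can be approximated as mean ± z·s/√n, where s is the sample standard deviation, n the number of samples, and z about 2.576 for 99%. The sketch below uses this normal approximation, which is only reasonable for a fairly large number of samples; for a handful of samples, Student's t distribution is the usual choice. It assumes Python 3.8 or later for `statistics.NormalDist`.

```python
import math
import statistics


def mean_confidence_interval(samples, confidence=0.99):
    """Return (mean, half_width) using a normal approximation."""
    n = len(samples)
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation (divides by n - 1)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 2.576 for 99%
    half_width = z * stdev / math.sqrt(n)
    return mean, half_width
```

The result would be reported as mean ± half_width seconds. Because the half-width shrinks like 1/√n, quadrupling the number of samples roughly halves it.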
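To compare different times of day, it helps to record a timestamp with every sample. The sketch below assumes the samples are kept as (datetime, delay) pairs, which is a storage choice made here for illustration, and averages them by hour of day.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def group_by_hour(records):
    """Map hour of day (0-23) to the mean delay of the samples taken in it.

    records is an iterable of (datetime, delay_seconds) pairs.
    """
    buckets = defaultdict(list)
    for when, delay in records:
        buckets[when.hour].append(delay)
    return {hour: statistics.mean(delays)
            for hour, delays in sorted(buckets.items())}
```

Each sample could be recorded as `(datetime.now(), delay)` at the moment it is taken. Grouping by `when.weekday()` instead of `when.hour` would give a per-day view.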
Processing the Data
Here is an example of a program that takes samples and computes their mean and standard deviation (but doesn't do any of the statistical confidence interval calculations): elapse.py. This program just measures how long the computer takes for one million iterations of a loop; multiple measurements are recorded for statistics. Rather than calculating mean and standard deviation directly, the program uses a statistics package: it requires pstat.py, io.py, and stats.py. Once all these files are in the same directory, elapse.py should run correctly.
It's also nice to see the distribution of the samples. This can be done by recording the samples and using some graphing program (or even Excel). I've written a version that graphs the result using gnuplot (this will work in the Linux lab, but maybe not on other Unix machines): elapseGraph.py.
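The same idea can be sketched without the extra files. This is not elapse.py itself: it assumes Python 3 and uses the standard `statistics` module in place of pstat.py, io.py, and stats.py; the function names are invented for illustration.

```python
import statistics
import time


def time_loop(iterations=1_000_000):
    """Return the seconds taken by an empty loop of the given length."""
    start = time.perf_counter()
    for _ in range(iterations):
        pass
    return time.perf_counter() - start


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return statistics.mean(samples), statistics.stdev(samples)


# Take ten measurements of the same loop and summarize them.
samples = [time_loop() for _ in range(10)]
mean, stdev = summarize(samples)
print(f"mean {mean:.4f} s, standard deviation {stdev:.4f} s over {len(samples)} runs")
```

Replacing `time_loop` with a function that times a web request would turn this into a delay measurement.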
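Where gnuplot is not available, a crude text histogram gives a first look at the distribution. The sketch below is a stand-in for the gnuplot approach, not a copy of elapseGraph.py; it splits the sample range into equal-width bins and prints one star per sample.

```python
def histogram(samples, bins=10):
    """Return a list of (bin_start, count) pairs covering the sample range."""
    low, high = min(samples), max(samples)
    width = (high - low) / bins or 1.0  # avoid zero width if all samples are equal
    counts = [0] * bins
    for x in samples:
        # The largest sample would land one past the end, so clamp it.
        index = min(int((x - low) / width), bins - 1)
        counts[index] += 1
    return [(low + i * width, counts[i]) for i in range(bins)]


def print_histogram(samples, bins=10):
    """Print one line per bin: the bin's lower edge and a bar of stars."""
    for start, count in histogram(samples, bins):
        print(f"{start:8.3f} | {'*' * count}")
```

The number of bins is a judgment call: too few hides the shape, too many leaves most bins empty.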
What to Conclude
You should be able to report the average delay (with the appropriate statistical precision) and how you calculated it; print any programs you used.
You should also be able to conjecture what kind of distribution the delay values have. You might be able to speculate on how the delay varies over the time scale of hours or days, based on your data. (For such conclusions, you should have some reasonable evidence.)