- Sixth Homework due 9 May
- Fifth Homework due 30 April
- Fourth Homework due 4 April
- Third Homework due Thursday 14 March
- Second Homework due Tuesday 19 February and Tuesday 5 March
- First Homework due Thursday 7 February 2013 in Class, on Paper
- The point of the first homework is to realize that abstract ideas about measuring networks (including queuing theory and distributions of service and arrival times) can be measured on the internet.
- The point of the second homework is to understand that application protocols depend on layers of information, usually encoded by text, at the beginning of messages. This is seen for an HTTP proxy in the header lines (like "Content-Type: text/html" and so forth). But many students opted to look at transformative proxies, which really only look inside at the HTML, and use prebuilt proxy frameworks, so the major point of the second homework was missed in many cases.
- The point of the third homework is to see, from evidence in logs of actual IP packets, that TCP segments are a mix of different information found in headers. Programs like wireshark make it relatively easy to go through such packets and investigate what the headers contain.
- The point of the fourth homework is to learn in more detail how sockets work (the earlier proxy homework may not have used sockets directly). Also, some level of concurrency and dynamic event handling would be needed to do all the work for this assignment. But a few students managed to find existing chat programs and skipped over the intended level of understanding.
- The point of the fifth homework is to get yet more detail on how sockets and the socket APIs work. Also, the homework emphasizes header processing and performance issues in switching traffic.
- The sixth and final homework is a simple use of encryption (no programming needed, just using commands). It demonstrates several ways to use encryption.
Sixth Homework due 9 May
This is a simple homework using OpenSSL to transform data. The homework is this: download these three files:
The file hidden.txt represents an encrypted message, using a randomly generated key, with the "aes-256-cfb8" method (just a command option in OpenSSL). This random aes key has been encrypted using an RSA public key (of 8192 bits), then encoded using Base64 encoding -- that is attachment rsaaeskey.txt. Once you decrypt rsaaeskey.txt, you can then decrypt hidden.txt (after decoding it from Base64 first). The decrypted contents of hidden.txt ask you a networking question and give further instructions. Briefly, the further instructions ask that you make a text file with an answer, encrypt that file using "rc4" encryption, Base64-encode the result, and put the encrypted answer and your encrypted rc4-key into the dropbox.
All of the assignment is intended to be done using openssl commands. It will help to make a checklist of all the steps, consult the script explained in OpenSSL, and see what needs to be modified from that script.
Fifth Homework due 30 April
The homework is to construct a virtual TCP switch. The switch can be thought of as a box with k input ports and m output ports. The numbers (k,m) aren't hard-coded into the virtual switch, but are command-line parameters. Once the virtual switch is running, input arrives on any of the input ports (often, input bytes are arriving concurrently on all of the input ports). Each port is a TCP connection, and each arriving message has the following format:
The output port part of the message is four bytes (a character string), which specifies which output port of the virtual switch this message should be sent to. The newline of the message is the single character \n. The payload part of the message contains at least one byte, and possibly a much larger number of bytes, but does not contain the \n byte. An example of a message could be:
continuing the results from the previous quarter, the commodity prices are trending
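As a minimal sketch of the message format described above, the following Python function splits one complete message into its output port and payload. The function name parse_message is hypothetical, and the sketch assumes that the four-byte port field holds decimal digits (as in '6002'); neither is given by the assignment.

```python
def parse_message(raw):
    """Split one complete message (bytes) into (output_port, payload).

    Assumed layout, following the description above: four bytes of
    port number, then at least one payload byte, then a single newline.
    """
    if not raw.endswith(b"\n"):
        raise ValueError("message must end with a newline byte")
    body = raw[:-1]                      # drop the terminating newline
    port_field, payload = body[:4], body[4:]
    if len(port_field) != 4 or not port_field.isdigit():
        # Assumption: the port is written as four decimal digits.
        raise ValueError("first four bytes must be a decimal port number")
    if not payload:
        raise ValueError("payload must contain at least one byte")
    if b"\n" in payload:
        raise ValueError("payload must not contain a newline byte")
    return int(port_field), payload
```

For example, parse_message(b"6002hello\n") gives the port 6002 and the payload b"hello".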
Testing the Virtual Switch
One problem in such a homework is, how can you test the code? Below, two python programs are given to test the code and show the behavior of a sample solution.
harness is a "testing harness" for a virtual switch. It is a Python 2.7 program (you will need to have some version of Python 2 running for this to work). The syntax for the harness program is shown by example:
> harness in=6001 in=6002 in=6003 out=7001 out=7002
This example starts the harness program with three input ports and two output ports. The harness program will randomly send data on two connections through ports 7001 and 7002. Each message sent on these two ports will randomly have either 6001, 6002, or 6003 in the first four bytes. The harness program will also expect to have three client connections, one on each of 6001-6003.
Now let's say that the virtual switch program is named portswitch and that it uses similar command syntax. Then, with harness running in one window (console command prompt), we start portswitch in another window:
> portswitch in=7001 in=7002 out=6001 out=6002 out=6003
The portswitch program will first create sockets listening on 7001 and 7002, then create TCP connections to ports 6001-6003. After doing this, what happens depends on what is received. If portswitch gets a message that begins with the characters '6002', then that message will be sent out on the connection to port 6002.
You can see from the example above that the "input" communication to harness is the "output" from portswitch, and vice versa. This is initially somewhat confusing, but becomes normal after using it a few times. The harness program is an example of a load generator, which simulates traffic into a switch and also validates the output from the switch (and measures its performance).
Extended Port Parameters
It's not required for this homework, but the harness also allows the two parts, harness and portswitch, to run on different computers. Here's an example showing the option syntax for having the harness on one machine, with IP address 22.214.171.124, and portswitch on another machine at 126.96.36.199:
> harness in=188.8.131.52:6001 in=184.108.40.206:6002 out=220.127.116.11:7001 out=18.104.22.168:7002
> portswitch out=22.214.171.124:6001 out=126.96.36.199:6002 in=188.8.131.52:7001 in=184.108.40.206:7002
Needless to say, this is a lot to type when testing a program. Most likely, one would want to put these longer commands into script or batch files to run them.
Harness can also have another, optional parameter: size=n specifies the maximum size of a generated message. Of course, a properly written portswitch should be able to handle any message size, and doesn't need to be told what is the maximum message size. But for testing, it may be nice to start with short messages: size=16 would limit messages to be 16 bytes or less.
Writing software in networking, like many situations in modern systems, relies on knowing the specifications rather than looking at source code. For example, when you write a program (like an HTTP proxy) that interacts with a web site, you don't get to look at, much less modify, the code of the web server. Similarly, you won't get to look at the source code of the harness program. This is a bit of an inconvenience not only for you, but also for the distribution of harness for this homework: it is difficult to make executable code that is separate from the Python source code. We will need to make a number of harness executable programs available for you to use, one for each kind of system and version of Python that is installed.
To download the harness program, visit Harness and Portswitch
Considerations on Socket Programming
- Remember that for a socket S, the primitive S.recv() has some perhaps unexpected behaviors:
- The call S.recv(256), or similar calls in C, C++, C#, or Java, will cause the program (or thread) to wait for bytes to arrive, so that something can be read. While a program (or thread) is waiting, it can't be doing other things.
- The number 256 (which can be changed to any other positive number you like) is only the maximum number of bytes to be received. If in fact there are fewer bytes available, then only those bytes in TCP's current segment area are actually received by this call.
- And if there are more than 256 bytes in TCP's current segment area, the call will just receive 256 of them -- leaving the remainder of the bytes in TCP segment area for the next S.recv() call.
- If the call S.recv() returns no bytes, it means the connection got closed (from the other side, most probably).
- The call S.send(message) may cause the program (or thread) to wait, if there is no room available in TCP's outgoing queue area for socket S. Eventually, when the queue empties by sending a segment and getting an ack, the queue will open up so that the S.send() can finish. But, even when a call S.send(message) returns, it may not have taken all the bytes in message. In the worst case, send(message) may only take one byte (but it does always tell you how many it took). The result of S.send(message) is that some number of bytes are taken from message and queued in TCP's outgoing segment area for socket S.
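The behaviors listed above suggest two small helper loops: one that keeps calling send() until every byte has been taken, and one that accumulates recv() results until a whole newline-terminated message is available. The sketch below is written for Python 3, and the names send_all and recv_messages are hypothetical. It uses blocking calls only, so it does not address the concurrency the homework also requires.

```python
import socket


def send_all(sock, data):
    """Call send() repeatedly until all of data has been queued."""
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])   # may take fewer bytes than offered
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent
    return total


def recv_messages(sock, bufsize=256):
    """Yield complete newline-terminated messages read from sock.

    A single recv() may return part of a message, exactly one message,
    or several messages, so leftover bytes are kept between calls.
    """
    pending = b""
    while True:
        chunk = sock.recv(bufsize)       # blocks until bytes arrive
        if not chunk:                    # no bytes: the connection closed
            return
        pending += chunk
        while b"\n" in pending:
            line, pending = pending.split(b"\n", 1)
            yield line + b"\n"
```

Python sockets also provide a built-in sendall() method that performs the same loop as send_all; the explicit version is shown only to make the partial-send behavior visible.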
For Java programmers, although socket programming has always been supported in the language, there are newer libraries that improve the support for socket programming. For instance http://netty.io could be one option; another is http://mina.apache.org. These libraries could be simpler to use than New I/O, which is Java's API for nonblocking I/O.
Fourth Homework due 4 April
Due 4 April
The assignment is simple: write a chat program that allows at least two concurrent chat sessions. The application program is, for all the clients, telnet (or equivalently, the telnet.py program shown in Socket Concurrency). Your program has four ports: 9001, 9002, 9003, and 9004 (but, if another student happens to be using these ports on the same computer, you will need to change the port numbers to something else). Each chat session is first-come first-connect. For instance, if the first two clients connect on 9001 and 9002, then they will chat with each other. If two clients attempt to connect on 9001, they won't chat with each other. It's OK if clients A and B connect on 9001, then C and D connect on 9004 -- then there would be two chat sessions, say A--C and B--D. The only requirement is that clients connected to the same port are not in the same chat session. Your program only has to support pairwise chat; don't implement three-party chat or more. The behavior of chat should be the obvious: what one client types in the chat window will show up on the other client's window. Also, it is helpful to have some convention on ending a chat session, by typing "stop".
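One possible reading of the first-come first-connect rule can be sketched without any sockets at all: each arriving client is paired with the longest-waiting client on a different port, or waits if there is none. The function name pair_clients and the representation of arrivals as (name, port) tuples are assumptions made for illustration; the assignment does not prescribe this policy.

```python
def pair_clients(arrivals):
    """Pair clients into two-party chat sessions.

    arrivals is a list of (client_name, port) tuples in connection
    order. Returns a list of (first_client, second_client) sessions.
    Clients on the same port are never placed in the same session.
    """
    waiting = []    # clients not yet in a session, oldest first
    sessions = []
    for name, port in arrivals:
        # Look for the longest-waiting client on a different port.
        partner = next((w for w in waiting if w[1] != port), None)
        if partner is None:
            waiting.append((name, port))
        else:
            waiting.remove(partner)
            sessions.append((partner[0], name))
    return sessions
```

With arrivals A and B on 9001 followed by C and D on 9004, this sketch produces the sessions A--C and B--D, matching the example in the text.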
How will your submission be evaluated? There are several criteria:
- Does it satisfy the basic goal of enabling concurrent chat? Can four windows of telnet be open and sending messages in arbitrary order, showing up on the other end instantly? The test will be having telnet instances connect first to 9001, second to 9002, third to 9003, and fourth to 9004. In this test, 9001 and 9002 are chatting, plus 9003 and 9004 are in another session.
- (More Advanced) Can the order be mixed up? What about connecting first to 9004, then to 9002; third to 9001, fourth to 9003. Will this result in two concurrent chat sessions, one for (9004,9002) and the other for (9001,9003)?
- Does the program need to have multiple telnet clients on a single port (like two or more using 9001)? No. The program doesn't need to have concurrent users on a single port. However, if two clients are on (9001,9002) in a chat session, then both quit, then later two others can connect on 9001 and 9002 and make a new chat session together.
- Documentation: if you manage to find some existing telnet-based chat somewhere on the internet and adapt that for this homework, will we be able to see what you have done for your own work? (This was a significant problem with Homework 2, where a few students failed to identify which parts they had done and which parts were just copied.) Also, please explain how telnet clients can gracefully quit the chat session, perhaps with a special command (like "stop" or some similar convention). What happens when one end quits and the other does not? Probably the best behavior is to close both sockets when one quits.
Please submit your work to the Homework 4 Dropbox before Friday 5 April
You can submit a program or a compressed folder. If you have some special startup conditions (like command-line parameters) then please include some readme or documentation that explains.
Third Homework due Thursday 14 March
This is a "warm up" homework before Spring Break, to start becoming familiar with WireShark. The homework asks you to very briefly analyze a log of a telnet session using wireshark, write down answers on paper, and turn in these answers for class on the 14th. Here's what you need in order to do the assignment:
You will need to use wireshark. Either you download it and install it on your own computer, or use the wireshark that the CS Department has installed on the Linux machines. (Unfortunately, the Computer Support Group did not install wireshark on Windows, though it is a free package.) For instance, you can use the NoMachine Client (see DIVMS instructions and you can also get there from a department Windows desktop). Once you're logged in on Linux, open a command shell and type in the command "wireshark" to start the analyzer.
Download telnet-raw.pcap from http://wiki.wireshark.org/SampleCaptures#Telnet -- this is the telnet session you will analyze. You can open this file from wireshark.
After you are able to successfully launch wireshark and you have the file to analyze, answer the following questions (on paper):
- Find line 61 and open up various fields within this event to answer the question: What is the RTT to ACK the segment?
- For lines 123 through 128, what are the window sizes?
- Line 141 has a different color than other lines, because it is a so-called "TCP Keep-Alive" segment (this is wireshark terminology, not general TCP terminology). What does "TCP Keep-Alive" mean? Consult the documentation in WireShark Documentation to answer this (perhaps a search engine can help also).
Second Homework due Tuesday 19 February and Tuesday 5 March
Questions -- the first is a student question:
I found a site with a web proxy that does almost what I want. Can I just use this and change it? Answer: Yes, please do use the best existing web proxy code you can find. You need to indicate with comments, rather clearly, what things you copied and what part is yours. If there's doubt about this, please ask. Best is that you somehow factor the code to call new functions or methods that you have added, which makes it easier to see the differences.
What if the proxy works for me (on my home computer and LAN) but doesn't work when you try to run it? Answer: Good question, and difficult to answer. We need to reproduce your results, and if the setup you have used is too specialized, we can't do this. Best case is that you have enough documentation and any extra packages or libraries used are already on one of the department's machines. Worst case is that you will need to schedule an appointment with the TA or Professor to demonstrate, on a departmental machine, that your proxy does what you claim it does. Please keep this in mind as you turn in your homework.
My proxy works, but only on some sites. I think it may be because of UTF character sets instead of ASCII. Answer: I attempted to look at this in class, but it didn't work out. One thing I found in researching this problem is that some of the simple Python proxies use Python's urllib2 module. There are more advanced replacements, such as the requests module (which would need to be installed in order to use it). Best advice for this assignment is just to avoid troublesome web sites and document the cases where your proxy works, so that we use these sites in evaluating your work.
Feedback on 19 February Plans
Here is a rough breakdown of what students submitted as plans for HTTP proxies:
- 10 students would like to do something with images, either transforming, substituting, or blocking.
- 7 students are interested in doing some statistics with proxy traffic
- 7 students would like to write proxies that block sites or content
- 4 students had plans to transform the text of a page
- 3 students proposed caching proxies
Languages that students want to use are Java, C#, C, C++, and Python
After reading this warning about the complexity of the body, you should be a bit careful in your ambition about searching and modifying the body of a response. First, only try this for responses that have the correct content-type in them: look at some actual responses first by running a proxy and printing the response headers, to get some feel for what pages have in them. Second, either do something quite simple (so that the searching and replacing isn't too ambitious) or look for an HTML parser. There are HTML parsers written in all major programming languages, but if you're not comfortable using something new to you (including downloading it and perhaps compiling it, learning how it works so you can use it) -- then probably you should simplify your goals. Some things aren't too difficult. Finding the img tags in HTML isn't too hard; finding the a (link) tags is also not too hard.
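To illustrate how little code is needed to find img and a tags once a parser is used, here is a minimal sketch based on the HTMLParser class from Python 3's standard library (in Python 2 the same class lives in the HTMLParser module). The names is_html, TagCollector, and find_images_and_links are hypothetical, and the sketch only collects the tags; it does not modify the response body.

```python
from html.parser import HTMLParser


def is_html(content_type):
    """True when a Content-Type header value names an HTML document."""
    return content_type.split(";")[0].strip().lower() == "text/html"


class TagCollector(HTMLParser):
    """Collect the src of every img tag and the href of every a tag."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])


def find_images_and_links(html_text):
    """Return (image_sources, link_targets) found in html_text."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.images, collector.links
```

A proxy would call is_html on the Content-Type header first and pass the body to find_images_and_links only when it returns True.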
Neglected in most of the plans is the input/specification or output description. In the case of a blocked site, what will the user see? Would the proxy keep a record (log) of the blocked content? In the case of statistics, how will the statistics be portrayed? How will someone get to see the statistics?
Your assignment will be to implement an HTTP proxy that does something "interesting" (where you can choose what is interesting). There are many, diverse applications for HTTP proxies:
- filtering or firewalling between browsers and the internet; possibly even ad-blocking
- helping to debug and test browser and server development
- hacking security, see Fiddler for example
- monitor web traffic, analyzing the statistics
- being a cache of web content, which can improve performance
- supporting a Content delivery network
- emulating servers, enhancing browser behavior, tracking and profiling users
- substitution/censoring of images or other selected content
- translating web page text, like Google Translate
- goofy things, like web page transformations, such as dialect translation
But for this assignment, you need to carefully choose a quite limited function for an HTTP proxy, something that can be finished in about 8-12 hours of work overall. That work will include researching what kind of implementation you would like to use (C, Java, Python, framework, and possibly installing assisting libraries or modules that you will use).
Due 19 February, on paper, in class
A brief description about what you plan to do for your proxy. Your description might say what you hope to do, how you will do it (choice of language and tools), how you will test and validate -- do you have just a few websites in mind or something more ambitious. Overall, this shouldn't be more than a page, and could be around 250 words or ten sentences at minimum.
Due 5 March, by ICON dropbox
The dropbox is now available to deposit your proxy and instructions on how to use it, explanation of what works and what doesn't, what sites to try, and if there are statistics or other features, samples and/or further information that will help us evaluate the work. Remember to distinguish what is your work from what was given in any framework or existing proxy that you copied and modified.
To evaluate what you submit for the software, the TA will need to test your proxy. You will need to explain what is needed to test it, including the commands to launch your proxy, what needs to be done on the browser side (if it only works for one browser, say what kind), what libraries, jar files, or whatever is needed to get your proxy program to compile and run. If you copied much of the proxy code from some other source, please indicate clearly which part is your contribution. If your proxy fails, you might also say what test cases work and which do not work, so that the TA will be able to verify things. In sum, please help the TA to look at what you have done so that we know what to evaluate.
The score for the homework will depend on these factors:
- Did you submit a plan on 19 February? Does the plan relate well to the submission on 5 March?
- Does the homework exercise some knowledge of HTTP or TCP, which supplements concepts from Chapters 1-2 of the text?
- Is the submission well enough documented to easily get it working?
- Is the finished result interesting (particularly in comparison/contrast to what other students have done)?
Please ask questions if these criteria are not sufficiently clear.
Tips and Tricks
The implementation advice for your HTTP proxy depends upon the choice of language, libraries or other tools you will be using. You might need to know about port re-use if your proxy crashes during testing. While it's easy to use extra libraries with Java by inserting a jar file into the classpath, things are trickier in Python, especially if you are using the department's machines. As questions are resolved, more text will be added to this part of the homework description (check back later).
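On port re-use: after a crash, the operating system may refuse to bind the same port again for a short while. A common remedy is to set the SO_REUSEADDR socket option before calling bind(). The sketch below shows this in Python; the function name make_listener and its default arguments are hypothetical, and the exact effect of the option differs somewhat between operating systems.

```python
import socket


def make_listener(port, host="127.0.0.1", backlog=5):
    """Create a listening TCP socket that can rebind a recently used port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); allows restarting the proxy right away.
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(backlog)
    return listener
```

Passing port 0 asks the system to choose any free port, which can be handy while testing.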
First Homework due Thursday 7 February 2013 in Class, on Paper
Your assignment is to record statistics and show frequency distributions on two network timings for multiple internet sites. Turn in the plots that show those statistics, printed copies of the programs that you wrote to do the assignment, and any explanation of the procedures used to produce your statistics. For this assignment, you should test at least three different sites, one in North America, one in Asia, and one in Europe. The timing tests should include at minimum three hundred trials for each site, spaced apart in time so that no pair of trials occurs within 59 seconds of each other. A high-grade assignment will draw conclusions about the distributions of times recorded, and any plausible inferences about the difference between round-trip time and web server request times.
The ping command on the department linux systems can be used to measure round-trip time between sending a low-level packet (datagram) to a site and getting a response. Here are examples:
[herman@serv15 ~]$ ping -c 1 www.google.com
PING www.google.com (220.127.116.11) 56(84) bytes of data.
64 bytes from oa-in-f104.1e100.net (18.104.22.168): icmp_req=1 ttl=50 time=19.8 ms
--- www.google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 19.809/19.809/19.809/0.000 ms
[herman@serv15 ~]$ ping -c 1 22.214.171.124
PING 126.96.36.199 (188.8.131.52) 56(84) bytes of data.
64 bytes from 184.108.40.206: icmp_req=1 ttl=241 time=50.6 ms
--- 220.127.116.11 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 50.657/50.657/50.657/0.000 ms
The parameter -c 1 tells the ping command to try just one test. Without this parameter, ping will try testing indefinitely.
Some internet sites have disabled ping. Here is an example:
[herman@serv15 ~]$ ping -c 1 www.nist.gov
PING www.glb.nist.gov (18.104.22.168) 56(84) bytes of data.
(This command got no response and eventually had to be killed.)
- Sometimes, ping can fail because of various packet drops, congestion, links crashing, or other difficulties.
You can read the full documentation on ping using the man ping command; a search on "man ping" might bring you approximately the same information, though different systems have different variations of the ping command.
You might wish to learn some background on what ping does: Ping (networking utility), though it's not necessary for this assignment.
fetching a web page
The other statistic to collect is the time it takes to send a request to a web server and get back the page requested. Your ping address (the site) and the web server should be the same, if you wish to make some inferences about the relation between these two statistics. How can you measure the time taken to send a request to a server? There are several linux utilities that may help.
wget is a command to fetch one page and return it in a file (or even to standard output). Here's an example:
[herman@serv15 ~]$ wget http://www.uiowa.edu
--2013-01-28 14:27:14-- http://www.uiowa.edu/
Resolving www.uiowa.edu... 22.214.171.124
Connecting to www.uiowa.edu|126.96.36.199|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 27566 (27K) [text/html]
Saving to: "index.html"
100%[==============================================================================>] 27,566 --.-K/s in 0s
2013-01-28 14:27:14 (85.7 MB/s) - "index.html." saved [27566/27566]
However, this command does not report timings. To get timings, one idea is to use the linux time command:
[herman@serv15 ~]$ time wget http://www.uiowa.edu
--2013-01-28 14:29:48-- http://www.uiowa.edu/
Resolving www.uiowa.edu... 188.8.131.52
Connecting to www.uiowa.edu|184.108.40.206|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 27566 (27K) [text/html]
Saving to: "index.html"
100%[==============================================================================>] 27,566 --.-K/s in 0.001s
2013-01-28 14:29:48 (22.1 MB/s) - "index.html" saved [27566/27566]
real 0m0.013s
user 0m0.001s
sys 0m0.002s
The last three lines here tell us that the wget command used 13 milliseconds (of which, 1 millisecond was user-space CPU time, and 2 milliseconds were system CPU overhead). We only care about the 13 millisecond time.
curl is another linux command to fetch a web page, showing it on the console. Since we don't need to see the text, this example just redirects output to the system Bit bucket:
[herman@serv15 ~]$ time curl www.uiowa.edu > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27566  100 27566    0     0  3901k      0 --:--:-- --:--:-- --:--:-- 6729k
real 0m0.010s
user 0m0.002s
sys 0m0.002s
We can see that this request took 10 milliseconds.
Write your own program to fetch a webpage. Here's an example in Python, but without any timing information:
import urllib2 # for Python2; it's different in Python3
U = urllib2.urlopen("http://www.uiowa.edu")
S = U.read()
U.close()
Most other high-level languages also have libraries that make it easy to get a web page. As for timing, most languages also have library functions (in Python, it's the time module) that can be used to calculate the time between sending a request and getting the response.
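As one way to add the timing, the sketch below wraps any action in a stopwatch built from the time module. It is written for Python 3, where the urllib2 functionality lives in urllib.request; the helper names timed and fetch are hypothetical. Note that the measured time covers the whole exchange, including name lookup and connection setup.

```python
import time
import urllib.request


def timed(action):
    """Run action() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def fetch(url):
    """Fetch one page and return its body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


# Example use (performs a real request, so it is left commented out):
# body, seconds = timed(lambda: fetch("http://www.uiowa.edu"))
# print(len(body), "bytes in", round(seconds * 1000, 1), "ms")
```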
recording your timings
There are at least two ways to record the timings from your tests.
Make a transcript of your console log. This is a bit tricky, though easy if you pay attention. In linux, there's a command script that starts recording all output to a file. To end the recording, use the exit command. Example:
[herman@serv15 ~]$ script myfile.txt
Script started, file is myfile.txt
bash-4.2$ ping -c 1 www.uiowa.edu
PING www.uiowa.edu (220.127.116.11) 56(84) bytes of data.
64 bytes from www.uiowa.edu (18.104.22.168): icmp_req=1 ttl=252 time=0.348 ms
--- www.uiowa.edu ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.348/0.348/0.348/0.000 ms
bash-4.2$ exit
exit
Script done, file is myfile.txt
[herman@serv15 ~]$
After doing this, the file "myfile.txt" will have all the ping output. Then we can write some code to go through this output (if it has hundreds of pings) and scrape out the numbers we need.
Call ping (or any other command we would like) from a program. An example is available in Homework1.zip but it's easy to do this under linux in many languages (this example is purposefully incomplete; you get to have some fun in completing the assignment). A quick web search, for example, turned up these two links:
- There are two system interfaces for calling other commands like ping, one that invokes the command and has the output go to the console (standard output) and another that redirects the command output to a unix pipe, where it can be read and processed (scraped for numbers) by the calling program. So, this turns out to be another way to record the statistics: each command can be scraped and timed, recording the results directly rather than processing a script log.
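In Python, the pipe-based interface described above is available through the subprocess module. The following sketch (Python 3) runs a command, captures its output, and scrapes the round-trip time from a ping reply line. The names run_and_capture and parse_ping_ms are hypothetical, and the pattern assumes the "time=19.8 ms" output format shown in the ping examples earlier; other systems may print it differently.

```python
import re
import subprocess

# Matches the "time=19.8 ms" field of a ping reply line.
TIME_PATTERN = re.compile(r"time=([0-9.]+) ms")


def parse_ping_ms(output):
    """Return the round-trip time in milliseconds, or None if absent."""
    match = TIME_PATTERN.search(output)
    return float(match.group(1)) if match else None


def run_and_capture(command):
    """Run command (a list of strings) and return its output as text."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,      # read the output through a pipe
        stderr=subprocess.STDOUT,    # fold error messages into the output
        universal_newlines=True,     # decode bytes to str
    )
    return completed.stdout


# Example use (runs a real ping, so it is left commented out):
# rtt = parse_ping_ms(run_and_capture(["ping", "-c", "1", "www.uiowa.edu"]))
```

A result of None covers the case where a site does not answer; when ping might never return, the timeout argument of subprocess.run could be added.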
pacing your experiments
The assignment asks that you do not have ping or web page fetching closer than 59 seconds. What that means is that your testing should pause a minute between tests. If you write something like a bash script to do the testing, that would imply putting in a sleep 60 command between tests; or if you invoke the commands from a Java or Python program, use some library to sleep between tests. In Python, it's like this:
import time
time.sleep(60)
Java has similar facilities.
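Putting the pause into a loop might look like the sketch below. The function name run_paced is hypothetical, and the sleep function is a parameter only so that the loop can be tried out without actually waiting.

```python
import time


def run_paced(measure, trials, gap_seconds=60, sleep=time.sleep):
    """Call measure() trials times, pausing gap_seconds between calls."""
    results = []
    for i in range(trials):
        results.append(measure())
        if i < trials - 1:           # no pause needed after the last trial
            sleep(gap_seconds)
    return results
```

At 60 seconds between tests, three hundred trials take about five hours per site, so it is worth checking the loop on a handful of trials before starting a full run.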
analysis and presentation
Use whatever you like to analyze the results, plot graphs of the timings, and so on. Two things to be aware of are:
- You may need to do some "binning" of the raw data, to lump it with sufficient granularity so that graphs show distributions.
- Similarly, you might need to adjust bins or arrange your data so that statistical distributions become evident. It's possible that a nice normal distribution, or an exponential distribution, etc., might be obvious, though this is the exception with the small number of samples you get for this assignment. A quick web search finds these two links which offer advice:
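As a minimal sketch of the binning idea, the function below counts how many samples fall into each fixed-width bin, and a second function turns the counts into a rough text histogram. The names bin_counts and text_histogram are hypothetical, and fixed-width bins are only one choice; trying a few different widths is part of the analysis.

```python
def bin_counts(samples, bin_width):
    """Count samples per bin; bin k covers [k*bin_width, (k+1)*bin_width)."""
    counts = {}
    for value in samples:
        index = int(value // bin_width)
        counts[index] = counts.get(index, 0) + 1
    return counts


def text_histogram(counts, bin_width):
    """Return one line of text per non-empty bin, lowest bin first."""
    lines = []
    for index in sorted(counts):
        low = index * bin_width
        lines.append("%8.1f to %8.1f  %s" % (low, low + bin_width, "#" * counts[index]))
    return lines
```

For instance, with timings in milliseconds and a bin width of 5, the samples 19.8 and 19.9 both land in the bin that covers 15 to 20.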
Time Series or Distribution?
With network statistics, there can be related questions about the meaning of the recorded timing data, what relation it has to ideal models (like exponential, Poisson, uniform or normal distributions) and how the statistics evolve over time. This will be part of class discussion on the homework.