1. The point of the first homework is to realize that abstract ideas about network performance (including queuing theory and distributions of service and arrival times) correspond to quantities that can be measured on the internet.
  2. The point of the second homework is to understand that application protocols depend on layers of information, usually encoded as text, at the beginning of messages. For an HTTP proxy, this information appears in the header lines (such as "Content-Type: text/html"). However, many students chose transformative proxies, which mainly examine the HTML body, and built them on prebuilt proxy frameworks, so in many cases the main point of the second homework was missed.
  3. The point of the third homework is to see, from logs of actual IP packets, that TCP segments carry a mix of different information in their headers. Programs like Wireshark make it relatively easy to step through such packets and inspect what the headers contain.
  4. The point of the fourth homework is to learn in more detail how sockets work (the earlier proxy homework may not have used sockets directly). Also, some level of concurrency and dynamic event handling would be needed to do all the work for this assignment. But a few students managed to find existing chat programs and skipped over the intended level of understanding.
  5. The point of the fifth homework is to get yet more detail on how sockets and the socket APIs work. Also, the homework emphasizes header processing and performance issues in switching traffic.
  6. The sixth and final homework is a simple use of encryption (no programming needed, just using commands). It demonstrates several ways to use encryption.


Sixth Homework due 9 May

This is a simple homework using OpenSSL to transform data. The homework is this: download these three files:

  1. privkey.pem

  2. hidden.txt

  3. rsaaeskey.txt

The file hidden.txt is a message encrypted with a randomly generated key using the "aes-256-cfb8" cipher (just a command option in OpenSSL). This random AES key was encrypted with an RSA public key (of 8192 bits) and then Base64-encoded; the result is the attachment rsaaeskey.txt. Once you decode and decrypt rsaaeskey.txt, you can decrypt hidden.txt (after first decoding it from Base64). The decrypted contents of hidden.txt ask you a networking question and give further instructions. Briefly, those instructions ask you to write your answer in a text file, encrypt that file using "rc4" encryption, Base64-encode the result, and put the encrypted answer and your encrypted rc4 key into the dropbox.

All of the assignment is intended to be done using openssl commands. It will help to make a checklist of all the steps, consult the script explained in OpenSSL, and see what needs to be modified from that script.
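One way to keep that checklist is as a small Python 3 script that lists the openssl command lines and runs them in order. This is a sketch under assumptions: the intermediate file names (aeskey.bin, aeskey.txt, hidden-plain.txt, rc4key.txt, answer.enc) are hypothetical, and it assumes the AES and RC4 keys are used as passphrases via -pass. Compare it against the script on the OpenSSL page before relying on it; if a raw key and IV were used instead, enc takes -K and -iv.

```python
import subprocess

# Hypothetical intermediate file names; only privkey.pem, rsaaeskey.txt and
# hidden.txt come from the assignment.


def decrypt_steps():
    """openssl command lines (argv lists) for recovering the hidden message."""
    return [
        # 1. Undo the Base64 encoding of the RSA-encrypted AES key
        #    (add "-A" if the Base64 text is all on one line).
        ["openssl", "base64", "-d", "-in", "rsaaeskey.txt", "-out", "aeskey.bin"],
        # 2. Decrypt the AES key with the 8192-bit RSA private key.
        ["openssl", "pkeyutl", "-decrypt", "-inkey", "privkey.pem",
         "-in", "aeskey.bin", "-out", "aeskey.txt"],
        # 3. Base64-decode (-a) and AES-decrypt the message; assumes the
        #    AES key was supplied as a passphrase file.
        ["openssl", "enc", "-d", "-aes-256-cfb8", "-a", "-in", "hidden.txt",
         "-out", "hidden-plain.txt", "-pass", "file:aeskey.txt"],
    ]


def encrypt_steps(answer="answer.txt"):
    """openssl command lines for encrypting the answer with RC4 + Base64."""
    return [
        # Random RC4 passphrase: 32 random bytes written as Base64 text.
        ["openssl", "rand", "-base64", "-out", "rc4key.txt", "32"],
        # RC4-encrypt and Base64-encode (-a) in one step. OpenSSL 3.x may
        # also need "-provider legacy -provider default" for rc4.
        ["openssl", "enc", "-rc4", "-a", "-in", answer, "-out", "answer.enc",
         "-pass", "file:rc4key.txt"],
        # Encrypting rc4key.txt itself follows the instructions in hidden.txt.
    ]


def run(steps):
    """Print and run each command, stopping at the first failure."""
    for argv in steps:
        print(" ".join(argv))
        subprocess.run(argv, check=True)
```

For example, run(decrypt_steps()) prints and executes the three decryption commands. If step 3 produces garbage, one possible cause is that OpenSSL 1.1.0 changed the default key-derivation digest of enc from MD5 to SHA-256, so a "-md md5" option may be needed when the file was encrypted with an older OpenSSL.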


Fifth Homework due 30 April

The homework is to construct a virtual TCP switch. The switch can be thought of as a box with k input ports and m output ports. The numbers (k,m) aren't hard-coded into the virtual switch, but are command-line parameters. Once the virtual switch is running, input arrives on any of the input ports (often, input bytes are arriving concurrently on all of the input ports). Each port is a TCP connection, and each arriving message has the following format:

output port | payload | newline

The output port part of the message is four bytes (a character string), which specifies which output port of the virtual switch this message should be sent to. The newline of the message is the single character \n. The payload part of the message contains at least one byte, and possibly a much larger number of bytes, but does not contain the \n byte. An example of a message could be:

6075 | continuing the results from the previous quarter, the commodity prices are trending | \n
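Because TCP delivers a byte stream, one recv() call can return half a message or several messages at once, so the switch has to buffer bytes per input connection and cut out complete messages at each \n. A minimal Python 3 sketch of that framing step (the function name extract_messages is hypothetical; malformed frames shorter than port plus one payload byte are simply dropped):

```python
def extract_messages(buf):
    """Remove every complete message from buf (a bytearray) and return them.

    Each message is assumed to be 4 ASCII bytes naming the output port,
    at least one payload byte, then a single b"\\n". Returns a list of
    (port, payload) byte-string pairs; a partial message at the end of
    buf stays there until more bytes arrive.
    """
    messages = []
    while True:
        end = buf.find(b"\n")
        if end < 0:
            break  # no complete message buffered yet
        frame = bytes(buf[:end])
        del buf[:end + 1]  # drop the frame together with its newline
        if len(frame) < 5:
            continue  # malformed: no room for port + 1 payload byte
        messages.append((frame[:4], frame[4:]))
    return messages
```

For example, after buf = bytearray(b"6075continuing the results\n60"), extract_messages(buf) returns [(b'6075', b'continuing the results')] and leaves b"60" in buf for the next recv().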

Testing the Virtual Switch

One problem with such a homework is how to test the code. Below, two Python programs are given to test the code and show the behavior of a sample solution.

harness is a "testing harness" for a virtual switch. It is a Python 2.7 program (you will need to have some version of Python 2 running for this to work). The syntax for the harness program is shown by example:

This example starts the harness program with three input ports and two output ports. The harness program will randomly send data on two connections through ports 7001 and 7002. Each message sent on these two ports will randomly have either 6001, 6002, or 6003 in the first four bytes. The harness program will also expect three client connections, one on each of 6001-6003.

Now let's say that the virtual switch program is named portswitch and that it uses similar command syntax. Then, with harness running in one window (console command prompt), we start portswitch in another window:

The portswitch program will first create sockets listening on 7001 and 7002, then create TCP connections to ports 6001-6003. After doing this, what happens depends on what is received. If portswitch gets a message that begins with the characters '6002', then that message will be sent out on the connection to port 6002.
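The event-driven structure described above can be sketched in Python 3 with the standard selectors module: one loop watches the listening sockets and every accepted input connection, buffers partial messages per connection, and forwards each complete message to the output socket named by its first four bytes. This is a sketch, not the sample solution. It assumes the whole message (port, payload, newline) is forwarded, that output port numbers have four digits, and the command-line format in main() is hypothetical because the real portswitch syntax is not shown here.

```python
import selectors
import socket
import sys


def make_listener(host, port):
    """Create a non-blocking TCP socket listening on (host, port)."""
    lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    lsock.bind((host, port))
    lsock.listen()
    lsock.setblocking(False)
    return lsock


class PortSwitch:
    """Single-threaded switch built on selectors.

    listeners: listening sockets for the input ports.
    outputs:   dict mapping a 4-byte port header (e.g. b"6002") to a connected socket.
    """

    def __init__(self, listeners, outputs):
        self.sel = selectors.DefaultSelector()
        self.outputs = outputs
        for lsock in listeners:
            self.sel.register(lsock, selectors.EVENT_READ, data=None)

    def step(self, timeout=None):
        """Handle one round of ready sockets: accept new inputs, read data."""
        for key, _ in self.sel.select(timeout):
            if key.data is None:  # listening socket: a new input connection
                conn, _ = key.fileobj.accept()
                conn.setblocking(False)
                # Each input connection gets its own buffer for partial messages.
                self.sel.register(conn, selectors.EVENT_READ, data=bytearray())
            else:
                self._read(key.fileobj, key.data)

    def _read(self, conn, buf):
        try:
            chunk = conn.recv(65536)
        except ConnectionError:
            chunk = b""
        if not chunk:  # the input peer closed or reset the connection
            self.sel.unregister(conn)
            conn.close()
            return
        buf.extend(chunk)
        while True:
            end = buf.find(b"\n")
            if end < 0:
                break  # keep the partial message until more bytes arrive
            message = bytes(buf[:end + 1])  # port + payload + newline
            del buf[:end + 1]
            out = self.outputs.get(message[:4])
            if out is not None and end >= 5:  # known port, payload >= 1 byte
                out.sendall(message)  # blocking send: simple, but can stall the loop


def main(argv):
    # Hypothetical command line: portswitch.py 7001,7002 6001,6002,6003
    in_ports = [int(p) for p in argv[1].split(",")]
    out_ports = [int(p) for p in argv[2].split(",")]
    listeners = [make_listener("", p) for p in in_ports]  # listen first ...
    outputs = {str(p).encode("ascii"): socket.create_connection(("127.0.0.1", p))
               for p in out_ports}  # ... then connect to the harness
    switch = PortSwitch(listeners, outputs)
    while True:
        switch.step()


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```

A single selector loop avoids threads, but the blocking sendall() means one slow output port can delay all inputs; a fuller solution would also buffer outgoing data per output socket and register for EVENT_WRITE.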

You can see from the example above that the "input" communication to harness is the "output" from portswitch, and vice versa. This is initially somewhat confusing, but becomes normal after using it a few times. The harness program is an example of a load generator: it simulates traffic into a switch and also validates the switch's output (and measures its performance).

Extended Port Parameters

Although it is not required for this homework, the harness also allows the two parts, harness and portswitch, to run on different computers. Here's an example showing the option syntax for running the harness on one machine, with IP address 109.207.21.35, and portswitch on another machine at 109.207.21.92:

Needless to say, this is a lot to type when testing a program. Most likely, you will want to put these longer commands into scripts or batch files.

Message Size

Harness also accepts another, optional parameter: size=n specifies the maximum size of a generated message. Of course, a properly written portswitch should handle any message size and doesn't need to be told the maximum. But for testing, it may be helpful to start with short messages: size=16 limits messages to 16 bytes or less.
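For quick experiments of your own, a tiny random-message generator can mimic the idea of size=n. This is a hypothetical helper, not the harness, and it assumes size=n counts the whole message, including the four-byte port and the trailing newline:

```python
import random
import string


def make_message(out_ports, max_size, rng=random):
    """Build one random switch message no longer than max_size bytes.

    Assumes max_size counts 4-byte port + payload + b"\\n", so the smallest
    legal message (1-byte payload) needs max_size >= 6.
    """
    if max_size < 6:
        raise ValueError("max_size must allow at least a 1-byte payload")
    port = str(rng.choice(out_ports)).encode("ascii")
    n = rng.randint(1, max_size - 5)  # payload length, never 0
    payload = "".join(rng.choice(string.ascii_letters + " ") for _ in range(n))
    return port + payload.encode("ascii") + b"\n"
```

Passing a seeded random.Random(...) as rng makes a test run repeatable.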

harness program

Writing networking software, as in many other areas of modern systems, relies on knowing the specifications rather than reading source code. For example, when you write a program (like an HTTP proxy) that interacts with a web site, you don't get to look at, much less modify, the web server's code. Similarly, you won't get to see the source code of the harness program. This is a bit of an inconvenience, not only for you but also for distributing harness for this homework: producing an executable separate from the Python source is difficult. We will need to make a number of harness executables available, one for each kind of system and installed version of Python.

To download the harness program, visit Harness and Portswitch


Considerations on Socket Programming


Fourth Homework due 4 April

Due 4 April

The assignment is simple: write a chat program that allows at least two concurrent chat sessions. The client application for every user is telnet (or, equivalently, the telnet.py program shown in Socket Concurrency). Your program listens on four ports: 9001, 9002, 9003, and 9004 (if another student happens to be using these ports on the same computer, you will need to change them to other port numbers). Chat sessions are formed first-come, first-connected. For instance, if the first two clients connect on 9001 and 9002, they will chat with each other. If two clients connect on 9001, they won't chat with each other. It's OK if clients A and B connect on 9001, then C and D connect on 9004; there would then be two chat sessions, say A--C and B--D. The only requirement is that clients connected to the same port are never in the same chat session. Your program only has to support pairwise chat; don't implement three-party (or larger) chat. The behavior of chat should be the obvious one: what one client types in its chat window shows up in the other client's window. It is also helpful to have some convention for ending a chat session, such as typing "stop".
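The pairing rule is the least obvious part, so here it is in isolation as a Python 3 sketch: a newly connected client is paired with the longest-waiting client that arrived on a different port; otherwise it waits. The class name Matchmaker is hypothetical, and the socket handling (relaying typed lines, reacting to "stop") is deliberately left out.

```python
class Matchmaker:
    """First-come pairing of chat clients across different ports."""

    def __init__(self):
        self.waiting = []  # (client, port) pairs in arrival order

    def arrive(self, client, port):
        """Return the partner for client, or None if client must wait."""
        for i, (other, other_port) in enumerate(self.waiting):
            if other_port != port:  # never pair clients from the same port
                del self.waiting[i]
                return other
        self.waiting.append((client, port))
        return None

    def leave(self, client):
        """Forget a client that disconnects before being paired."""
        self.waiting = [(c, p) for (c, p) in self.waiting if c != client]
```

With the example from the assignment, A and B (both on 9001) wait, then C (on 9004) is paired with A and D (on 9004) with B.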

Evaluation

How will your submission be evaluated? There are several criteria:

Please submit your work to the Homework 4 Dropbox before Friday 5 April

You can submit a program or a compressed folder. If your program has special startup conditions (like command-line parameters), please include a readme or other documentation that explains them.


Third Homework due Thursday 14 March

This is a "warm up" homework before Spring Break, to start becoming familiar with Wireshark. The homework asks you to briefly analyze a log of a telnet session using Wireshark, write down your answers on paper, and turn them in at class on the 14th. Here's what you need in order to do the assignment:

  1. You will need to use Wireshark. Either download and install it on your own computer, or use the Wireshark that the CS Department has installed on the Linux machines. (Unfortunately, the Computer Support Group did not install Wireshark on Windows, though it is a free package.) To reach the Linux machines, you can use the NoMachine client (see the DIVMS instructions; you can also get there from a department Windows desktop). Once you're logged in on Linux, open a command shell and type the command "wireshark" to start the analyzer.

  2. Download telnet-raw.pcap from http://wiki.wireshark.org/SampleCaptures#Telnet -- this is the telnet session you will analyze. You can open this file from wireshark.
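Wireshark is the intended tool, but seeing the header fields decoded by hand can make clear what Wireshark is showing. The Python 3 sketch below reads a classic libpcap file (not pcapng) with Ethernet framing and yields the main IPv4 and TCP header fields of each TCP packet. The function name tcp_segments is hypothetical, and it assumes the capture is in that classic Ethernet format.

```python
import socket
import struct

# Bit i of the TCP flags byte, lowest first: FIN, SYN, RST, PSH, ACK, URG, ECE, CWR.
TCP_FLAG_NAMES = "FSRPAUEC"


def tcp_segments(data):
    """Yield header fields of each IPv4/TCP packet in a classic pcap capture.

    Assumes the libpcap format with Ethernet link type 1; other packets
    (ARP, UDP, IPv6, ...) are skipped.
    """
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file")
    if struct.unpack(endian + "I", data[20:24])[0] != 1:
        raise ValueError("only Ethernet captures are handled")
    pos = 24  # skip the 24-byte global header
    while pos + 16 <= len(data):
        # Record header: ts_sec, ts_usec, captured length, original length.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[pos:pos + 16])
        frame = data[pos + 16:pos + 16 + incl_len]
        pos += 16 + incl_len
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue  # too short, or EtherType is not IPv4
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        if ip[9] != 6 or len(ip) < ihl + 20:
            continue  # protocol is not TCP, or the TCP header was truncated
        total_len = struct.unpack("!H", ip[2:4])[0]
        tcp = ip[ihl:total_len]  # drops any Ethernet padding after the packet
        sport, dport, seq, ack, offset, flags, window = struct.unpack(
            "!HHIIBBH", tcp[:16])
        yield {
            "src": "%s:%d" % (socket.inet_ntoa(ip[12:16]), sport),
            "dst": "%s:%d" % (socket.inet_ntoa(ip[16:20]), dport),
            "seq": seq,
            "ack": ack,
            "flags": "".join(n for i, n in enumerate(TCP_FLAG_NAMES) if flags >> i & 1),
            "window": window,
            "payload": tcp[(offset >> 4) * 4:],  # data after the TCP header
        }
```

For example, reading the file with open("telnet-raw.pcap", "rb") and looping over tcp_segments(f.read()) prints one line per segment, which you can compare with Wireshark's packet list; the telnet keystrokes appear in the payload field.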

After you are able to successfully launch wireshark and you have the file to analyze, answer the following questions (on paper):


Second Homework due Tuesday 19 February and Tuesday 5 March

Questions -- the first is a student question:

  1. I found a site with a web proxy that does almost what I want. Can I just use this and change it? Answer: Yes, please do use the best existing web proxy code you can find. You need to indicate clearly, with comments, what you copied and what part is yours. If there's doubt about this, please ask. Ideally, factor the code so that it calls new functions or methods that you have added, which makes the differences easier to see.

  2. What if the proxy works for me (on my home computer and LAN) but doesn't work when you try to run it? Answer: Good question, and difficult to answer. We need to reproduce your results, and if your setup is too specialized, we can't do this. Best case is that you have enough documentation and any extra packages or libraries used are already on one of the department's machines. Worst case is that you will need to schedule an appointment with the TA or Professor to demonstrate, on a departmental machine, that your proxy does what you claim it does. Please keep this in mind as you turn in your homework.

  3. My proxy works, but only on some sites. I think it may be because of UTF character sets instead of ASCII. Answer: I attempted to look at this in class, but it didn't work out. One thing I found in researching this problem is that some of the simple Python proxies use Python's urllib2 module. There are more advanced replacements, such as the requests module (which must be installed before you can use it). Best advice for this assignment is just to avoid troublesome web sites and document the cases where your proxy works, so that we can use these sites in evaluating your work.
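A common source of such failures is decoding a response body with the wrong character set. A minimal Python 3 sketch (urllib2 is Python 2 only) reads the charset parameter from the Content-Type header and falls back to ISO-8859-1, the HTTP/1.1 (RFC 2616) default for text types. The function name decode_body is hypothetical, and charsets declared only inside the HTML (in a meta tag) are not handled.

```python
from email.message import Message


def decode_body(body, content_type, default="iso-8859-1"):
    """Decode a response body (bytes) using the charset in Content-Type."""
    msg = Message()
    msg["Content-Type"] = content_type  # reuse the stdlib header parser
    charset = msg.get_content_charset() or default
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: decode as UTF-8, replacing bad bytes.
        return body.decode("utf-8", errors="replace")
```

If the proxy rewrites the text, it must encode it back with the same charset (and fix Content-Length) before sending it on.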


Feedback on 19 February Plans

Here is a rough breakdown of what students submitted as plans for HTTP proxies:

Languages that students want to use are Java, C#, C, C++, and Python

The plans can be divided into two groups: those that look at the content of a page, and those that don't. Blocking a web site, keeping statistics, substituting for an image: these can be done without looking at the body of a request or response. But substituting for some text, or changing a link, requires searching through and/or modifying HTML. Parsing the body of a response perfectly is challenging because the body can be a mix of HTML, JavaScript, include statements, and so on. Sometimes the body is a binary image; other times, it may be a CSS file. Also be aware that if the request uses keep-alive (a persistent connection), there can be multiple requests and responses in one TCP connection.
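With keep-alive, the proxy has to find where one response ends before the next begins, usually from the Content-Length header. A simplified Python 3 sketch (the name split_responses is hypothetical): it handles only responses that carry Content-Length, treats a missing Content-Length as an empty body, and ignores chunked transfer encoding and bodiless responses such as 304.

```python
def split_responses(buf):
    """Split complete HTTP/1.1 responses off the front of buf (bytes).

    Returns (list_of_complete_responses, leftover_bytes).
    """
    responses = []
    while True:
        head_end = buf.find(b"\r\n\r\n")
        if head_end < 0:
            break  # header block not complete yet
        head = buf[:head_end].decode("iso-8859-1")
        length = 0  # simplification: no Content-Length means empty body
        for line in head.split("\r\n")[1:]:  # skip the status line
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())
        total = head_end + 4 + length
        if len(buf) < total:
            break  # body not fully arrived yet
        responses.append(buf[:total])
        buf = buf[total:]
    return responses, buf
```

A real proxy also needs to handle "Transfer-Encoding: chunked", which many servers use for dynamically generated pages.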

After reading this warning about the complexity of the body, you should be careful about how ambitious you are in searching and modifying the body of a response. First, only try this for responses that have the correct content-type: look at some actual responses first, by running a proxy and printing the response headers, to get a feel for what pages contain. Second, either do something quite simple (so that the searching and replacing isn't too ambitious) or use an HTML parser. There are HTML parsers written in all major programming languages, but if you're not comfortable using something new to you (including downloading it, perhaps compiling it, and learning how it works so you can use it), then you should probably simplify your goals. Some things aren't too difficult: finding the img tags in HTML isn't too hard, and finding the a (link) tags is also not too hard.
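Both pieces of advice fit in a few lines of Python 3 using the standard html.parser module: check the Content-Type first, then collect the src of img tags and the href of a tags. The names TagCollector and is_html are hypothetical, and is_html assumes the proxy stores header names in lowercase.

```python
from html.parser import HTMLParser


def is_html(headers):
    """True if a response's headers (lowercase keys) declare text/html."""
    media_type = headers.get("content-type", "").split(";")[0]
    return media_type.strip().lower() == "text/html"


class TagCollector(HTMLParser):
    """Collect image sources and link targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of pairs.
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
```

Call feed() with the decoded body text, then close(), and read the images and links lists. This only finds tags; rewriting them in place requires tracking their positions in the original text as well.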

Neglected in most of the plans is the input/specification or output description. In the case of a blocked site, what will the user see? Would the proxy keep a record (log) of the blocked content? In the case of statistics, how will the statistics be portrayed? How will someone get to see the statistics?

Another consideration is whether you plan to deal with SSL (HTTPS, the encrypted version of HTTP). HTTPS normally uses a different port (443 rather than 80), and its traffic is encrypted end to end, so a proxy typically receives a CONNECT request and can only relay the encrypted bytes; this makes the proxy's task complicated for HTTPS websites. If HTTPS is your goal, it's probably better to start with an existing proxy that has HTTPS support (an existing proxy can help you get started in any case). At the bottom of the HTTP Proxy page there is a pointer to a list of some Python and Java proxies. You can also find HTTP proxies in C, C#, and C++ using a search engine. Also, please be aware that many large media websites trigger many requests (usually from JavaScript programs in the browser) to Facebook, Google, and many advertising companies. All of these will go through your proxy.


Your assignment will be to implement an HTTP proxy that does something "interesting" (where you can choose what is interesting). There are many diverse applications for HTTP proxies:

But for this assignment, you need to choose a quite limited function for an HTTP proxy, something that can be finished in about 8-12 hours of work overall. That work will include researching what kind of implementation you would like to use (C, Java, Python, or a framework) and possibly installing supporting libraries or modules.

Due 19 February, on paper, in class

A brief description of what you plan to do for your proxy. Your description might say what you hope to do, how you will do it (choice of language and tools), and how you will test and validate it: do you have just a few websites in mind, or something more ambitious? Overall, this shouldn't be more than a page, and should be at least around 250 words (roughly ten sentences).

Due 5 March, by ICON dropbox

The dropbox is now available to deposit your proxy and instructions on how to use it, explanation of what works and what doesn't, what sites to try, and if there are statistics or other features, samples and/or further information that will help us evaluate the work. Remember to distinguish what is your work from what was given in any framework or existing proxy that you copied and modified.

Evaluation

To evaluate what you submit for the software, the TA will need to test your proxy. You will need to explain what is needed to test it, including the commands to launch your proxy, what needs to be done on the browser side (if it only works for one browser, say what kind), what libraries, jar files, or whatever is needed to get your proxy program to compile and run. If you copied much of the proxy code from some other source, please indicate clearly which part is your contribution. If your proxy fails, you might also say what test cases work and which do not work, so that the TA will be able to verify things. In sum, please help the TA to look at what you have done so that we know what to evaluate.

The score for the homework will depend on these factors:

Please ask questions if these criteria are not sufficiently clear.

Tips and Tricks

The implementation advice for your HTTP proxy depends upon the choice of language, libraries or other tools you will be using. You might need to know about port re-use if your proxy crashes during testing. While it's easy to use extra libraries with Java by inserting a jar file into the classpath, things are trickier in Python, especially if you are using the department's machines. As questions are resolved, more text will be added to this part of the homework description (check back later).
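On port reuse: after a crash or restart, bind() can fail with "Address already in use" while old connections on that port sit in the TIME_WAIT state. On Linux and other Unix-like systems, setting SO_REUSEADDR before bind() avoids this. A minimal Python 3 sketch (the helper name listen_reusable is hypothetical); in Java the corresponding call is ServerSocket.setReuseAddress(true), made before binding.

```python
import socket


def listen_reusable(port, host=""):
    """Open a listening TCP socket that can be re-bound right after a restart."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); lets bind() succeed while connections
    # from a previous run are still in TIME_WAIT.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(5)
    return s
```

Windows gives SO_REUSEADDR different (looser) semantics, so behavior there may differ.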


First Homework due Thursday 7 February 2013 in Class, on Paper

Your assignment is to record statistics and show frequency distributions for two network timings (ping round-trip time and web page fetch time) for multiple internet sites. Turn in the plots that show those statistics, printed copies of the programs you wrote to do the assignment, and any explanation of the procedures used to produce your statistics. For this assignment, you should test at least three different sites: one in North America, one in Asia, and one in Europe. The timing tests should include at minimum three hundred trials for each site, spaced apart in time so that no two trials occur within 59 seconds of each other. A high-grade assignment will draw conclusions about the distributions of times recorded, and any plausible inferences about the difference between round-trip time and web server request times.
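A frequency distribution is just a count of samples per fixed-width bin. A small Python 3 sketch, with hypothetical function names, that turns a list of timings into such a table and prints a rough text histogram (a plotting package can draw the same table as a bar chart):

```python
def frequency_table(samples, bin_width):
    """Count samples (e.g. RTTs in ms) per bin of width bin_width.

    Returns a sorted list of (bin_start, count); empty bins are omitted.
    """
    counts = {}
    for x in samples:
        start = (x // bin_width) * bin_width  # lower edge of x's bin
        counts[start] = counts.get(start, 0) + 1
    return sorted(counts.items())


def print_histogram(table, bin_width, unit="ms"):
    """Print one row of asterisks per bin."""
    for start, count in table:
        print("%8.1f-%-8.1f %s | %s" % (start, start + bin_width, unit, "*" * count))
```

The choice of bin width matters: too wide hides structure such as a second peak, too narrow leaves most bins with one or two samples.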

ping statistics

The ping command on the department linux systems can be used to measure round-trip time between sending a low-level packet (datagram) to a site and getting a response. Here are examples:

[herman@serv15 ~]$ ping -c 1 www.google.com
PING www.google.com (173.194.64.104) 56(84) bytes of data.
64 bytes from oa-in-f104.1e100.net (173.194.64.104): icmp_req=1 ttl=50 time=19.8 ms
--- www.google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 19.809/19.809/19.809/0.000 ms

[herman@serv15 ~]$ ping -c 1 199.181.132.250
PING 199.181.132.250 (199.181.132.250) 56(84) bytes of data.
64 bytes from 199.181.132.250: icmp_req=1 ttl=241 time=50.6 ms
--- 199.181.132.250 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 50.657/50.657/50.657/0.000 ms

The parameter -c 1 tells the ping command to try just one test. Without this parameter, ping will try testing indefinitely.
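When ping is run from a program, the round-trip time has to be pulled out of its text output. A Python 3 sketch with hypothetical function names; the regular expression matches the "time=19.8 ms" field in the reply line of the Linux output shown above (and "time<1 ms" on systems that print it that way), and a lost packet gives None:

```python
import re
import subprocess

RTT_PATTERN = re.compile(r"time[=<]([\d.]+) ms")


def parse_ping_time(output):
    """Return the first round-trip time (ms) in ping output, or None if lost."""
    match = RTT_PATTERN.search(output)
    return float(match.group(1)) if match else None


def ping_once(host):
    """Run 'ping -c 1 host' (Linux syntax) and return the RTT in ms or None."""
    result = subprocess.run(["ping", "-c", "1", host],
                            capture_output=True, text=True)
    return parse_ping_time(result.stdout)
```

Recording lost packets as None (rather than skipping them) keeps the loss rate visible in the final statistics.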

Notes:

fetching a web page

The other statistic to collect is the time it takes to send a request to a web server and get back the page requested. Your ping address (the site) and the web server should be the same, if you wish to make some inferences about the relation between these two statistics. How can you measure the time taken to send a request to a server? There are several linux utilities that may help.
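As an alternative to the Linux utilities, a program can time a fetch itself. A Python 3 sketch using urllib.request and a monotonic clock (the name time_fetch is hypothetical). Note that the measured time includes the DNS lookup and TCP connection setup as well as the server's response, which matters when comparing it with ping round-trip times.

```python
import time
import urllib.request


def time_fetch(url, timeout=30):
    """Return (seconds, bytes_received) for one GET of url."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()  # include the time to receive the whole page
    return time.perf_counter() - start, len(body)
```

Recording the number of bytes alongside the time helps explain outliers, for example when a site returns a different page or an error.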

recording your timings

There are at least two ways to record the timings from your tests.

pacing your experiments

The assignment asks that successive pings or web page fetches be no closer together than 59 seconds. In practice, this means your testing should pause for a minute between tests. If you write something like a bash script to do the testing, put a sleep 60 command between tests; if you invoke the commands from a Java or Python program, use a library call to sleep between tests. In Python, it's like this:

import time
time.sleep(60)  # pause 60 seconds before the next test

Java has similar facilities.
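Pacing and recording can go together in one loop. A Python 3 sketch (the name run_trials is hypothetical) that calls any measurement function you supply, waits 60 seconds between trials, and appends each result with a timestamp to a CSV file; the sleep function is a parameter only so the loop can be tested without waiting.

```python
import csv
import time


def run_trials(measure, n_trials, out_path, pause=60.0, sleep=time.sleep):
    """Call measure() n_trials times, pausing between calls, logging to CSV.

    Each row is (unix_time, result); a None result is written as an empty field.
    """
    with open(out_path, "a", newline="") as f:
        writer = csv.writer(f)
        for i in range(n_trials):
            if i > 0:
                sleep(pause)  # keep successive trials at least 59 s apart
            writer.writerow([time.time(), measure()])
            f.flush()  # keep completed rows even if the run is interrupted
```

For example, run_trials(my_measure, 300, "rtt.csv"), where my_measure is your own ping or fetch function, takes about five hours, so it is worth starting it in the background well before the due date.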

analysis and presentation

Use whatever you like to analyze the results, plot graphs of the timings, and so on. Two things to be aware of are:

Time Series or Distribution?

With network statistics, there can be related questions about the meaning of the recorded timing data, what relation it has to ideal models (like exponential, Poisson, uniform, or normal distributions), and how the statistics evolve over time. This will be part of class discussion on the homework.
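Two simple numbers can start that discussion: the coefficient of variation (standard deviation divided by mean), which equals 1 for an exponential distribution, and per-hour averages, which show whether the timings drift over the day. A Python 3 sketch with hypothetical function names, using the standard statistics module; it is a starting point for comparison, not a statistical test.

```python
import statistics


def describe(samples):
    """Summary statistics for a list of timings (at least two samples)."""
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    return {
        "n": len(samples),
        "min": min(samples),
        "median": statistics.median(samples),
        "mean": mean,
        "stdev": sd,
        # Exponential distributions have stdev == mean (CV = 1); a CV well
        # below 1 means far less spread than an exponential model predicts.
        "cv": sd / mean,
    }


def hourly_means(rows):
    """Average (unix_time, value) rows per hour: a coarse time-series view."""
    buckets = {}
    for t, v in rows:
        buckets.setdefault(int(t // 3600), []).append(v)
    return {hour: statistics.mean(vs) for hour, vs in sorted(buckets.items())}
```

Round-trip times usually have a hard minimum set by propagation delay, so subtracting the minimum before comparing the spread with a model can be more informative.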

Homeworks (last edited 2014-05-24 21:42:58 by localhost)