How encryption is used in network protocols, following the textbook:
- authentication using cryptographic hash functions
Multi-factor authentication (not in textbook)
Secret sharing (not in textbook)
Cryptonomicon (bonus link, not in textbook)
The openssl command (see OpenSSL), some examples -- to prepare for the homework.
Skip to Chapter 8 on Security -- last topic of this course. Reading for the remainder of the semester is from Chapter 8, but only through the fourth section (skip the parts about securing email, PGP, SSL, IPsec, LANs and Firewalls).
Encryption is one of the Crown Jewels of Computer Science. Also, one of the earliest motivations for algorithms, along with counting, battle calculations, and manufacturing automation.
Lecture: look at simple ciphers DIY Cypto.
- ideal encrypted text: indistinguishable from noise (high entropy)
- ideal encryption algorithm: each input bit potentially changes every output bit
- encryption needs to be reversible (like XOR) so it can be decrypted
- should be hard to reverse engineer / break, even with cleartext available
- principles of encryption algorithms: scrambling, obfuscation, embedding
- minimal key size
- output same size as input (usually)
- algorithm can be put into hardware (like network interface chips)
- more work in setup/encryption should make it more difficult to break the code
- key distribution problems
Multi-Stage Symmetric-Key Encryption/Decryption: Data Encryption Standard and Advanced Encryption Standard. Shown in textbook, but decryption not discussed! Block, chaining, padding techniques are important.
Public-key cryptography: RSA (algorithm) To be shown on blackboard, how it works, what are some properties, how to leverage public-key encryption for bulk data and for authentication. How Certificate authority works (time permitting).
Briefly cover token networks and the FDDI standard as a contrast to Ethernet and WiFi for local area networking. Chapter 6 explains how 802.11 usually relies on infrastructure (access points connected by Ethernet), uses an improvement based on bandwidth reservation (RTS/CTS frames) and handoff between BSS of different access points. Note that the idea of tokens, RTS/CTS, and other explicit Bandwidth managementis found at different layers in the network stack (eg, see Resource Reservation Protocol).
Aside: interesting WiFi Hack.
Discussion of Chapter 5 topics finishes in this lecture. Look briefly at the range of Layer 2 and below (MAC, LAN, and physical layer) to see such things as Metropolitan area network scale. Of practical interest are architectures for CATV and Hybrid fibre-coaxial networks. The textbook does not mention it, but ATM networks which can provide backbone services to the internet often use optical networking standards (see SONET) for fiber communication.
Delving into Chapter 6 (not required reading) to get some details about WiFi and some principles of wireless networks. Also from Chapter 6, a bit about CDMA, GSM, and cellular networks which use spatial multiplexing and handoff protocols.
Ethernet. Origins in Aloha, slotted Aloha; Ethernet also inspires WiFi protocols.
Basic principles: ways to share a link include channel splitting (such as dividing up frequencies like FDMA or WDM), taking turns (TDMA), permission-based shemes (an access point controls who can transmit, or token-based access), spatial multiplexing (how cellular may reuse frequencies), code multiplexing (CDMA), and randomized access. Ethernet uses decentralized, randomized access. Efficiency and fairness are important metrics and the various ideas of sharing media each have their own limitations and advantages.
CSMA was used in Aloha, and two refinements (CSMA/CD for Ethernet, CSMA/CA for WiFi) improve efficiency. The full Ethernet specification, now IEEE 802.11, spans the logical parts of the protocol for framing and behavior down to physical attributes like voltage, cable and wire properties, even the allowable length of cable.
Originally just a bus-type architecture, Ethernet became popular and evolved to include repeaters, hubs and switches. The way 802.11 works in standard IP settings is generalized for other kinds of LAN, and the ARP protocol unifies the way subnets and framing addresses work together. Unlike the network layer, which has to deal with immense scale, a LAN is restricted to be small size. At small scale, simpler and more flexible protocols are practical than would be for the network layer. For instance, LAN broadcast is not difficult or expensive. Also, many LAN protocols are "plug and play" so they work with close to zero administrative overhead.
More about physical properties of signals. Definition of Baud. Theoretical limits, like Shannon limit, Nyquist limit, and Signal-to-noise ratio -- hardware cannot solve all problems! Such things motivate any number of tricks at the physical layer (examples include ADSL, Optical amplifier, and MIMO to name just three). Bit errors are a fact one has to cope with at the link/physical layer. The textbook suggests two main ideas, error detection (which can be used to signal higher layers they need to resend) and forward error correction. Useful examples of these are CRC32 for error detection and Convolutional code for forward error correction. Error detection codes like CRC are often included in frames for link layers to detect errors. Hardware and algorithms for error detection have been highly optimized.
Framing. In a sequence of symbols, where does a byte start, where does it end? Usually this is solved by adding a sequence of extra bits, a preamble or Syncword before the frame starts. In addition, protocols like HDLC can add special characters that signify the start of a frame, the end of a frame, and an escape symbol so that a start-of-frame character can be skipped over by frame processing when it accidentally occurs in a frame payload. (This issue is similar to the problem of how to put the double-quote character inside a string that is delimited with the same, such as "this is a doublequote \"".)
Ethernet. A nice example of a link layer protocol is Ethernet, and the text explains it well. The main issues are the hardware assumptions and restrictions, metrics for efficiency, and the notion of CSMA/CD which is a distributed and randomized protocol for media access.
Switches, Hubs, Repeaters, and ARP. Once we know a bit about Ethernet, we can look at more detail on how the basic protocol has evolved and how it is used quite often in LANs that support IP.
Continue discussion of Homework 5.
Start Chapter 5 (selected topics). Unlike coverage of previous chapters, the reading of Chapter 5 will be selective -- just some of the topics in the chapter will be covered. The chapter really is addressing two layers of the network stack, the link (or LAN) layer, and the physical layer. The discussion of physical properties of networks is very superficial. Unlike the text, we start the first lecture with the physical properties:
- How a bit is represented physically -
- square waves of electrical voltage or light on a fiber
- modulation of a carrier signal (could be amplitude, frequency, phase, or some combination)
- Problems of directionality -
- guided vs unguided radiation
- power, range, and antenna properties
- Problems of timing -
- when does a bit start?
- how "wide" should a bit be?
- how do transmitter and receiver synchronize their timing?
- Problems of noise -
- what do do about errors?
- what about interference of multiple signals on a channel?
- signal/noise ratio and power
Once the problem of representing, sending, and receiving a bit is solved, we have the problem of how to know when a frame (which is the equivalent of a packet at the link/LAN/physical layer) starts, when it ends, so that the right "bits" are included in a frame, and it's known where a byte starts, a frame starts, and where it ends. The textbook has discussion of byte-stuffing and bit-stuffing, though there is more to the story (especially for wireless links).
Multiplexing is a major concern at the link/LAN layer. How can multiple signals concurrently be sent, whether on the same link, adjacent links (where interference is an issue)? If frames can't be concurrent, how can they be scheduled so they don't interfere?
Internet backbone routing depends on Border Gateway Protocol and on Autonomous System (Internet) concepts, finishing Chapter 4. The chapter does not give much detail on how inter-AS traffic works, notably through an Internet exchange point, though much of this knowledge depends on Chapter 5 concepts. The most interesting thing about BGP is that it uses path-based and policy-based techniques to decide how addresses are advertised and how routing tables should be made.
This is a good place to mention Traffic shaping and how the network layer relates to distributions of service times (as we saw in Queuing Models earlier). Another feature of a smarter network layer can be the way it handles Differentiated services.
Also: Homework 5 - a first explanation, see Homeworks.
Almost finished with Chapter 4: routing algorithms, including Link-State Routing and Distance-Vector Routing. Some pathological difficulties in the distance-vector protocol (like count-to-infinity and poison-reverse) merit some examples.
Continue with Chapter 4, resuming discussion of Network address translation but also Firewall (computing) --- both are examples that technically often violate the concept of layer encapsulation. Look at examples of subnetworks again. How does DHCP work? Examples of ICMP - ping and traceroute. IP fragmentation. Time permitting, start routing algorithms.
Questions answered about current homework (especially, the use of queues added to the Socket Concurrency page).
Moving to Sections 4.3--4.4 of the textbook, covering router/switch architecture (like Crossover switch, Crossbar switch, Banyan switch), subnets (Subnetwork) and NATs (Network address translation). It's useful to look at address descriptions in Private network
Covering the first two section of Chapter 4, with student participation, drawing routing tables on the board for both virtual circuit and packet switching (internet) cases. Some brief description of queues to communicate between processes and and threads, but after class some detail about this was added to the Socket Concurrency page. Queues can assist in homework four, if you use Python, Java or some other high level language with similar libraries.
Initiate coverage of Network congestion
Look at how ATM implements Available Bit Rate
Although the chapter emphasizes how to manage congestion (by arranging for protocols to back off on their bandwidth consumption), it's also possible to take preventative action: like Call Admission Control does; this is a planned strategy much like bandwidth reservation used in Virtual Circuit setup.
Just to see how many ways there are to deal with congestion, consider this Taxonomy of congestion control.
- TCP Congestion Control:
- P27 (4th Edition): Consider TCP's estimation of RTT, supposing alpha is 0.1, and let sampleRTT[i] be the i-th most recent sample RTT, so that sampleRTT is the most recent.
- Looking at the four most recent samples, what is EstimatedRTT?
- Generalize this to the n most recent samples.
- What is the formula when n goes to infinity and explain why this is called EMWA.
- P31 (4th Edition): Why does TCP wait until it has received three ACKs before performing a fast retransmission? Wouldn't it be better to start the retransmit after the first duplicate ACK?
- Creating TCP Connections
Chapter 3: Understanding TCP.
- TCP Connections: how they are created using network layer (three-way handshake) and setup negotiation parameters (MSS, MTU). SYN, RST, FIN for setup/teardown.
- Sequence numbers, cumulative ACKs, full-duplex and piggybacking.
- Estimating RTT and timeout. Adjusting timeout under varying conditions.
- Optimizations: fast retransmit, selective ACK.
Last-minute homework questions, if any.
- Show how, if state notation is parameterized, the FSM for the rdt2.1 sender can be reduced to two states.
- Suppose the alternating bit protocol is used on a link from host A to host B. The segment size is 1500 bytes (including header and error detection checksum). The transmission rate is 1 Mbps in either direction. An ACK is 8 bytes (including header and error detection). Propagation delay is 36 ms. What is the link utilization?
- Again, suppose the alternating bit protocol is used to transmit from A to B, under the same rates and segment sizes as the previous question. Now, however, assume there is a 4% probability that a segment is either lost or there is a corruption (in either direction) and that the timeout is perfectly tuned to a segment transmit and ACK reception cycle. How long would one expect, on average, it take for A to fully transfer a file of 8 MB to B?
Continue with Chapter 3 -- read another fifteen pages, which describe the relatively simple UDP and the beginnings of how two nodes communicating across a link can overcome unreliability at lower layers. Topics include:
Alternating bit protocol mostly as a simple introduction to resending
Automatic repeat request (ARQ) protocols, which show different design choices
Sliding window protocol illustrates what kind of bookkeeping is needed
Review of Finite-state machine abstraction
This is the start of Chapter 3, on the transport layer. Plan to read the first 15 pages of Chapter 3 for this lecture. Though the main topic of the lecture is the transport layer, there will also be time to continue discussion of Homework 2. Please look up these topics from Chapter 2, because they will be used in Chapter 3:
- control connection
- data connection
These terms may be found in the explanation of FTP file transfer.
Even as coverage of Chapter 3 and later chapters continues, there will be lecture references to some important things from the application layer, such as remote procedure call, and a few conventions on web services.
Chapter 3 explains the concepts of multiplexing (with ports and sockets) in some detail, but doesn't show how transport layers might work in other networks (besides internet). Much later in the book, there is some explanation of Asynchronous Transfer Mode which was a different type of network used at one time by telephone companies. Part of the lecture will look at ATM to see another way to approach transport layer ideas. (As an interesting historical aside, there used to be more competition between carefully planned and resource-controlled networks like ATM and the ad-hoc, best-effort style that is the internet. Some of the historical debate is found in discussions of Dumb network and the End-to-end principle).
One way to make the discussion of TCP more concrete is to take a look at the TCP Header Format (also see Transmission Control Protocol for a similar figure). In fact, the TCP header is more complex than these figures show.
Problem P18 from the textbook (6th edition): What is a whois database? Use the ARIN database to find the range of IP numbers assigned to the university.
- P2P topics.
- Explain equation (2.1) of the textbook (6th edition), that Dcs = max(NF/us, F/dmin) for a client server file distribution system.
- Explain equation (2.3), that Dp2p = max(F/us, F/dmin, NF/(us + sum(ui))) for a P2P file distribution system.
A look at how BitTorrent works.
The significance of Distributed hash table for P2P systems.
How an Overlay network can support a P2P system.
Continued discussion of proxies and the homework are the first topic. Will one week be enough to complete this assignment? What other information is helpful to complete the homework? Examples:
- Two ways to set up a browser to use a proxy are:
- In the preferences or settings, there may be a place to set an IP address and a port for a proxy server.
Setting the environmental variable http_proxy before running the browser (and launching the browser within the context of that environmental variable). In Windows, an environmental variable is set using a kind of dialog/form. A web search on the http_proxy environmental variable might turn up useful instructions. Under Unix/Linux, there are commands to set and query environmental variables, depending on the kind of command shell being used. Variously, these might have syntax like:
export http_proxy=locahost:8080 or export http_proxy=127.0.0.1:8080
env or set may show you the current environmental variables
setenv http_proxy localhost:8080 works on some command shells
- Change the port number from 8080 to another value if that port is in use (sometimes, a crashed proxy hangs up a port number for a few minutes). Some systems won't let you use port numbers in the range 0--4096 because these are reserved for system servers. If in doubt, try a larger number.
You can work very hard on something unrelated to this course; you could implement a natural language parser, do sentiment analysis, and so on. In other words, you can spend many hours on programming that has nothing to do with networking protocols. Please don't expect to get a good grade for this. The purpose of this homework, as others, is to enhance your understanding of networking, protocols, programming for data communication, and so on.
Second topic for the lecture: review the Chapter 2 topic of DNS and Internet Domains.
More about TCP, HTTP, and Proxies are the main part of this lecture.
First, look at some actual server responses (at least the HTTP part) online, either using a browser plugin or a command, like
$ wget -S http://www.uiowa.edu
The point is that instead of just reading about the web protocol (HTTP) in Chapter 2, you can experiment on your own to see how it works. This kind of experimentation can supplement material like wiki(Hypertext Transfer Protocol) which have some examples and the basic form; one can see historical definitions such as [http://tools.ietf.org/html/rfc2616 RFC2616], and [http://www.w3.org/Protocols/ W3C Documents] for official, but less readable material. Other ways of viewing HTTP include browser plugins, some special websites, and implementing a proxy.
Second, again see what is an HTTP proxy, how browsers work with proxies, and some conventions on hosts, the loopback interface, and standard port numbers. This can be demonstrated in several ways.
Third, introduce the Twisted Framework. Though there's no substitute for knowing about the socket interface between application and transport layers, there is a trend in applications to use higher-level software, such as a framework. The Twisted (software) is an example of such a framework. There are frameworks based on Java, C#, C++, and other languages. The website for Twisted shows on the first page how simple it is to write a TCP server. The advantage of using a framework is that you can use only a few lines of code to accomplish much; the disadvantage is the calls to the socket interface are hidden, which makes it harder to understand what's happening in detail.
Fourth we look at how to implement an HTTP Proxy.
Fifth explain the new homework.
(Lecture starts with a few remarks about previous homework results.)
As background needed to work on applications (for Chapter 2 of the textbook), this lecture will review some of the basics of working with sockets (the nearly universal abstraction for network programming on unix-based and Winsock systems) and a bit about concurrency, since networking inherently uses communication between concurrent systems.
First, a review of C Socket Programming (even though many students don't work with C, it is the most primitive standard interface technique from the application to the transport layer). Most higher-level languages (Java, C#, Python) derive their interfaces from the way the C API works.
Second, a look at Sockets in Python, as an example of how the socket abstraction has been ported to another language.
Queuing Theory questions (examples of store-and-forward paths, which consider not only transmission and propagation delay, but also queuing delay, based on the M/M/1 model's distributions of arrival and service times).
- [Warm up] Suppose a customer arrives to an automotive parts store with just one counter server about once every 5 minutes, and the mean time in the store is 35 minutes. What is the expected number of customers in the store? About how long does service (not waiting) take per customer?
- Suppose a unidirectional data link has one store-and-forward switch between the two endpoints, and the link/switch system can be modeled as an M/M/1 queuing system. The mean arrival rate of packets to the switch is 2500 packets per second. The mean service time is determined by the link bandwidth and packet size. The outgoing link transmission rate is 1.5 Mbps. The mean packet length (assume packet lengths are exponentially distributed, in keeping with the M/M/1 model) is 187.5 bits. How much time, on average does a packet reside in the switch before it is forwarded?
- Again, assume we have an M/M/1 system for a switch. This time, the arrival rate is 50 packets per second, and mean packet length is 1.5 KB. The transmission rate on the outgoing link is 3 Mbps. The question is, how much buffer should the switch have in order to get a lower than 2% probability on having to drop a packet due to congestion?
Start of Chapter 2, the Application Layer.
Problems from the textbook (6th Edition):
- For a communication session between a pair of processes, which is the client and which is the server?
- What information is used by a process on one host to identify a process running on another host?
- Suppose you wanted to do a transaction from a remote client to a server as fast as possible. Which is better for this, UDP or TCP?
Terminology: what do these terms mean? In the next several weeks, topics from Chapter 2 will explore the following list (and more), including programming exercises.
Application Layer, Application Architecture, Client-Server Architecture, P2P Architecture, FTP, telnet, processes and messages, port number, reliable data transfer, loss-tolerant applications, bandwidth-sensitive applications, elastic applications, transport services, Secure Sockets Layer (SSL), HTTP, HTTP RFC (and W3C specification), SMTP, stateless protocol, persistent vs non-persistent communication, RTT, TCP connection, pipelining, HTTP header, HTTP request, HTTP response, Cookies, Web Caching, proxy server, Content Distribution Network (CDN), control connection vs data connection, out-of-band, in-band, email user agent, email server, pull vs push protocols, POP3, IMAP, host aliasing, canonical hostname, load distribution, distributed DNS database, root DNS servers, TLD servers, authoritative servers, recursive queries, iterative queries, DNS cache, Resource Records, Type A records, registration, ICMP, P2P Scalability, Distributed Hash Tables (DHTs).
No class! Use the time to work on the first homework.
Problems from the textbook (6th Edition):
- Suppose two hosts A and B are connected by a single link with rate R bps. The two hosts are separated by m meters, and propagation speed on the link is s meters/sec. Host A needs to send a packet of L bits to host B.
- Express propagation delay in d_prop terms of m and s.
- Find transmission time d_trans in terms of L and R.
- What is the end-to-end delay (ignoring queuing and processing delays).
- Suppose host A starts at t=0. At time t=d_trans where is the last bit of the packet?
If d_prop > d_trans, where is the first bit of the packet at time d_trans?
Otherwise, if d_prop < d_trans, where is it?
- Suppose s is 2.5e8, L=120 bits, R = 56 kbps. Find m so that d_prop = d_trans
- In this question, users are generating data to send at the rate of 100 kbps when they are busy, but they are only busy with probability 0.1. The outbound link that all the users share was initially 1 Mbps, but now we upgrade it to a 1 Gbps link. How many users can be supported under circuit sharing? What if packet switching is used, and there are M users, what is a formula in terms of p, M and N for the probability that more than N users are sending data.
- Consider the queuing delay in a router buffer. Let I be the traffic intensity (I = La/R, where L is size of packets, R is transmit rate, a is like average arrival rate).
- Provide a formula for total delay, equal to queuing delay plus transmit delay out.
- Plot the total delay as a function of L/R.
- Visit www.traceroute.org and perform traceroutes from two different cities in France to the same destination host in the US. How many links are the same in the two traceroutes? Is the transatlantic link the same?
Suppose you urgently need to send 40 terabytes from Boston to Los Angeles. You have a 100 Mbps link (dedicated bandwidth). Which is better, the data link or FedEx overnight? Explain the decision.
- Suppose two hosts A and B are separated by 20,000 kilometers and connected by a direct link of R = 2 Mbps. The propagation speed is 2.5e8 m/s.
- Calculate the bandwidth delay product, R*d_prop
- Consider sending a file of 800,000 bits from A to B as one continuous stream of bits. What is the maximum bits on the link at any instant?
- What is the width (in meters) of one bit on the link?
Continued problems from Chapter 1, continued review of probability distributions, and the start of queuing theory. For background, you can read a general description of Queuing theory. Coverage for this course will be confined to a few basic topics. The lecture will concentrate on probability exercises, then a derivation of M/M/1 queue statistics. The first homework has been assigned, please see the Homeworks page.
Quiz over some of the basic terminology; at least one question on using metrics. Beginning of solving problem from the textbook. Three problems to be solved in class are from the text (6th edition), R11, R12, and R13.
- Suppose there is exactly one packet switch between a sending host and a receiving host. The transmission rates between the sending host and the switch and between the switch and the receiving host are R1 and R2 respectively. Assuming that the switch uses store-and-forward packet switching, what is the total end-to-end delay to send a packet of length L? (Ignore queuing, propagation delay, and processing delay.)
- What advantage does a circuit-switched network have over a packet-switched network? What advantages does TDM have over FDM in a circuit switched network?
- Suppose users share a 2 Mbps link. Also suppose each user transmits continuously at 1 Mbps when transmitting, but each user transmits only 20 percent of the time. (Read discussion of statistical multiplexing in the text.)
- When circuit switching is used, how many users can be supported?
- Suppose packet switching is used. Why will there effectively be no queuing delay if two or fewer users transmit at the same time?
- What is the probability that a given user is transmitting?
- Suppose there are three users. What's the probability that at any given time all three are transmitting simultaneously?
hosts, end systems, network application programs, protocols, TCP, IP, TCP/IP, communication links, physical media, link bandwidth, routers, IP protocol, route, path, packet switching, ISP (Internet Service Provider), access networks, RFC (Internet standard), intranet, connection-oriented service, connectionless service, IETF, W3C, clients, servers, client/server model, reliable data transfer, flow control, congestion-control, UDP, circuit switching, virtual circuit, best effort, FDM (frequency division multiplexing), TDM (time division multiplexing), silent periods, packets, switches, store-and-forward transmission, store-and-forward delay, input buffer, output buffer, queuing delays, packet loss, statistical multiplexing, message switching, header, propagation delay, datagram network, interface numbers, state information, edge router, modem, POTS, DSL, ADSL, CPDP, guided media, unguided media, processing delay, transmission delay, end-to-end delay, packet drop, protocol stack, n-PDU, service model, layered architecture, segmentation and assembly, connection setup, protocol stack, application layer, transport layer, network layer, link layer, physical layer, backbone, link-layer switches, FTTH, Ethernet, WiFi, IEEE 802.11, 3G, LTE, unshielded twisted pair (UTP), coaxial cable, terrestrial radio, LEO, geostationary satellites, forwarding table, PoP, multi-home, IXP, content provider networks, throughput (instantaneous and average), bottleneck link, segment, frame, OSI model, encapsulation, payload, malware, botnet, virus, worm, DoS attack, DDoS, packet sniffer, IP spoofing.
Many of these terms from the book's first chapter can be found also, by some searching, on Wikipedia. After reading and studying the first chapter, you are expected to be able to explain generally where these terms fit into the overall picture of networks.
- Metrics. These are mentioned in the book, and definitely used in some end-of-chapter exercises. You need to know these facts:
Bit rate is the standard measurement of digital bandwidth, with "bps" as the associated suffix. When using "bps" the prefixes T, G, M, and K are powers of ten.
Byte is the standard measurement of data size for digital media, with "B" as the associated suffix. When using "B", the prefixes T, G, M, and K are powers of two.
- Practice with examples, like: how long does it take to transmit 2.2 TB at 115 Gbps ?
Review basic statistics distributions and metrics: mean, standard deviation, and important distributions -- see distribs.pdf