Computer Communications

Dietolf Ramm

(Draft)

(c) Copyright 1995 All rights reserved

NOTE: This assumes use of the UNIX workstations run by the Computer Assist Center. All students now have accounts.

1. Exploration (A)

On the way back from a concert by your favorite rock group, you are almost cut off by an 18-wheeler with a huge trailer. You fume for a while, planning all kinds of revenge. You are sorry that you always seem to be fresh out of photon torpedoes just when that kind of thing happens.

There is another way. Why not send a note to the President expressing your dislike of any laws allowing vehicles bigger than yours on the nation's highways? It is no longer necessary to get out paper, envelope, and stamps to do this. All you need to do is log into your networked computer and send electronic mail (e-mail) with your thoughts on the matter. You may not get a handwritten personal reply, but your opinion is likely to at least be counted.

+---------------------------------------------------------------+
|mail president@whitehouse.gov                                  |
|Subject: Big, bad trucks                                       |
|Dear Mr. President                                             |
|I'm afraid of the big bad trucks. Please help!                 |
|Sincerely,                                                     |
|Red                                                            |
|.                                                              |
+---------------------------------------------------------------+
|                    Letter to the President                    |
+---------------------------------------------------------------+

However, you are in luck: the President would like to discuss your concerns with you in person. The President replies that because of the upcoming governors' conference in Seattle, the meeting will have to be there, and asks you to suggest a nice place for lunch in Seattle.

Before making a reservation, you want to find out what the weather will be. It would be nice to meet at one of the outdoor establishments near the Space Needle. By entering

     ftp ________________________________________

on your computer you can get the latest satellite pictures and weather maps in just a few minutes.
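The letter above is typed into the UNIX mail program. Its final line, holding a single period, marks the end of the message. As a rough illustration only, the same letter can be assembled as a standard Internet mail message with Python's email package. The sender address below is a made-up placeholder. Actually delivering the message would also need a mail server (for example through Python's smtplib), and that step is not shown.

```python
from email.message import EmailMessage

def compose_letter(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a mail message with the same parts as the letter above:
    addressing headers, a Subject line, and the body text."""
    msg = EmailMessage()
    msg["From"] = sender        # hypothetical sender address
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)       # plain-text body
    return msg

letter = compose_letter(
    "red@example.com",          # placeholder, not a real account
    "president@whitehouse.gov",
    "Big, bad trucks",
    "Dear Mr. President\nI'm afraid of the big bad trucks. Please help!\nSincerely,\nRed\n",
)

# Headers come first, then a blank line, then the body.
print(letter.as_string())
```

The headers (From, To, Subject) play the role of the envelope and the Subject line typed into mail. The body is everything after the blank line.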
The latest weather forecast can be displayed by typing

     weather sea

where sea is the code for the Seattle area. A typical response is shown below.

*------------------------------------------------------------------------*
|Weather retrieval script, version 2.2                                   |
|Connecting to downwind.sprl.umich.edu...connected                       |
|                                                                        |
|Weather Conditions at 4 PM PDT on 15 JAN 95 for Seattle, WA.            |
|Temp(F)   Humidity(%)   Wind(mph)   Pressure(in)   Weather              |
|========================================================================|
|  44          92%       SSE at 9       29.71       fog                  |
|                                                                        |
|466                                                                     |
|FQUS1 KSEA 152342                                                       |
|LFPSEA                                                                  |
|WAZ001-160530-                                                          |
|                                                                        |
|SEATTLE TACOMA EVERETT AND VICINITY FORECAST                            |
|NATIONAL WEATHER SERVICE SEATTLE WA                                     |
|400 PM PST SUN JAN 15 1995                                              |
|                                                                        |
|.TONIGHT...OCCASIONAL SHOWERS DECREASING LATE, OTHERWISE MOSTLY CLOUDY. |
|LOWS IN THE UPPER 30S. SOUTH WIND 5 TO 15 MPH, DECREASING OVERNIGHT.    |
|.MONDAY...SCATTERED SHOWERS AND PATCHY MORNING FOG. PARTLY SUNNY        |
|PERIODS. HIGHS MID 40S. LIGHT SOUTH WIND.                               |
|.MONDAY NIGHT...PARTLY CLOUDY BECOMING MOSTLY CLOUDY LATE. SLIGHT CHANCE|
|OF RAIN LATE. LOWS 30S.                                                 |
|.TUESDAY...RAIN DEVELOPING. HIGHS MID 40S.                              |
|                                                                        |
|.< TEMPERATURE / PRECIPITATION                                          |
|SEATTLE       38  45  35  44 /  60  40  20  60                          |
|                                                                        |
|   ***********************                                              |
|   State extended forecast                                              |
|   ***********************                                              |
| EXTENDED FORECAST...                                                   |
| WEDNESDAY...RAIN TURNING TO SHOWERS. LOWS IN THE MID 30S TO LOWER 40S. |
|HIGHS MID 40S TO LOWER 50S.                                             |
| THURSDAY...RAIN DEVELOPING AGAIN AND WIND INCREASING. LOWS IN THE UPPER|
|30S TO LOWER 40S. HIGHS MID 40S TO LOWER 50S.                           |
| FRIDAY...RAIN AT TIMES. WINDY. LOWS IN THE MID 30S TO LOWER 40S. HIGHS |
|MID 40S TO LOWER 50S.                                                   |
|                                                                        |
|The National Weather Service information is provided by the University  |
|of Michigan Weather Underground project and the National Science        |
|Foundation-funded Unidata project, from a data feed broadcast by        |
|Alden/Zephyr Electronics, Inc.                                          |
*------------------------------------------------------------------------*

These are just a few examples of the opportunities the Internet offers. Our goal in this chapter is to get a better understanding of how this marvelous network operates and to get some practical experience in making use of it.

2. Layers and Local Area Networks (LANs) (B)

One recurring theme in computing is that of layers. We have already looked at programming at the Pascal and the Assembler level. Communications software also solves its problems in layers. The bottom layer is the actual hardware used for communications. The systems used at this low level are often described as Local Area Networks, or LANs. Two common low-level methods used for computer communications are called Ethernet and Token Ring. Token Ring will be discussed briefly at a later point. First we will concentrate on Ethernet.

Ethernet was developed by the Xerox Corporation and has now become a standard. It is called a bus architecture because many computers communicate on, or share, the same bus, as is illustrated in Figure A.

     -----     -----     -----     -----     -----
    |     |   |     |   |     |   |     |   |     |
    |  A  |   |  B  |   |  C  |   |  D  |   |  E  |   Computers
    |     |   |     |   |     |   |     |   |     |
    |  #  |   |  #  |   |  #  |   |  #  |   |  #  |   Ethernet Controller
     --|--     --|--     --|--     --|--     --|--
       |         |         |         |         |
       |         |         |         |         |
 o=====o=========o=========o=========o=========o=====o   Bus

        Figure A. An Ethernet based Local Area Network

In this figure, the letters A through E are used to designate computers (also called hosts). Each computer contains an Ethernet controller with a connection to the bus. It works a little bit like many people trying to communicate in a large room. Ideally, everyone waits until the room is quiet. Then Judy might say: ``John, how did you like the book?'' Everyone can hear the question. However, it is intended for John and, in polite company, only John will listen to the message, and he will probably formulate a reply.
John will wait for a quiet moment in the room and might then say ``Judy, the book was great!'' As long as everyone is polite and orderly, any two people in the room can communicate. No more than two people can communicate at any one point in time. While everyone can hear all messages, all listeners are expected to ignore messages not intended for them.

Real Ethernet works a lot like this, except that the signals on the bus are electrical and the bus takes the form of a piece of coaxial cable. (You may have seen coaxial cable in your cable TV hookup, or as the cable connecting your VCR to your TV in a typical installation.) There is also a maximum allowed message length, which keeps any two computers from hogging the bus.

Every Ethernet controller is guaranteed to have a unique address (manufacturers deal with a central clearinghouse). This address is something like a serial number. It takes the form of a 12-digit hexadecimal (base 16) number, that is, 48 bits, written in the following form:

     5A 34 B2 31 90 1C

With a unique address, a message always has an unambiguous destination. This is unlike the example given earlier, where we would have problems if there were more than one Judy in the room.

Another problem dealt with at this low level is that of collisions. Even in polite company, two people may begin talking at exactly the same time. For Ethernet this is called a collision. Ethernet controllers can detect that a collision has occurred, and they know that the data is then likely to be garbled. In this case, each controller wishing to use the bus waits for a random amount of time before attempting to communicate again. This so-called random back-off makes it unlikely that one collision is immediately followed by another one.

Ethernet uses a concept that is widely used throughout computer communications: breaking a message into smaller chunks. It was mentioned above that messages have a maximum length for fairness reasons.
This means that larger messages must routinely be broken down into smaller segments called packets. As illustrated in Figure B, each packet includes important control information, including the destination address (where it's going) and the source address (where it came from).

  +-----------+-------------+---------------------+------------+
  | dest addr | source addr |       message       | check info |
  +-----------+-------------+---------------------+------------+

           Figure B. The Format of an Ethernet Packet

Think of it as taking a manuscript and mailing it on several standard postcards. Each postcard has independent address information on the front. At the destination, all the postcards can be put in order and the manuscript reassembled. To make this possible, the packets must also carry sequencing information. (On real networks this is added by software layers above the Ethernet level.) This idea is illustrated in Figure C.

  +-----------+-------------+---------------------+------------+
  | dest addr | source addr | FOUR SCORE AND      | check info |
  +-----------+-------------+---------------------+------------+
  +-----------+-------------+---------------------+------------+
  | dest addr | source addr | SEVEN YEARS AGO     | check info |
  +-----------+-------------+---------------------+------------+
  +-----------+-------------+---------------------+------------+
  | dest addr | source addr | OUR FOREFATHERS     | check info |
  +-----------+-------------+---------------------+------------+
  +-----------+-------------+---------------------+------------+
  | dest addr | source addr | BROUGHT FORTH ...   | check info |
  +-----------+-------------+---------------------+------------+
                                 ...

          Figure C. A message distributed over multiple packets

Note that the Ethernet communications method has an obvious potential security problem. As mentioned earlier, every controller on the bus can see every message, regardless of its destination. Standard software causes a controller to ignore all messages not intended for its computer.
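Figures B and C, together with the rule that a controller ignores packets addressed elsewhere, can be modeled in a few lines of Python. This is only a simplified sketch and does not follow the real Ethernet frame layout. Each hypothetical `Packet` carries destination and source addresses written in the 12-hex-digit style shown earlier. It also carries a sequence number (included here for simplicity; real networks add sequencing in software above Ethernet), a chunk of the message, and check information. The check information is computed with CRC-32, the checksum Ethernet uses, here through `zlib.crc32`. The chunk size and the second address are made up.

```python
import random
import zlib
from dataclasses import dataclass

MAX_CHUNK = 16  # hypothetical maximum message bytes per packet (kept tiny for the demo)

def parse_addr(text: str) -> bytes:
    """Turn '5A 34 B2 31 90 1C' into the 6 bytes (48 bits) it represents."""
    raw = bytes.fromhex(text)
    if len(raw) != 6:
        raise ValueError("an Ethernet address has exactly 12 hex digits")
    return raw

@dataclass
class Packet:
    dest: bytes     # where it's going
    source: bytes   # where it came from
    seq: int        # sequencing information
    data: bytes     # this packet's piece of the message
    check: int      # CRC-32 computed over the data

def packetize(message: bytes, dest: bytes, source: bytes) -> list:
    """Break a message into numbered packets no longer than MAX_CHUNK."""
    packets = []
    for seq, start in enumerate(range(0, len(message), MAX_CHUNK)):
        chunk = message[start:start + MAX_CHUNK]
        packets.append(Packet(dest, source, seq, chunk, zlib.crc32(chunk)))
    return packets

def receive(packets: list, my_addr: bytes) -> bytes:
    """Keep only packets addressed to us, verify them, and reassemble in order."""
    mine = [p for p in packets if p.dest == my_addr]   # ignore everyone else's
    for p in mine:
        if zlib.crc32(p.data) != p.check:
            raise ValueError(f"packet {p.seq} was garbled")
    mine.sort(key=lambda p: p.seq)
    return b"".join(p.data for p in mine)

judy = parse_addr("5A 34 B2 31 90 1C")
john = parse_addr("08 00 20 1A 2B 3C")   # made-up address for illustration

text = b"FOUR SCORE AND SEVEN YEARS AGO OUR FOREFATHERS BROUGHT FORTH"
bus = packetize(text, dest=john, source=judy)
bus += packetize(b"not for John", dest=judy, source=john)
random.shuffle(bus)                      # packets may arrive in any order
print(receive(bus, john).decode())
```

The address check in `receive` is only a convention followed by well-behaved software. Every packet is still physically present on the shared bus for any controller to read.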
However, users may be able to obtain rogue software that does not follow that convention. Using such software is akin to wiretapping and is illegal. Unfortunately, such software is not too hard to obtain, since it does have legitimate troubleshooting applications. So we leave this hardware level with a warning: unless you are sure of the other machines and/or users on your local Ethernet, you cannot rule out that what you are typing is being monitored. Of course, you should keep this in a healthy perspective. Your phone may be tapped, with a court order or illegally. Police with a warrant may intercept your mail as well, and a thief needs no warrant to take something out of a street-side mailbox.

3. Wide Area Networks (B)

                                                              o
                                                              "
       o------------------------------------------------------"
     --|--     -----     -----     -----                      "
    |  #  |   |     |   |     |   |     |                     "
    |     |   |     |   |     |   |     |                     "
    |  A  |   |  B  |   |  C  |   |  D  |   Computers         "
    |     |   |     |   |     |   |     |                     "
    |  #  |   |  #  |   |  #  |   |  #  |                     "
     --|--     --|--     --|--     --|--                      "
       |         |         |         |                        "
       |         |         |         |                        "
 o=====o=========o=========o=========o=====o   Bus            "
                 Network 1                                    "
                                                              "
    # = Ethernet Controllers                                  "
    o = Connection to Ethernet                                "  Network 3
                                                              "
       o------------------------------------------------------"
     --|--     -----     -----     -----                      "
    |  #  |   |     |   |     |   |     |                     "
    |     |   |     |   |     |   |     |                     o
    |  W  |   |  X  |   |  Y  |   |  Z  |   Computers
    |     |   |     |   |     |   |     |
    |  #  |   |  #  |   |  #  |   |  #  |
     --|--     --|--     --|--     --|--
       |         |         |         |
       |         |         |         |
 o=====o=========o=========o=========o=====o   Bus
                 Network 2

  Figure D. An internet: two or more networks connected by another network

As is illustrated in Figure D, we can take two Ethernet based networks (Network 1 with hosts A, B, C, and D, and Network 2 with hosts W, X, Y, and Z) and connect them together using a third Ethernet based network, Network 3. (Network 3 could have many more connections, but here it shows connections only to hosts A and W.) To accomplish this, we have set up machine A with two Ethernet controllers and connections to Networks 1 and 3. A machine that is connected to more than one network is sometimes called a gateway machine.
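The way a message can cross from one network to another through gateway machines can be sketched as a shortest-path search over Figure D. The host and network names come from the figure. The breadth-first search is only one simple way to find a route in a tiny example. Real routers instead consult routing tables maintained by routing software, rather than searching the whole internet for each message.

```python
from collections import deque

# Figure D: which hosts are attached to which network.
# A and W each sit on two networks, which makes them gateways.
NETWORKS = {
    "Network 1": {"A", "B", "C", "D"},
    "Network 2": {"W", "X", "Y", "Z"},
    "Network 3": {"A", "W"},
}

def neighbors(host):
    """Yield (network, other_host) pairs reachable from host in one hop."""
    for net, hosts in NETWORKS.items():
        if host in hosts:
            for other in hosts:
                if other != host:
                    yield net, other

def find_route(src, dst):
    """Breadth-first search for a route with the fewest hops.
    Returns a list of (from_host, network, to_host) hops, or None."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        host = queue.popleft()
        if host == dst:
            hops = []
            while previous[host] is not None:   # walk back to the source
                prev_host, net = previous[host]
                hops.append((prev_host, net, host))
                host = prev_host
            return list(reversed(hops))
        for net, other in neighbors(host):
            if other not in previous:
                previous[other] = (host, net)
                queue.append(other)
    return None

for frm, net, to in find_route("C", "X"):
    print(f"{frm} -> {to} via {net}")
```

Every hop that changes networks passes through A or W. A machine attached to more than one network is what lets a message move from one network to the next.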
(One can also buy a special-purpose piece of hardware called a router to accomplish the same thing.) Machine W is also a gateway. Machine C, for example, can now communicate with machine X by using gateways A and W. A message from C intended for X would first go from C to A using Network 1. Then A would forward it, using Network 3, to W. W would then forward the message, via Network 2, to X. If the message is long enough, it would consist of a number of packets, each taking the same route.

A collection of networks connected together is called an internet. The large network often referred to as the basis for the ``Information Highway'' is called the Internet (first letter capitalized).

Ethernets are designed to be fairly fast, but they have certain size restrictions. Under ideal conditions, an Ethernet can be almost a mile long. Other hardware and software constraints often reduce that to several hundred feet. An internet consisting only of Ethernet based networks would therefore have to be fairly small physically. Otherwise it would take a very large number of ``hops'' from gateway to gateway to cover large distances. However, it is possible to replace the Ethernet in Network 3 with other communications hardware. Such hardware, often leased from a common carrier, can provide connections over very large distances. The principles are similar to those used on the Ethernet. Information is sent in the form of one or more packets. Each packet includes appropriate addressing and other control information.

The IP Layer

We have discussed one physical communications medium, Ethernet, in some detail. We have mentioned Token Ring as another possibility, and there are others. ATM (Asynchronous Transfer Mode) is another fairly new transmission medium, which can handle very large amounts of material over long distances.
To insulate the users of computer communications media from this kind of detail, a layer of software has been written called the Internet Protocol, or IP. This layer provides a standard way of handling packets of information regardless of whether the underlying transport mechanism is Ethernet, Token Ring, or some other system. Each computer (or host) is given an IP address, regardless of how it is connected. IP packets look a lot like Ethernet packets, with a source and a destination address. If you know the IP address of the host you want to contact, the software will do the rest. It will automatically choose the appropriate networks, be they Token Ring, Ethernet, or something else. The IP layer makes all networks seem the same.

If an IP packet is sent over an Ethernet, the complete IP packet is carried as the data portion of an Ethernet packet. In other words, we have a packet within a packet. The (somewhat stretched) analogy is this: suppose we sent our manuscript on postcards to a person in France, but that person was on a business trip in Belgium. The person's secretary might put the postcards into envelopes with French stamps and forward them to Belgium.

More on Addressing (C)

You have probably seen the words address and addressing often enough in the last few pages to suspect that there is something important here. We use addresses all the time in everyday life, and often there are alternate ways to address the same location. For example, in Washington, DC, ``The White House'' and ``1600 Pennsylvania Avenue'' would both work with the US Postal Service. Terms like ``presidential residence'' would probably also work. If you were already nearby and giving directions, other descriptions would also work, like ``third house on the right'', ``house on the corner'', or ``the white house with the columns, you can't miss it''.
If you are a courier having to deliver something, it all eventually has to translate down to walking a certain number of steps, turning right or left, taking more steps, and so on. The upshot is that we deal with any number of addressing schemes, and in the end we have to translate them into actual physical motion or action.

When dealing with computer networks, we also need to be aware that there are several different ways of addressing things. In the end, when Ethernet is involved, all addressing has to be translated to the 12-digit hexadecimal number unique to the Ethernet controller you are trying to reach. Fortunately, we don't have to memorize such addresses. Other addressing methods are better suited for human use, and with the help of computer software we can use higher level, more descriptive addresses.

Let's use a high level address and follow the process that gets us to the right place. Assume we are sending a message to Tom at the University of California at Berkeley. If his login ID on his computer is TOM and the machine he is using is named JONES, then we might type

     mail tom@jones.berkeley.edu

This uses a high level addressing scheme called a domain address. A domain address is hierarchical, going (left to right) from local to general, just like a typical postal address. For the United States, the last portion is one of: com (commercial), net (network), edu (educational institution), gov (government), org (organization), mil (military), or a state code. (International extensions to this scheme end in a country code.) The other levels are appropriately descriptive, and how many there are depends on the size and complexity of the organization covered. The leftmost entry of the domain address is often the name of the host machine on which the user reads his or her e-mail.

This command will cause a query to be sent out over the network to find the machine JONES at Berkeley.
Along the way it will be handled by a piece of software called a name server, which will produce a lower level address called the IP address (discussed briefly above). (Usually the query goes first to a local name server, which consults other name servers, such as those responsible for berkeley.edu, as needed.) The IP address is a number of the form 128.123.34.56: 4 decimal numbers, each ranging from 0 through 255, separated by periods. For all practical purposes, throughout the Internet, this IP address uniquely identifies the computer and the network. (More accurately, it identifies a connection point of a computer to a network. A gateway machine has more than one IP address. Most computers have only one connection point, and thus only one IP address.)

Again, the beauty of the IP address is that it is independent of the kind of hardware it is used with. The actual lower level hardware may be Ethernet (remember that Ethernet uses the 12 hexadecimal digits), Token Ring, or some other communication method. Everything has an IP address.

What this means is that once we have the IP address, we can start sending packets to our destination. These packets include the IP addresses involved. To actually communicate, they must then be carried on the physical medium being used. Again, let us assume Ethernet. In that case an IP packet is carried as data in an Ethernet packet, which uses the 12 hexadecimal digits for addressing. Since this translation to the other addresses is done automatically, we normally don't have to worry about it. In our message to Berkeley, the software will figure out the addresses of the various Ethernet controllers that may be involved. If other types of communications hardware are involved, then the addressing specific to that method will automatically be used.

4. Other Applications (A)

Our main illustration of the uses of the Internet so far has been e-mail, although we briefly discussed getting a weather map. Below we briefly cover some other applications.

News.
The interchange of information of general interest by computer started when two graduate students at Duke University got together with a graduate student from the University of North Carolina at Chapel Hill and started posting general information for others to read. The connections between the computers were made via modems over dial-up phone lines. They developed the software for posting and sending news. Soon several Duke computers were involved. The system went national when AT&T Bell Labs in New Jersey became the first site outside North Carolina. This system, dubbed Usenet, grew rapidly, with participants all over the world. You can read news by typing ``rn'' and following the instructions. Other programs to organize your news reading are available.

Using Remote Computers.

If you have an account on a remote computer that is on the Internet, you can log into it using the telnet command. For example, you could type

     telnet info.berkeley.edu

and provide your user id and password to log into the machine named info. If you just want to download files from such a computer, you use the file transfer protocol, or ftp, by typing

     ftp info.berkeley.edu

You then provide your id and password and use the appropriate download commands (e.g. get filename). Many Internet sites provide public information by allowing any person to log in with the user id anonymous and their own e-mail address as the password. This is called anonymous ftp.

5. Surfing the Internet (A)

Over the years, many computer programmers used basic tools like telnet and ftp to gather information from other computers. Many computer sites keep special archives of programs and other information available for downloading. Some of the archives are sponsored by computer clubs or the government. Institutions and individuals made general information available to people in this manner.
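The anonymous ftp procedure described earlier can also be scripted: log in as anonymous, give your e-mail address as the password, then get a file. The sketch below uses Python's standard ftplib module. Host names, file names and the e-mail address are placeholders. `anonymous_get` accepts an already-connected FTP object so that the download logic can be tried without a network. `fetch_anonymous` is a hypothetical convenience wrapper that opens the connection itself.

```python
import io
from ftplib import FTP

def anonymous_get(ftp, filename: str, email: str) -> bytes:
    """Log in anonymously on an open FTP connection and download one file."""
    ftp.login(user="anonymous", passwd=email)          # e-mail address as password
    buffer = io.BytesIO()
    ftp.retrbinary("RETR " + filename, buffer.write)   # like typing 'get filename'
    return buffer.getvalue()

def fetch_anonymous(host: str, filename: str, email: str) -> bytes:
    """Connect to host, fetch filename anonymously, then disconnect."""
    with FTP(host) as conn:                            # connects on construction
        return anonymous_get(conn, filename, email)

# Example (requires network access; host and file name are placeholders):
#   data = fetch_anonymous("info.berkeley.edu", "README", "you@example.com")
```

The `with` block closes the connection even if the download fails, which mirrors typing quit at the end of an interactive ftp session.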
Scientists kept copies of their latest publications on the computer so that other scientists could get copies of them.

As the typical Internet user became more of a lay person as far as programming goes, tools were developed to make it easier to get information from other sites. Except for anonymous ftp, you needed accounts on the other machines to get information from them. This was a great hindrance to the free flow of information.

Many information services are now available over the Internet. These have names (often acronyms) like WAIS, GOPHER, and the World Wide Web. They are meant to be user friendly, so that someone with a minimum of computer skills can navigate the Internet. They set up new mechanisms to make information available to people who don't have accounts on the machine holding it.

Currently the most popular system is the World Wide Web. It is a way of organizing information in computer files in a standard way, including references to other sources of information. Information on the World Wide Web can be accessed by a number of programs specifically designed to make this process as painless as possible. Frequently used programs are Mosaic, Netscape, and Lynx. The first two require a high quality display system with good graphics. Lynx makes it practical to access web information on systems with no or minimal graphics.

These programs use a feature called hypertext to allow you to follow the references from one information source to another. In hypertext, certain words are highlighted (by underlines or different colors). By moving your cursor to such a word with your mouse and clicking, you go to another ``page'' associated with that word. Most other pages have further highlighted words. There are usually other words in boxes that you can click on to back up, quit, and so on. Invoke Mosaic by typing

     Mosaic

6. Problems in Paradise (C)

The time has come to discuss a more sober side of communications.
Ease of communication brings with it an increased chance that security or confidentiality may be compromised. For our purposes, security involves the integrity of the message: having the message altered or destroyed is a breach of security. Confidentiality involves someone unauthorized gaining access to the information; no harm to the information is implied. Of course, in most cases we desire both security and confidentiality.

To be fair, other means of communication have the same problems. The U.S. Mail is relatively secure only because the U.S. Postal Service is willing to put a lot of police power behind ensuring the security of your mail. There is, in fact, very little that keeps you from taking the mail out of someone's mailbox at the side of the road. The threat of severe penalties if you get caught deters most people. The telephone system has a similar potential for easy abuse, and similarly severe penalties if the abuser is apprehended. Cellular phones and satellite long distance circuits make the phone a little less secure than it may once have been, but it is still considered fairly safe.

The problems related to computer communications are twofold. First, there are, unfortunately, a fair number of computer users who consider breaching computer security a game or a challenge. Not enough of them have been caught and punished. The laws are there, but enforcement is sometimes difficult. Second, there is the nature of bus oriented systems like Ethernet: the communications channel is shared. Proper software does not allow you to eavesdrop. However, in many cases it is a simple matter to get or write rogue software that lets you ``listen in'' on everything anyone else on the local Ethernet segment is sending. Every controller already ``hears'' all traffic as part of normal operation. As a result, short of physically observing the perpetrator, there is almost no way of detecting that eavesdropping is taking place.
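A toy model shows why eavesdropping on a shared bus is so hard to detect. In the simplified sketch below, which is not real network code, the bus hands every frame to every attached listener. A well-behaved listener discards frames addressed elsewhere. A hypothetical promiscuous listener, standing in for rogue software, keeps everything. The addresses and the traffic are made up. Nothing the sender sees differs between the two cases.

```python
from dataclasses import dataclass, field

@dataclass
class Listener:
    """One computer's Ethernet controller plus the software reading it."""
    address: str
    promiscuous: bool = False               # True models rogue "listen in" software
    kept: list = field(default_factory=list)

    def hear(self, dest: str, source: str, data: str) -> None:
        # The hardware hears every frame; only software decides what to keep.
        if self.promiscuous or dest == self.address:
            self.kept.append((source, dest, data))

class SharedBus:
    """A toy bus: every frame is delivered to every attached listener."""

    def __init__(self) -> None:
        self.listeners = []

    def attach(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def send(self, dest: str, source: str, data: str) -> None:
        # Nothing is reported back, so the sender cannot tell who listened.
        for listener in self.listeners:
            listener.hear(dest, source, data)

bus = SharedBus()
judy = Listener("A")
john = Listener("B")
rogue = Listener("E", promiscuous=True)     # hypothetical eavesdropper
for host in (judy, john, rogue):
    bus.attach(host)

bus.send("B", "A", "password: swordfish")   # made-up traffic
print(john.kept)    # the intended recipient has the frame
print(rogue.kept)   # so does the eavesdropper
print(judy.kept)    # the sender kept nothing of its own frame
```

The only difference between John and the rogue listener is one flag in software. The bus, and therefore the sender, behaves identically either way.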
The only practical way to ensure privacy is through encryption. In the simplest system, the sender encrypts the message using a secret key or password. The receiver must then supply that key to decode the message on the target machine. The problem is that the key must somehow be shared between sender and receiver. The computer communications channel is assumed to be compromised, so the key must be exchanged by some other method, such as telephone or mail (assuming that method is secure!).

A related problem is that of password security. The computer network allows you to log into a remote computer if you have the proper password. Suppose someone (let's call this person the cracker) is eavesdropping. He or she can then ``watch'' what password you type. At some later time, the cracker can log into that remote machine pretending to be you (this is called a ``replay'' attack). The cracker can then do whatever you could have done: use computer time, read files, delete files, and so on. The cracker can also send mail pretending to be you. Password security is therefore an important problem. Complicated schemes, using encryption, have been devised to keep your passwords from going over the network in readable form.

The password problem is a special case of a more general problem, that of authentication. For example, suppose the university registrar gets a message from an instructor stating that John Smith's grade for the course was incorrect and should be changed from a C to an A. How does the registrar know that the message came from the instructor? Is it a forgery? A stock broker receives a message to buy or sell stocks. Is it genuine? Special passwords or conventions may not be sufficient, since someone could have been eavesdropping the last time passwords were exchanged or conventions discussed.

Possible Solutions

One solution to the password problem is already in wide use.
It was developed at MIT in Project Athena and is called Kerberos, after the mythical three-headed dog guarding the entrance to Hades. A physically secure arbiter of passwords, the Kerberos server, makes it possible to log in without sending your password across the network in readable form. (More accurately, the Kerberos server sends a ``shared secret'', encrypted with your password, to the machine you are attempting to log in on. If the password you type in can decrypt this ``shared secret'', your password is assumed to be correct. Your password is thus never sent across the network while logging in.) To defeat the ``replay'' attack, this encryption involves the time of day. Simply replaying a previously encrypted message recorded by eavesdropping will therefore not work, since the correct encrypted message changes with time. This does require that all of the computers in the network be synchronized and agree on the time of day, within certain tolerances.

Public Key Encryption

One of several more general solutions to the confidentiality problem is public key encryption. It depends on encryption techniques that use one key to encrypt and a different key to decrypt. It is like having one key to lock your house and a different one to unlock it. In this scheme the locking key is published. If this ever catches on in a big way, you could imagine the equivalent of a telephone directory in which to look up someone's public key. (In practice, keys must be rather long to provide security, so this will probably have to be a computerized equivalent of a telephone book.) Then, if you wanted to send a secret message to, say, Joe Jones, you would use Joe's published key to encrypt the message. Since only Joe has the other key, the one used for decryption, only Joe can read this message. If someone wanted to send a secret message to you, they would use your published key to encrypt the message.
You would then use your secret decryption key to read the message.

Free software to support this scheme has recently been published by MIT. It is called PGP, which stands for Pretty Good Privacy. There has been some legal action regarding PGP, claiming copyright and patent infringement. The latest version is legal (as long as it is used within this country and not exported) and free for non-commercial use.
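The idea of one key to lock and a different key to unlock can be made concrete with RSA, one well-known public key method. The text above does not name a particular method, so RSA is chosen here purely as an illustration. The sketch uses the tiny textbook primes 61 and 53 so that the arithmetic is easy to follow. Real keys are hundreds of digits long, and this toy version offers no security.

```python
def make_keys(p: int, q: int, e: int):
    """Return (public_key, private_key) built from primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)        # modular inverse of e (needs Python 3.8+)
    return (e, n), (d, n)

def apply_key(key, m: int) -> int:
    """Encrypt or decrypt the number m (0 <= m < n) with the given key."""
    exponent, n = key
    return pow(m, exponent, n)

# Joe publishes his locking key and keeps the unlocking key to himself.
joe_public, joe_private = make_keys(61, 53, 17)

message = 65                                # a message encoded as a number
secret = apply_key(joe_public, message)     # anyone can do this step
print(joe_public, secret)                   # prints (17, 3233) 2790
print(apply_key(joe_private, secret))       # only Joe can do this: prints 65
```

Knowing the published key (17, 3233) is enough to lock a message but not to unlock it. Recovering the unlocking exponent 2753 requires the factors 61 and 53, and that is only easy because this n is tiny.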