TCP/IP Illustrated, Volume 2: The Implementation
W. Richard Stevens
Gary R. Wright
Addison-Wesley Professional
Table of Contents

Copyright
Preface

Chapter 1. Introduction
Section 1.1. Introduction
Section 1.2. Source Code Presentation
Section 1.3. History
Section 1.4. Application Programming Interfaces
Section 1.5. Example Program
Section 1.6. System Calls and Library Functions
Section 1.7. Network Implementation Overview
Section 1.8. Descriptors
Section 1.9. Mbufs (Memory Buffers) and Output Processing
Section 1.10. Input Processing
Section 1.11. Network Implementation Overview Revisited
Section 1.12. Interrupt Levels and Concurrency
Section 1.13. Source Code Organization
Section 1.14. Test Network
Section 1.15. Summary

Chapter 2. Mbufs: Memory Buffers
Section 2.1. Introduction
Section 2.2. Code Introduction
Section 2.3. Mbuf Definitions
Section 2.4. mbuf Structure
Section 2.5. Simple Mbuf Macros and Functions
Section 2.6. m_devget and m_pullup Functions
Section 2.7. Summary of Mbuf Macros and Functions
Section 2.8. Summary of Net/3 Networking Data Structures
Section 2.9. m_copy and Cluster Reference Counts
Section 2.10. Alternatives
Section 2.11. Summary

Chapter 3. Interface Layer
Section 3.1. Introduction
Section 3.2. Code Introduction
Section 3.3. ifnet Structure
Section 3.4. ifaddr Structure
Section 3.5. sockaddr Structure
Section 3.6. ifnet and ifaddr Specialization
Section 3.7. Network Initialization Overview
Section 3.8. Ethernet Initialization
Section 3.9. SLIP Initialization
Section 3.10. Loopback Initialization
Section 3.11. if_attach Function
Section 3.12. ifinit Function
Section 3.13. Summary

Chapter 4. Interfaces: Ethernet
Section 4.1. Introduction
Section 4.2. Code Introduction
Section 4.3. Ethernet Interface
Section 4.4. ioctl System Call
Section 4.5. Summary

Chapter 5. Interfaces: SLIP and Loopback
Section 5.1. Introduction
Section 5.2. Code Introduction
Section 5.3. SLIP Interface
Section 5.4. Loopback Interface
Section 5.5. Summary

Chapter 6. IP Addressing
Section 6.1. Introduction
Section 6.2. Code Introduction
Section 6.3. Interface and Address Summary
Section 6.4. sockaddr_in Structure
Section 6.5. in_ifaddr Structure
Section 6.6. Address Assignment
Section 6.7. Interface ioctl Processing
Section 6.8. Internet Utility Functions
Section 6.9. ifnet Utility Functions
Section 6.10. Summary

Chapter 7. Domains and Protocols
Section 7.1. Introduction
Section 7.2. Code Introduction
Section 7.3. domain Structure
Section 7.4. protosw Structure
Section 7.5. IP domain and protosw Structures
Section 7.6. pffindproto and pffindtype Functions
Section 7.7. pfctlinput Function
Section 7.8. IP Initialization
Section 7.9. sysctl System Call
Section 7.10. Summary

Chapter 8. IP: Internet Protocol
Section 8.1. Introduction
Section 8.2. Code Introduction
Section 8.3. IP Packets
Section 8.4. Input Processing: ipintr Function
Section 8.5. Forwarding: ip_forward Function
Section 8.6. Output Processing: ip_output Function
Section 8.7. Internet Checksum: in_cksum Function
Section 8.8. setsockopt and getsockopt System Calls
Section 8.9. ip_sysctl Function
Section 8.10. Summary

Chapter 9. IP Option Processing
Section 9.1. Introduction
Section 9.2. Code Introduction
Section 9.3. Option Format
Section 9.4. ip_dooptions Function
Section 9.5. Record Route Option
Section 9.6. Source and Record Route Options
Section 9.7. Timestamp Option
Section 9.8. ip_insertoptions Function
Section 9.9. ip_pcbopts Function
Section 9.10. Limitations
Section 9.11. Summary

Chapter 10. IP Fragmentation and Reassembly
Section 10.1. Introduction
Section 10.2. Code Introduction
Section 10.3. Fragmentation
Section 10.4. ip_optcopy Function
Section 10.5. Reassembly
Section 10.6. ip_reass Function
Section 10.7. ip_slowtimo Function
Section 10.8. Summary

Chapter 11. ICMP: Internet Control Message Protocol
Section 11.1. Introduction
Section 11.2. Code Introduction
Section 11.3. icmp Structure
Section 11.4. ICMP protosw Structure
Section 11.5. Input Processing: icmp_input Function
Section 11.6. Error Processing
Section 11.7. Request Processing
Section 11.8. Redirect Processing
Section 11.9. Reply Processing
Section 11.10. Output Processing
Section 11.11. icmp_error Function
Section 11.12. icmp_reflect Function
Section 11.13. icmp_send Function
Section 11.14. icmp_sysctl Function
Section 11.15. Summary

Chapter 12. IP Multicasting
Section 12.1. Introduction
Section 12.2. Code Introduction
Section 12.3. Ethernet Multicast Addresses
Section 12.4. ether_multi Structure
Section 12.5. Ethernet Multicast Reception
Section 12.6. in_multi Structure
Section 12.7. ip_moptions Structure
Section 12.8. Multicast Socket Options
Section 12.9. Multicast TTL Values
Section 12.10. ip_setmoptions Function
Section 12.11. Joining an IP Multicast Group
Section 12.12. Leaving an IP Multicast Group
Section 12.13. ip_getmoptions Function
Section 12.14. Multicast Input Processing: ipintr Function
Section 12.15. Multicast Output Processing: ip_output Function
Section 12.16. Performance Considerations
Section 12.17. Summary

Chapter 13. IGMP: Internet Group Management Protocol
Section 13.1. Introduction
Section 13.2. Code Introduction
Section 13.3. igmp Structure
Section 13.4. IGMP protosw Structure
Section 13.5. Joining a Group: igmp_joingroup Function
Section 13.6. igmp_fasttimo Function
Section 13.7. Input Processing: igmp_input Function
Section 13.8. Leaving a Group: igmp_leavegroup Function
Section 13.9. Summary

Chapter 14. IP Multicast Routing
Section 14.1. Introduction
Section 14.2. Code Introduction
Section 14.3. Multicast Output Processing Revisited
Section 14.4. mrouted Daemon
Section 14.5. Virtual Interfaces
Section 14.6. IGMP Revisited
Section 14.7. Multicast Routing
Section 14.8. Multicast Forwarding: ip_mforward Function
Section 14.9. Cleanup: ip_mrouter_done Function
Section 14.10. Summary

Chapter 15. Socket Layer
Section 15.1. Introduction
Section 15.2. Code Introduction
Section 15.3. socket Structure
Section 15.4. System Calls
Section 15.5. Processes, Descriptors, and Sockets
Section 15.6. socket System Call
Section 15.7. getsock and sockargs Functions
Section 15.8. bind System Call
Section 15.9. listen System Call
Section 15.10. tsleep and wakeup Functions
Section 15.11. accept System Call
Section 15.12. sonewconn and soisconnected Functions
Section 15.13. connect System Call
Section 15.14. shutdown System Call
Section 15.15. close System Call
Section 15.16. Summary

Chapter 16. Socket I/O
Section 16.1. Introduction
Section 16.2. Code Introduction
Section 16.3. Socket Buffers
Section 16.4. write, writev, sendto, and sendmsg System Calls
Section 16.5. sendmsg System Call
Section 16.6. sendit Function
Section 16.7. sosend Function
Section 16.8. read, readv, recvfrom, and recvmsg System Calls
Section 16.9. recvmsg System Call
Section 16.10. recvit Function
Section 16.11. soreceive Function
Section 16.12. soreceive Code
Section 16.13. select System Call
Section 16.14. Summary

Chapter 17. Socket Options
Section 17.1. Introduction
Section 17.2. Code Introduction
Section 17.3. setsockopt System Call
Section 17.4. getsockopt System Call
Section 17.5. fcntl and ioctl System Calls
Section 17.6. getsockname System Call
Section 17.7. getpeername System Call
Section 17.8. Summary

Chapter 18. Radix Tree Routing Tables
Section 18.1. Introduction
Section 18.2. Routing Table Structure
Section 18.3. Routing Sockets
Section 18.4. Code Introduction
Section 18.5. Radix Node Data Structures
Section 18.6. Routing Structures
Section 18.7. Initialization: route_init and rtable_init Functions
Section 18.8. Initialization: rn_init and rn_inithead Functions
Section 18.9. Duplicate Keys and Mask Lists
Section 18.10. rn_match Function
Section 18.11. rn_search Function
Section 18.12. Summary

Chapter 19. Routing Requests and Routing Messages
Section 19.1. Introduction
Section 19.2. rtalloc and rtalloc1 Functions
Section 19.3. RTFREE Macro and rtfree Function
Section 19.4. rtrequest Function
Section 19.5. rt_setgate Function
Section 19.6. rtinit Function
Section 19.7. rtredirect Function
Section 19.8. Routing Message Structures
Section 19.9. rt_missmsg Function
Section 19.10. rt_ifmsg Function
Section 19.11. rt_newaddrmsg Function
Section 19.12. rt_msg1 Function
Section 19.13. rt_msg2 Function
Section 19.14. sysctl_rtable Function
Section 19.15. sysctl_dumpentry Function
Section 19.16. sysctl_iflist Function
Section 19.17. Summary

Chapter 20. Routing Sockets
Section 20.1. Introduction
Section 20.2. routedomain and protosw Structures
Section 20.3. Routing Control Blocks
Section 20.4. raw_init Function
Section 20.5. route_output Function
Section 20.6. rt_xaddrs Function
Section 20.7. rt_setmetrics Function
Section 20.8. raw_input Function
Section 20.9. route_usrreq Function
Section 20.10. raw_usrreq Function
Section 20.11. raw_attach, raw_detach, and raw_disconnect Functions
Section 20.12. Summary

Chapter 21. ARP: Address Resolution Protocol
Section 21.1. Introduction
Section 21.2. ARP and the Routing Table
Section 21.3. Code Introduction
Section 21.4. ARP Structures
Section 21.5. arpwhohas Function
Section 21.6. arprequest Function
Section 21.7. arpintr Function
Section 21.8. in_arpinput Function
Section 21.9. ARP Timer Functions
Section 21.10. arpresolve Function
Section 21.11. arplookup Function
Section 21.12. Proxy ARP
Section 21.13. arp_rtrequest Function
Section 21.14. ARP and Multicasting
Section 21.15. Summary

Chapter 22. Protocol Control Blocks
Section 22.1. Introduction
Section 22.2. Code Introduction
Section 22.3. inpcb Structure
Section 22.4. in_pcballoc and in_pcbdetach Functions
Section 22.5. Binding, Connecting, and Demultiplexing
Section 22.6. in_pcblookup Function
Section 22.7. in_pcbbind Function
Section 22.8. in_pcbconnect Function
Section 22.9. in_pcbdisconnect Function
Section 22.10. in_setsockaddr and in_setpeeraddr Functions
Section 22.11. in_pcbnotify, in_rtchange, and in_losing Functions
Section 22.12. Implementation Refinements
Section 22.13. Summary

Chapter 23. UDP: User Datagram Protocol
Section 23.1. Introduction
Section 23.2. Code Introduction
Section 23.3. UDP protosw Structure
Section 23.4. UDP Header
Section 23.5. udp_init Function
Section 23.6. udp_output Function
Section 23.7. udp_input Function
Section 23.8. udp_saveopt Function
Section 23.9. udp_ctlinput Function
Section 23.10. udp_usrreq Function
Section 23.11. udp_sysctl Function
Section 23.12. Implementation Refinements
Section 23.13. Summary

Chapter 24. TCP: Transmission Control Protocol
Section 24.1. Introduction
Section 24.2. Code Introduction
Section 24.3. TCP protosw Structure
Section 24.4. TCP Header
Section 24.5. TCP Control Block
Section 24.6. TCP State Transition Diagram
Section 24.7. TCP Sequence Numbers
Section 24.8. tcp_init Function
Section 24.9. Summary

Chapter 25. TCP Timers
Section 25.1. Introduction
Section 25.2. Code Introduction
Section 25.3. tcp_canceltimers Function
Section 25.4. tcp_fasttimo Function
Section 25.5. tcp_slowtimo Function
Section 25.6. tcp_timers Function
Section 25.7. Retransmission Timer Calculations
Section 25.8. tcp_newtcpcb Function
Section 25.9. tcp_setpersist Function
Section 25.10. tcp_xmit_timer Function
Section 25.11. Retransmission Timeout: tcp_timers Function
Section 25.12. An RTT Example
Section 25.13. Summary

Chapter 26. TCP Output
Section 26.1. Introduction
Section 26.2. tcp_output Overview
Section 26.3. Determine if a Segment Should be Sent
Section 26.4. TCP Options
Section 26.5. Window Scale Option
Section 26.6. Timestamp Option
Section 26.7. Send a Segment
Section 26.8. tcp_template Function
Section 26.9. tcp_respond Function
Section 26.10. Summary

Chapter 27. TCP Functions
Section 27.1. Introduction
Section 27.2. tcp_drain Function
Section 27.3. tcp_drop Function
Section 27.4. tcp_close Function
Section 27.5. tcp_mss Function
Section 27.6. tcp_ctlinput Function
Section 27.7. tcp_notify Function
Section 27.8. tcp_quench Function
Section 27.9. TCP_REASS Macro and tcp_reass Function
Section 27.10. tcp_trace Function
Section 27.11. Summary

Chapter 28. TCP Input
Section 28.1. Introduction
Section 28.2. Preliminary Processing
Section 28.3. tcp_dooptions Function
Section 28.4. Header Prediction
Section 28.5. TCP Input: Slow Path Processing
Section 28.6. Initiation of Passive Open, Completion of Active Open
Section 28.7. PAWS: Protection Against Wrapped Sequence Numbers
Section 28.8. Trim Segment so Data is Within Window
Section 28.9. Self-Connects and Simultaneous Opens
Section 28.10. Record Timestamp
Section 28.11. RST Processing
Section 28.12. Summary

Chapter 29. TCP Input (Continued)
Section 29.1. Introduction
Section 29.2. ACK Processing Overview
Section 29.3. Completion of Passive Opens and Simultaneous Opens
Section 29.4. Fast Retransmit and Fast Recovery Algorithms
Section 29.5. ACK Processing
Section 29.6. Update Window Information
Section 29.7. Urgent Mode Processing
Section 29.8. tcp_pulloutofband Function
Section 29.9. Processing of Received Data
Section 29.10. FIN Processing
Section 29.11. Final Processing
Section 29.12. Implementation Refinements
Section 29.13. Header Compression
Section 29.14. Summary

Chapter 30. TCP User Requests
Section 30.1. Introduction
Section 30.2. tcp_usrreq Function
Section 30.3. tcp_attach Function
Section 30.4. tcp_disconnect Function
Section 30.5. tcp_usrclosed Function
Section 30.6. tcp_ctloutput Function
Section 30.7. Summary

Chapter 31. BPF: BSD Packet Filter
Section 31.1. Introduction
Section 31.2. Code Introduction
Section 31.3. bpf_if Structure
Section 31.4. bpf_d Structure
Section 31.5. BPF Input
Section 31.6. BPF Output
Section 31.7. Summary

Chapter 32. Raw IP
Section 32.1. Introduction
Section 32.2. Code Introduction
Section 32.3. Raw IP protosw Structure
Section 32.4. rip_init Function
Section 32.5. rip_input Function
Section 32.6. rip_output Function
Section 32.7. rip_usrreq Function
Section 32.8. rip_ctloutput Function
Section 32.9. Summary

Epilogue

Solutions to Selected Exercises
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17
Chapter 18
Chapter 19
Chapter 20
Chapter 21
Chapter 22
Chapter 23
Chapter 24
Chapter 25
Chapter 26
Chapter 27
Chapter 28
Chapter 29
Chapter 30
Chapter 31
Chapter 32

Source Code Availability
URLs: Uniform Resource Locators
4.4BSD-Lite
Operating Systems that Run the 4.4BSD-Lite Networking Software
RFCs
GNU Software
PPP Software
mrouted Software
ISODE Software

RFC 1122 Compliance
Section C.1. Link-Layer Requirements
Section C.2. IP Requirements
Section C.3. IP Options Requirements
Section C.4. IP Fragmentation and Reassembly Requirements
Section C.5. ICMP Requirements
Section C.6. Multicasting Requirements
Section C.7. IGMP Requirements
Section C.8. Routing Requirements
Section C.9. ARP Requirements
Section C.10. UDP Requirements
Section C.11. TCP Requirements

Bibliography
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and we were aware of a trademark claim, the designations have been printed in initial capital letters or in all capitals.

The programs and applications presented in this book have been included for their instructional value. They have been tested with care, but are not guaranteed for any particular purpose. The publisher does not offer any warranties or representations, nor does it accept any liabilities with respect to the programs or applications.

The publisher offers discounts on this book when ordered in quantity for special sales. For more information please contact:

Pearson Education Corporate Sales Division
One Lake Street
Upper Saddle River, NJ 07458
(800) 382-3419
corpsales@pearsontechgroup.com

Visit AW on the Web: www.awl.com/cseng/

Library of Congress Cataloging-in-Publication Data
(Revised for vol. 2)
Stevens, W. Richard.
TCP/IP illustrated.
(Addison-Wesley professional computing series)
Vol. 2 by Gary R. Wright, W. Richard Stevens.
Includes bibliographical references and indexes.
Contents: v. 1. The protocols – v. 2. The implementation
1. TCP/IP (Computer network protocol) I. Wright, Gary R. II. Title. III. Series.
TK5105.55.S74 1994 004.6'2 93–40000
ISBN 0-201-63346-9 (v. 1)
ISBN 0-201-63354-X (v. 2)

The BSD Daemon used on the cover of this book is reproduced with the permission of Marshall Kirk McKusick.

Copyright © 1995 by Addison-Wesley.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher. Printed in the United States of America. Published simultaneously in Canada.

Text printed on recycled and acid-free paper.

23 24 25 26 27 28 CRW 09 08 07
23rd Printing January 2008
ISBN 0-201-63354-X
Dedication

To my parents and my sister, for their love and support.
G.R.W.

To my parents, for the gift of an education, and the example of a work ethic.
W.R.S.
Preface

Introduction

This book describes and presents the source code for the common reference implementation of TCP/IP: the implementation from the Computer Systems Research Group (CSRG) at the University of California at Berkeley. Historically this has been distributed with the 4.x BSD system (Berkeley Software Distribution). This implementation was first released in 1982 and has survived many significant changes, much fine tuning, and numerous ports to other Unix and non-Unix systems. This is not a toy implementation, but the foundation for TCP/IP implementations that are run daily on hundreds of thousands of systems worldwide. This implementation also provides router functionality, letting us show the differences between a host implementation of TCP/IP and a router.

We describe the implementation and present the entire source code for the kernel implementation of TCP/IP, approximately 15,000 lines of C code. The version of the Berkeley code described in this text is the 4.4BSD-Lite release. This code was made publicly available in April 1994, and it contains numerous networking enhancements that were added to the 4.3BSD Tahoe release in 1988, the 4.3BSD Reno release in 1990, and the 4.4BSD release in 1993. (Appendix B describes how to obtain this source code.) The 4.4BSD release provides the latest TCP/IP features, such as multicasting and long fat pipe support (for high-bandwidth, long-delay paths). Figure 1.1 (p. 4) provides additional details of the various releases of the Berkeley networking code.

This book is intended for anyone wishing to understand how the TCP/IP protocols are implemented: programmers writing network applications, system administrators responsible for maintaining computer systems and networks utilizing TCP/IP, and any programmer interested in understanding how a large body of nontrivial code fits into a real operating system.
Organization of the Book

The following figure shows the various protocols and subsystems that are covered. The italic numbers by each box indicate the chapters in which that topic is described.

We take a bottom-up approach to the TCP/IP protocol suite, starting at the data-link layer, then the network layer (IP, ICMP, IGMP, IP routing, and multicast routing), followed by the socket layer, and finishing with the transport layer (UDP, TCP, and raw IP).
Intended Audience

This book assumes a basic understanding of how the TCP/IP protocols work. Readers unfamiliar with TCP/IP should consult the first volume in this series, [Stevens 1994], for a thorough description of the TCP/IP protocol suite. This earlier volume is referred to throughout the current text as Volume 1. The current text also assumes a basic understanding of operating system principles.

We describe the implementation of the protocols using a data-structures approach. That is, in addition to the source code presentation, each chapter contains pictures and descriptions of the data structures used and maintained by the source code. We show how these data structures fit into the other data structures used by TCP/IP and the kernel. Heavy use is made of diagrams throughout the text: there are over 250 diagrams.

This data-structures approach allows readers to use the book in various ways. Those interested in all the implementation details can read the entire text from start to finish, following through all the source code. Others might want to understand how the protocols are implemented by understanding all the data structures and reading all the text, but not following through all the source code.

We anticipate that many readers are interested in specific portions of the book and will want to go directly to those chapters. Therefore many forward and backward references are provided throughout the text, along with a thorough index, to allow individual chapters to be studied by themselves. The inside back covers contain an alphabetical cross-reference of all the functions and macros described in the book and the starting page number of the description. Exercises are provided at the end of the chapters; most solutions are in Appendix A to maximize the usefulness of the text as a self-study reference.
Source Code Copyright

All of the source code presented in this book, other than Figures 1.2 and 8.27, is from the 4.4BSD-Lite distribution. This software is publicly available through many sources (Appendix B).

All of this source code contains the following copyright notice.

/*
 * Copyright (c) 1982, 1986, 1988, 1990, 1993, 1994
 *      The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *      This product includes software developed by the University of
 *      California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */
Acknowledgments

We thank the technical reviewers who read the manuscript and provided important feedback on a tight timetable: Ragnvald Blindheim, Jon Crowcroft, Sally Floyd, Glen Glater, John Gulbenkian, Don Hering, Mukesh Kacker, Berry Kercheval, Brian W. Kernighan, Ulf Kieber, Mark Laubach, Steven McCanne, Craig Partridge, Vern Paxson, Steve Rago, Chakravardhi Ravi, Peter Salus, Doug Schmidt, Keith Sklower, Ian Lance Taylor, and G. N. Ananda Vardhana. A special thanks to the consulting editor, Brian Kernighan, for his rapid, thorough, and helpful reviews throughout the course of the project, and for his continued encouragement and support.

Our thanks (again) to the National Optical Astronomy Observatories (NOAO), especially Sidney Wolff, Richard Wolff, and Steve Grandi, for providing access to their networks and hosts. Our thanks also to the U.C. Berkeley CSRG: Keith Bostic and Kirk McKusick provided access to the latest 4.4BSD system, and Keith Sklower provided the modifications to the 4.4BSD-Lite software to run under BSD/386 V1.1.

G.R.W. wishes to thank John Wait, for several years of gentle prodding; Dave Schaller, for his encouragement; and Jim Hogue, for his support during the writing and production of this book.

W.R.S. thanks his family, once again, for enduring another "small" book project. Thank you Sally, Bill, Ellen, and David.

The hard work, professionalism, and support of the team at Addison-Wesley have made the authors' job that much easier. In particular, we wish to thank John Wait for his guidance and Kim Dawley for her creative ideas.

Camera-ready copy of the book was produced by the authors. It is only fitting that a book describing an industrial-strength software system be produced with an industrial-strength text processing system. Therefore one of the authors chose to use the Groff package written by James Clark, and the other author agreed begrudgingly.

We welcome electronic mail from any readers with comments, suggestions, or bug fixes: tcpipiv2-book@aw.com. Each author will gladly blame the other for any remaining errors.

Gary R. Wright
http://www.connix.com/~gwright
Middletown, Connecticut

W. Richard Stevens
http://www.kohala.com/~rstevens
Tucson, Arizona

November 1994