HTTP The Definitive Guide
David Gourley and Brian Totty
with Marjorie Sayer, Sailu Reddy, and Anshu Aggarwal

Beijing • Cambridge • Farnham • Köln • Paris • Sebastopol • Taipei • Tokyo
HTTP: The Definitive Guide
by David Gourley and Brian Totty
with Marjorie Sayer, Sailu Reddy, and Anshu Aggarwal

Copyright © 2002 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly Media, Inc. books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Linda Mui
Production Editor: Rachel Wheeler
Cover Designer: Ellie Volckhausen
Interior Designers: David Futato and Melanie Wang

Printing History:
September 2002: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. HTTP: The Definitive Guide, the image of a thirteen-lined ground squirrel, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

This book uses RepKover™, a durable and flexible lay-flat binding.

ISBN-10: 1-56592-509-2
ISBN-13: 978-1-56592-509-0
[C] [01/08]
Table of Contents

Preface

Part I. HTTP: The Web’s Foundation

1. Overview of HTTP
    HTTP: The Internet’s Multimedia Courier
    Web Clients and Servers
    Resources
    Transactions
    Messages
    Connections
    Protocol Versions
    Architectural Components of the Web
    The End of the Beginning
    For More Information

2. URLs and Resources
    Navigating the Internet’s Resources
    URL Syntax
    URL Shortcuts
    Shady Characters
    A Sea of Schemes
    The Future
    For More Information

3. HTTP Messages
    The Flow of Messages
    The Parts of a Message
    Methods
    Status Codes
    Headers
    For More Information

4. Connection Management
    TCP Connections
    TCP Performance Considerations
    HTTP Connection Handling
    Parallel Connections
    Persistent Connections
    Pipelined Connections
    The Mysteries of Connection Close
    For More Information

Part II. HTTP Architecture

5. Web Servers
    Web Servers Come in All Shapes and Sizes
    A Minimal Perl Web Server
    What Real Web Servers Do
    Step 1: Accepting Client Connections
    Step 2: Receiving Request Messages
    Step 3: Processing Requests
    Step 4: Mapping and Accessing Resources
    Step 5: Building Responses
    Step 6: Sending Responses
    Step 7: Logging
    For More Information

6. Proxies
    Web Intermediaries
    Why Use Proxies?
    Where Do Proxies Go?
    Client Proxy Settings
    Tricky Things About Proxy Requests
    Tracing Messages
    Proxy Authentication
    Proxy Interoperation
    For More Information

7. Caching
    Redundant Data Transfers
    Bandwidth Bottlenecks
    Flash Crowds
    Distance Delays
    Hits and Misses
    Cache Topologies
    Cache Processing Steps
    Keeping Copies Fresh
    Controlling Cachability
    Setting Cache Controls
    Detailed Algorithms
    Caches and Advertising
    For More Information

8. Integration Points: Gateways, Tunnels, and Relays
    Gateways
    Protocol Gateways
    Resource Gateways
    Application Interfaces and Web Services
    Tunnels
    Relays
    For More Information

9. Web Robots
    Crawlers and Crawling
    Robotic HTTP
    Misbehaving Robots
    Excluding Robots
    Robot Etiquette
    Search Engines
    For More Information

10. HTTP-NG
    HTTP’s Growing Pains
    HTTP-NG Activity
    Modularize and Enhance
    Distributed Objects
    Layer 1: Messaging
    Layer 2: Remote Invocation
    Layer 3: Web Application
    WebMUX
    Binary Wire Protocol
    Current Status
    For More Information

Part III. Identification, Authorization, and Security

11. Client Identification and Cookies
    The Personal Touch
    HTTP Headers
    Client IP Address
    User Login
    Fat URLs
    Cookies
    For More Information

12. Basic Authentication
    Authentication
    Basic Authentication
    The Security Flaws of Basic Authentication
    For More Information

13. Digest Authentication
    The Improvements of Digest Authentication
    Digest Calculations
    Quality of Protection Enhancements
    Practical Considerations
    Security Considerations
    For More Information

14. Secure HTTP
    Making HTTP Safe
    Digital Cryptography
    Symmetric-Key Cryptography
    Public-Key Cryptography
    Digital Signatures
    Digital Certificates
    HTTPS: The Details
    A Real HTTPS Client
    Tunneling Secure Traffic Through Proxies
    For More Information

Part IV. Entities, Encodings, and Internationalization

15. Entities and Encodings
    Messages Are Crates, Entities Are Cargo
    Content-Length: The Entity’s Size
    Entity Digests
    Media Type and Charset
    Content Encoding
    Transfer Encoding and Chunked Encoding
    Time-Varying Instances
    Validators and Freshness
    Range Requests
    Delta Encoding
    For More Information

16. Internationalization
    HTTP Support for International Content
    Character Sets and HTTP
    Multilingual Character Encoding Primer
    Language Tags and HTTP
    Internationalized URIs
    Other Considerations
    For More Information

17. Content Negotiation and Transcoding
    Content-Negotiation Techniques
    Client-Driven Negotiation
    Server-Driven Negotiation
    Transparent Negotiation
    Transcoding
    Next Steps
    For More Information

Part V. Content Publishing and Distribution

18. Web Hosting
    Hosting Services
    Virtual Hosting
    Making Web Sites Reliable
    Making Web Sites Fast
    For More Information

19. Publishing Systems
    FrontPage Server Extensions for Publishing Support
    WebDAV and Collaborative Authoring
    For More Information

20. Redirection and Load Balancing
    Why Redirect?
    Where to Redirect
    Overview of Redirection Protocols
    General Redirection Methods
    Proxy Redirection Methods
    Cache Redirection Methods
    Internet Cache Protocol
    Cache Array Routing Protocol
    Hyper Text Caching Protocol
    For More Information

21. Logging and Usage Tracking
    What to Log?
    Log Formats
    Hit Metering
    A Word on Privacy
    For More Information
Part VI. Appendixes

A. URI Schemes
B. HTTP Status Codes
C. HTTP Header Reference
D. MIME Types
E. Base-64 Encoding
F. Digest Authentication
G. Language Tags
H. MIME Charset Registry

Index
Preface

The Hypertext Transfer Protocol (HTTP) is the protocol programs use to communicate over the World Wide Web. There are many applications of HTTP, but HTTP is most famous for two-way conversation between web browsers and web servers.

HTTP began as a simple protocol, so you might think there really isn’t that much to say about it. And yet here you stand, with a two-pound book in your hands. If you’re wondering how we could have written 650 pages on HTTP, take a look at the Table of Contents. This book isn’t just an HTTP header reference manual; it’s a veritable bible of web architecture.

In this book, we try to tease apart HTTP’s interrelated and often misunderstood rules, and we offer you a series of topic-based chapters that explain all the aspects of HTTP. Throughout the book, we are careful to explain the “why” of HTTP, not just the “how.” And to save you time chasing references, we explain many of the critical non-HTTP technologies that are required to make HTTP applications work. You can find the alphabetical header reference (which forms the basis of most conventional HTTP texts) in a conveniently organized appendix. We hope this conceptual design makes it easy for you to work with HTTP.

This book is written for anyone who wants to understand HTTP and the underlying architecture of the Web. Software and hardware engineers can use this book as a coherent reference for HTTP and related web technologies. Systems architects and network administrators can use this book to better understand how to design, deploy, and manage complicated web architectures. Performance engineers and analysts can benefit from the sections on caching and performance optimization. Marketing and consulting professionals will be able to use the conceptual orientation to better understand the landscape of web technologies.

This book illustrates common misconceptions, advises on “tricks of the trade,” provides convenient reference material, and serves as a readable introduction to dry and confusing standards specifications. In a single book, we detail the essential and interrelated technologies that make the Web work.
This book is the result of a tremendous amount of work by many people who share an enthusiasm for Internet technologies. We hope you find it useful.

Running Example: Joe’s Hardware Store

Many of our chapters include a running example of a hypothetical online hardware and home-improvement store called “Joe’s Hardware” to demonstrate technology concepts. We have set up a real web site for the store (http://www.joes-hardware.com) for you to test some of the examples in the book. We will maintain this web site while this book remains in print.

Chapter-by-Chapter Guide

This book contains 21 chapters, divided into 5 logical parts (each with a technology theme), and 8 useful appendixes containing reference data and surveys of related technologies:

Part I, HTTP: The Web’s Foundation
Part II, HTTP Architecture
Part III, Identification, Authorization, and Security
Part IV, Entities, Encodings, and Internationalization
Part V, Content Publishing and Distribution
Part VI, Appendixes

Part I, HTTP: The Web’s Foundation, describes the core technology of HTTP, the foundation of the Web, in four chapters:

• Chapter 1, Overview of HTTP, is a rapid-paced overview of HTTP.
• Chapter 2, URLs and Resources, details the formats of uniform resource locators (URLs) and the various types of resources that URLs name across the Internet. It also outlines the evolution to uniform resource names (URNs).
• Chapter 3, HTTP Messages, details how HTTP messages transport web content.
• Chapter 4, Connection Management, explains the commonly misunderstood and poorly documented rules and behavior for managing HTTP connections.

Part II, HTTP Architecture, highlights the HTTP server, proxy, cache, gateway, and robot applications that are the architectural building blocks of web systems. (Web browsers are another building block, of course, but browsers already were covered thoroughly in Part I of the book.) Part II contains the following six chapters:

• Chapter 5, Web Servers, gives an overview of web server architectures.
• Chapter 6, Proxies, explores HTTP proxy servers, which are intermediary servers that act as platforms for HTTP services and controls.
• Chapter 7, Caching, delves into the science of web caches—devices that improve performance and reduce traffic by making local copies of popular documents.
• Chapter 8, Integration Points: Gateways, Tunnels, and Relays, explains gateways and application servers that allow HTTP to work with software that speaks different protocols, including Secure Sockets Layer (SSL) encrypted protocols.
• Chapter 9, Web Robots, describes the various types of clients that pervade the Web, including the ubiquitous browsers, robots and spiders, and search engines.
• Chapter 10, HTTP-NG, talks about HTTP developments still in the works: the HTTP-NG protocol.

Part III, Identification, Authorization, and Security, presents a suite of techniques and technologies to track identity, enforce security, and control access to content. It contains the following four chapters:

• Chapter 11, Client Identification and Cookies, talks about techniques to identify users so that content can be personalized to the user audience.
• Chapter 12, Basic Authentication, highlights the basic mechanisms to verify user identity. The chapter also examines how HTTP authentication interfaces with databases.
• Chapter 13, Digest Authentication, explains digest authentication, a complex proposed enhancement to HTTP that provides significantly enhanced security.
• Chapter 14, Secure HTTP, is a detailed overview of Internet cryptography, digital certificates, and SSL.

Part IV, Entities, Encodings, and Internationalization, focuses on the bodies of HTTP messages (which contain the actual web content) and on the web standards that describe and manipulate content stored in the message bodies. Part IV contains three chapters:

• Chapter 15, Entities and Encodings, describes the structure of HTTP content.
• Chapter 16, Internationalization, surveys the web standards that allow users around the globe to exchange content in different languages and character sets.
• Chapter 17, Content Negotiation and Transcoding, explains mechanisms for negotiating acceptable content.

Part V, Content Publishing and Distribution, discusses the technology for publishing and disseminating web content. It contains four chapters:

• Chapter 18, Web Hosting, discusses the ways people deploy servers in modern web hosting environments and HTTP support for virtual web hosting.
• Chapter 19, Publishing Systems, discusses the technologies for creating web content and installing it onto web servers.
• Chapter 20, Redirection and Load Balancing, surveys the tools and techniques for distributing incoming web traffic among a collection of servers.
• Chapter 21, Logging and Usage Tracking, covers log formats and common questions.
Part VI, Appendixes, contains helpful reference appendixes and tutorials in related technologies:

• Appendix A, URI Schemes, summarizes the protocols supported through uniform resource identifier (URI) schemes.
• Appendix B, HTTP Status Codes, conveniently lists the HTTP response codes.
• Appendix C, HTTP Header Reference, provides a reference list of HTTP header fields.
• Appendix D, MIME Types, provides an extensive list of MIME types and explains how MIME types are registered.
• Appendix E, Base-64 Encoding, explains base-64 encoding, used by HTTP authentication.
• Appendix F, Digest Authentication, gives details on how to implement various authentication schemes in HTTP.
• Appendix G, Language Tags, defines language tag values for HTTP language headers.
• Appendix H, MIME Charset Registry, provides a detailed list of character encodings, used for HTTP internationalization support.

Each chapter contains many examples and pointers to additional reference material.

Typographic Conventions

In this book, we use the following typographic conventions:

Italic
    Used for URLs, C functions, command names, MIME types, new terms where they are defined, and emphasis
Constant width
    Used for computer output, code, and any literal text
Constant width bold
    Used for user input

Comments and Questions

Please address comments and questions concerning this book to the publisher:

O’Reilly & Associates, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
(800) 998-9938 (in the United States or Canada)
(707) 829-0515 (international/local)
(707) 829-0104 (fax)
There is a web page for this book, which lists errata, examples, or any additional information. You can access this page at:

http://www.oreilly.com/catalog/httptdg/

To comment or ask technical questions about this book, send email to:

bookquestions@oreilly.com

For more information about books, conferences, Resource Centers, and the O’Reilly Network, see the O’Reilly web site at:

http://www.oreilly.com

Acknowledgments

This book is the labor of many. The five authors would like to hold up a few people in thanks for their significant contributions to this project.

To start, we’d like to thank Linda Mui, our editor at O’Reilly. Linda first met with David and Brian way back in 1996, and she refined and steered several concepts into the book you hold today. Linda also helped keep our wandering gang of first-time book authors moving in a coherent direction and on a progressing (if not rapid) timeline. Most of all, Linda gave us the chance to create this book. We’re very grateful.

We’d also like to thank several tremendously bright, knowledgeable, and kind souls who devoted noteworthy energy to reviewing, commenting on, and correcting drafts of this book. These include Tony Bourke, Sean Burke, Mike Chowla, Shernaz Daver, Fred Douglis, Paula Ferguson, Vikas Jha, Yves Lafon, Peter Mattis, Chuck Neerdaels, Luis Tavera, Duane Wessels, Dave Wu, and Marco Zagha. Their viewpoints and suggestions have improved the book tremendously.

Rob Romano from O’Reilly created most of the amazing artwork you’ll find in this book. The book contains an unusually large number of detailed illustrations that make subtle concepts very clear. Many of these illustrations were painstakingly created and revised numerous times. If a picture is worth a thousand words, Rob added hundreds of pages of value to this book.

Brian would like to personally thank all of the authors for their dedication to this project. A tremendous amount of time was invested by the authors in a challenge to make the first detailed but accessible treatment of HTTP. Weddings, childbirths, killer work projects, startup companies, and graduate schools intervened, but the authors held together to bring this project to a successful completion. We believe the result is worthy of everyone’s hard work and, most importantly, that it provides a valuable service. Brian also would like to thank the employees of Inktomi for their enthusiasm and support and for their deep insights about the use of HTTP in real-world applications. Also, thanks to the fine folks at Cajun-shop.com for allowing us to use their site for some of the examples in this book.
David would like to thank his family, particularly his mother and grandfather for their ongoing support. He’d like to thank those that have put up with his erratic schedule over the years writing the book. He’d also like to thank Slurp, Orctomi, and Norma for everything they’ve done, and his fellow authors for all their hard work. Finally, he would like to thank Brian for roping him into yet another adventure.

Marjorie would like to thank her husband, Alan Liu, for technical insight, familial support and understanding. Marjorie thanks her fellow authors for many insights and inspirations. She is grateful for the experience of working together on this book.

Sailu would like to thank David and Brian for the opportunity to work on this book, and Chuck Neerdaels for introducing him to HTTP.

Anshu would like to thank his wife, Rashi, and his parents for their patience, support, and encouragement during the long years spent writing this book.

Finally, the authors collectively thank the famous and nameless Internet pioneers, whose research, development, and evangelism over the past four decades contributed so much to our scientific, social, and economic community. Without these labors, there would be no subject for this book.