HTTP Made Really Easy

A Practical Guide to Writing Clients and Servers

December 10, 2012: Updated the links about robots.

HTTP is the network protocol of the Web. It is both simple and powerful. Knowing HTTP enables you to write Web browsers, Web servers, automatic page downloaders, link-checkers, and other useful tools.

This tutorial explains the simple, English-based structure of HTTP communication, and teaches you the practical details of writing HTTP clients and servers. It assumes you know basic socket programming. HTTP is simple enough for a beginning sockets programmer, so this page might be a good follow-up to a sockets tutorial. This Sockets FAQ focuses on C, but the underlying concepts are language-independent.

Since you're reading this, you probably already use CGI. If not, it makes sense to learn that first.

The whole tutorial is about 15 printed pages long, including examples. The first half explains basic HTTP 1.0, and the second half explains the new requirements and features of HTTP 1.1. This tutorial doesn't cover everything about HTTP; it explains the basic framework, how to comply with the requirements, and where to find out more when you need it. If you plan to use HTTP extensively, you should read the specification as well; see the end of this document for more details.

Before getting started, understand the following two paragraphs:

<LECTURE>

Writing HTTP or other network programs requires more care than programming for a single machine. Of course, you have to follow standards, or no one will understand you. But even more important is the burden you place on other machines. Write a bad program for your own machine, and you waste your own resources (CPU time, bandwidth, memory). Write a bad network program, and you waste other people's resources. Write a really bad network program, and you waste many thousands of people's resources at the same time. Sloppy and malicious network programming forces network standards to be modified, made safer but less efficient. So be careful, respectful, and cooperative, for everyone's sake.

In particular, don't be tempted to write programs that automatically follow Web links (called robots or spiders) before you really know what you're doing. They can be useful, but a badly-written robot is one of the worst kinds of programs on the Web, blindly following a rapidly increasing number of links and quickly draining server resources. If you plan to write anything like a robot, please read more about them. There may already be a working program to do what you want. If you really need to write your own, please support the robots.txt de-facto standard.

</LECTURE>
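
To make "support the robots.txt de-facto standard" a little more concrete, here is a minimal sketch of checking a URL against a robots.txt file before fetching it. It uses Python's standard urllib.robotparser module. The robots.txt content, the host www.example.com, the user-agent name "ExampleBot", and the helper name allowed() are all made up for illustration; a real robot would first download the file from /robots.txt on the host it wants to visit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/private/notes.html"):
        print(url, allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

A polite robot would run a check like this for every link it is about to follow, and skip any URL for which the answer is False.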

OK, enough of that. Let's get started.

Several related topics are discussed on a separate "footnotes" page.

What is HTTP?

HTTP stands for Hypertext Transfer Protocol. It's the network protocol used to deliver virtually all files and other data (collectively called resources) on the World Wide Web, whether they're HTML files, image files, query results, or anything else. Usually, HTTP takes place through TCP/IP sockets (and this tutorial ignores other possibilities).

A browser is an HTTP client because it sends requests to an HTTP server (Web server), which then sends responses back to the client. The standard (and default) port for HTTP servers to listen on is 80, though they can use any port.
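
The client/server exchange described above can be sketched in a few lines of socket code. This is only an illustration, not a complete client or server: it runs a tiny one-shot server and a client in the same process, talking over the loopback address. The function names (serve_once, fetch, demo) and the response text are invented for the example, and the server listens on whatever free port the operating system hands out rather than on port 80, which usually requires special privileges.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection, read the request, send a fixed response."""
    conn, _addr = listener.accept()
    with conn:
        request = b""
        # The header section of an HTTP request ends with a blank line.
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        body = b"Hello from the server\n"
        headers = [
            b"HTTP/1.0 200 OK",
            b"Content-Type: text/plain",
            b"Content-Length: " + str(len(body)).encode("ascii"),
        ]
        conn.sendall(b"\r\n".join(headers) + b"\r\n\r\n" + body)
    # Leaving the "with" block closes the connection, which tells the
    # client that the response is complete.


def fetch(host, port, path="/"):
    """Send a GET request and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        request = "GET " + path + " HTTP/1.0\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        response = b""
        while True:
            chunk = sock.recv(1024)
            if not chunk:  # the server closed the connection
                break
            response += chunk
    return response


def demo():
    """Run the one-shot server in a thread and fetch "/" from it."""
    # Port 0 asks the OS for any free port (requires Python 3.8+ for
    # socket.create_server).
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_once, args=(listener,))
        server.start()
        response = fetch("127.0.0.1", port)
        server.join()
    return response


if __name__ == "__main__":
    print(demo().decode("ascii"))
```

The client side is the part a browser performs: open a TCP connection to the server's host and port, send a request, and read the response until the server closes the connection.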

What are "Resources"?

HTTP is used to transmit resources, not just files. A resource is some chunk of information that can be identified by a URL (it's the R in URL). The most common kind of resource is a file, but a resource may also be a dynamically-generated query result, the output of a CGI script, a document that is available in several languages, or something else.

While learning HTTP, it may help to think of a resource as similar to a file, but more general. As a practical matter, almost all HTTP resources are currently either files or server-side script output.
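
Since a resource is identified by a URL, a client's first job is usually to pull that URL apart into the pieces it needs: which host and port to connect to, and which path to ask for. The sketch below does this with Python's standard urllib.parse module. The helper name locate() and the www.example.com URLs are assumptions made for the example, and the sketch deliberately handles only http:// URLs.

```python
from urllib.parse import urlsplit


def locate(url):
    """Split an http:// URL into (host, port, path-with-query)."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// URLs")
    host = parts.hostname or ""
    port = parts.port or 80   # 80 is the default HTTP port
    path = parts.path or "/"  # an empty path means the root resource
    if parts.query:
        path += "?" + parts.query
    return host, port, path


if __name__ == "__main__":
    print(locate("http://www.example.com/path/file.html"))
    print(locate("http://www.example.com:8080/search?q=http"))
```

Note that nothing in the result says whether the path names a file on disk or the output of a script; to the client, both are simply resources.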

Table of Contents

Using HTTP 1.0
    What is HTTP?
        What are "Resources"?
    Structure of HTTP Transactions
        Initial Request Line
        Initial Response Line (Status Line)
        Header Lines
        The Message Body
    Sample HTTP Exchange
    Other HTTP Methods, Like HEAD and POST
        The HEAD Method
        The POST Method
    HTTP Proxies
    Being Tolerant of Others
    Conclusion

Upgrading to HTTP 1.1
    HTTP 1.1
    HTTP 1.1 Clients
        Host: Header
        Chunked Transfer-Encoding
        Persistent Connections and the "Connection: close" Header
        The "100 Continue" Response
    HTTP 1.1 Servers
        Requiring the Host: Header
        Accepting Absolute URL's
        Chunked Transfer-Encoding
        Persistent Connections and the "Connection: close" Header
        Using the "100 Continue" Response
        The Date: Header
        Handling Requests with If-Modified-Since: or If-Unmodified-Since: Headers
        Supporting the GET and HEAD methods
        Supporting HTTP 1.0 Requests

Appendix
    The HTTP Specification