📄 Page
1
(This page has no text content)
📄 Page
2
Build Your Own Redis with C/C++ Learn network programming and data structures James Smith 2023-01-31
📄 Page
3
Build Your Own Redis with C/C++ Build Your Own Redis with C/C++ 01. Introduction 02. Introduction to Sockets 03. Hello Server/Client 04. Protocol Parsing 05. The Event Loop and Nonblocking IO 06. The Event Loop Implementation 07. Basic Server: get, set, del 08. Data Structure: Hashtables 09. Data Serialization 10. The AVL Tree: Implementation & Testing 11. The AVL Tree and the Sorted Set 12. The Event Loop and Timers 13. The Heap Data Structure and the TTL
📄 Page
4
14. The Thread Pool & Asynchronous Tasks A1: Hints to Exercises
📄 Page
5
Build Your Own Redis with C/C++
📄 Page
6
01. Introduction What Is This Book About? This book contains a step-by-step walkthrough of a simple implementation of a Redis-like server. It is intended as a practical guide or tutorial to network programming and the implementation and application of basic data structures in C. What to Learn From This Book? Redis could be considered one of the building blocks of modern computing that stood the test of time. The knowledge required for building such a project is broader and deeper than usual application- level development. Learning from such
📄 Page
7
projects is a good way for software developers to level up their skills. Redis is a good target for learning because it covers two important subjects of software engineering: network programming and data structures. While there are many guides on socket APIs or high-level libraries, network programming is more than calling APIs or libraries. It is important to understand core concepts such as the event loop, protocols, timers, etc, which this book will cover. The lack of understanding can result in fatal mistakes even if you are just employing high-level networking libraries or frameworks in your applications. Although many people learned some basic data structures from textbooks, there is still something more to learn. Data structures implemented in real
📄 Page
8
projects often have some practical considerations which are not touched by textbooks. Learning how data structures are used in a non-toy environment (especially in C) is a unique experience from building Redis. Like most real-world projects, Redis is a complex project built with lots of effort, which can be hard to grasp for beginners. Instead, this book takes an opposite approach: learning by building things from scratch. Why From Scratch? A couple of points: To learn faster. By building things from scratch, concepts can be introduced gradually. Starting from the small, adding things incrementally, and getting the big picture in the end.
📄 Page
9
To learn deeper. While there are many materials explaining how an existing stuff works, the understanding obtained by reading those materials is often not the same as building the stuff yourself. It is easy to mistake memorization for understanding, and it’s easier to pick up unimportant details than principles and basics. To learn more. The “from scratch” approach forces you to touch every aspect of the subject — there are no shortcuts to knowledge! And often not every aspect is known to you beforehand, you may discover “things I don’t know I don’t know” in the process. Summarized in a quote from Feynman: “What I cannot create, I do not understand”.
📄 Page
10
How to Use This Book? This book follows a step-by-step approach. Each step builds on the previous one, adding a new concept. The full source code is provided on the web for reference purposes, readers are advised to tinker with it or DIY without it. The code is written as direct and straightforwardly as the author could. It’s mostly plain C with minimal C++ features. Don’t worry if you don’t know C, you just have the opportunity to do it in another language by yourself. The end result is a mini Redis alike with only about 1200 lines of code. 1200 LoC seems low, but it illustrates many important aspects the book attempts to cover. The techniques and approaches used in the book are not exactly the same as the real
📄 Page
11
Redis. Some are intentionally simplified, and some are chosen to illustrate a general topic. Readers can learn even more by comparing different approaches. The code used in this book is intended to run on Linux only, and can be downloaded at this URL: https://build-your-own.org/redis/src.tgz The contents and the source code of this book can be browsed online at: https://build-your-own.org
📄 Page
12
02. Introduction to Sockets This chapter is an introduction to socket programming. Readers are assumed to have basic knowledge of computer networking but no experience in network programming. This book does not contain every detail on how to use socket APIs, you are advised to read manpages and other network programming guides while learning from this book. (https://beej.us/ is a good source for socket APIs.) Redis is an example of the server/client system. Multiple clients connect to a single server, and the server receives requests from TCP connections and sends responses back. There are several Linux system calls
📄 Page
13
we need to learn before we can start socket programming. The socket() syscall returns an fd. Here is a rough explanation of “fd” if you are unfamiliar with Unix systems: An fd is an integer that refers to something in the Linux kernel, like a TCP connection, a disk file, a listening port, or some other resources, etc. The bind() and listen() syscall: the bind() associates an address to a socket fd, and the listen() enables us to accept connections to that address. The accept() takes a listening fd, when a client makes a connection to the listening address, the accept() returns an fd that represents the connection socket. Here is the pseudo-code that explains the typical workflow of a server:
📄 Page
14
The read() syscall receives data from a TCP connection. The write() syscall sends data. The close() syscall destroys the resource referred by the fd and recycles the fd number. We have introduced the syscalls needed for server-side network programming. For the client side, the connect() syscall takes a socket fd and address and makes a TCP connection to that address. Here is the pseudo-code for the client: fd = socket() bind(fd, address) listen(fd) while True: conn_fd = accept(fd) do_something_with(conn_fd) close(conn_fd) fd = socket() connect(fd, address) do_something_with(fd) close(fd)
📄 Page
15
The next chapter will help you get started using real code.
📄 Page
16
03. Hello Server/Client This chapter continues the introduction of socket programming. We’ll write 2 simple (incomplete and broken) programs to demonstrate the syscalls from the last chapter. The first program is a server, it accepts connections from clients, reads a single message, and writes a single reply. The second program is a client, it connects to the server, writes a single message, and reads a single reply. Let’s start with the server first. First, we need to obtain a socket fd: int fd = socket(AF_INET, SOCK_STREAM, 0); The AF_INET is for IPv4, use AF_INET6 for IPv6 or dual-stack socket. For simplicity,
📄 Page
17
we’ll just use AF_INET throughout this book. The SOCK_STREAM is for TCP. We won’t use anything other than TCP in this book. All the 3 parameters of the socket() call are fixed in this book. Next, we’ll introduce a new syscall: The setsockopt() call is used to configure various aspects of a socket. This particular call enables the SO_REUSEADDR option. Without this option, the server won’t able to bind to the same address if restarted. Exercise to reader: find out what exactly is SO_REUSEADDR and why it is needed. The next step is the bind() and listen(), we’ll bind on the wildcard address int val = 1; setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
📄 Page
18
0.0.0.0:1234: Loop for each connection and do something with them. // bind, this is the syntax that deals with IPv4 addresses struct sockaddr_in addr = {}; addr.sin_family = AF_INET; addr.sin_port = ntohs(1234); addr.sin_addr.s_addr = ntohl(0); // wildcard address 0.0.0.0 int rv = bind(fd, (const sockaddr *)&addr, sizeof(addr)); if (rv) { die("bind()"); } // listen rv = listen(fd, SOMAXCONN); if (rv) { die("listen()"); } while (true) { // accept struct sockaddr_in client_addr =
📄 Page
19
The do_something() function is simply read and write. {}; socklen_t socklen = sizeof(client_addr); int connfd = accept(fd, (struct sockaddr *)&client_addr, &socklen); if (connfd < 0) { continue; // error } do_something(connfd); close(connfd); } static void do_something(int connfd) { char rbuf[64] = {}; ssize_t n = read(connfd, rbuf, sizeof(rbuf) - 1); if (n < 0) { msg("read() error"); return; } printf("client says: %s\n", rbuf); char wbuf[] = "world";
📄 Page
20
Note that the read() and write() call returns the number of read or written bytes. A real programmer must deal with the return value of functions, but in this chapter, I have omitted lots of things for brevity. And the code in this chapter is not the correct way to do networking anyway. The client program is much simpler: write(connfd, wbuf, strlen(wbuf)); } int fd = socket(AF_INET, SOCK_STREAM, 0); if (fd < 0) { die("socket()"); } struct sockaddr_in addr = {}; addr.sin_family = AF_INET; addr.sin_port = ntohs(1234); addr.sin_addr.s_addr = ntohl(INADDR_LOOPBACK); // 127.0.0.1 int rv = connect(fd, (const struct sockaddr *)&addr, sizeof(addr));