Postgres The First Experience (Pavel Luzanov, Egor Rogov, Igor Levshin) (Z-Library)

(This page has no text content)

Introduction We have written this small book for those who only start getting acquainted with the world of PostgreSQL. From this book, you will learn: I PostgreSQL — what is it all about? . . . . . . . . . . . . . . . . . . . . 3 II What’s new in PostgreSQL 15 . . . . . . . . . . . . . . . . . . . . . . . . . 15 III Installation on Linux and Windows . . . . . . . . . . . . . . . . . . 23 IV Connecting to a server, writing SQL queries, and using transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 V Learning the SQL language on a demo database . . . .59 VI Using PostgreSQL with your application . . . . . . . . . . . . . . 87 VII Minimal server setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .101 VIII About a useful pgAdmin application . . . . . . . . . . . . . . . .109 IX Advanced features: full-text search, . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 JSON format, . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 foreign data wrappers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 X Education and certification opportunities . . . . . . . . . . 145 XI Keeping up with all updates . . . . . . . . . . . . . . . . . . . . . . . . .167 XII About the Postgres Professional company . . . . . . . . . .171 We hope that our book will make your first experience with PostgreSQL more pleasant and help you blend into the Post- greSQL community. Good luck!

(This page has no text content)

I About PostgreSQL PostgreSQL is the most feature-rich free open-source data- base system. Originally developed in the academic environ- ment, it has managed to bring together a wide developer community through its long history. Nowadays, PostgreSQL offers everything that most customers need, and it is actively used all over the world to create high-load business-critical systems. Some History Modern PostgreSQL originates from the POSTGRES project, which was led by Michael Stonebraker, professor of the University of California, Berkeley. Before this work, Michael Stonebraker had beenmanaging INGRES development; it was one of the first relational database systems, and POSTGRES appeared as the result of rethinking all the previous work and the desire to overcome the limitations of its rigid type system. The project was started in 1985, and by 1988 a number of scientific articles had been published that described the data model, POSTQUEL query language (SQL was not an accepted standard at the time), and data storage structure.

4 i POSTGRES is sometimes considered to be a so-called post- relational database system. The relational model had always been criticized for its restrictions, even though they were the flip side of its strictness and simplicity. As computer technologies were spreading in all spheres of life, new types of applications started to appear, and databases had to support custom data types and such features as inheritance or creating and managing complex objects. The first version of this database system appeared in 1989. It was being improved and enhanced for several years, but in 1993, when version 4.2 was released, the project was shut down. However, despite its official cancellation, UC Berkeley alumni Andrew Yu and Jolly Chen revived the project and resumed its development in 1994, taking advantage of its liberal BSD license and open source. They replaced POSTQUEL query language with SQL, which had become a generally accepted standard by that time. The project was renamed to Postgres95. In 1996, it became obvious that the Postgres95 name would not stand the test of time, and a new name was selected: PostgreSQL. This name reflects the connection both with the original POSTGRES project and the SQL adoption. This name may be quite hard to articulate, but nevertheless, we should pronounce it as “Post-Gres-Q-L” or simply “postgres,” but not as “postgre.” The first PostgreSQL release had version 6.0, keeping the original numbering scheme. The project grew, and its man- agement was taken over by at first a small group of active users and developers, which was named PostgreSQL Global Development Group.

5 i Development All the main decisions about developing and releasing new PostgreSQL versions are taken by the Core team, which consists of seven people at the moment. Apart from developers who contribute to the project from time to time, there is a group of main developers who have made significant contributions to PostgreSQL. They are called major contributors. There is also a group of committers, who have the write access to the source code repository. Group members change over time, new developers join the community, others leave the project. The current list of developers is published on the PostgreSQL’s official website: postgresql.org/community/contributors. The contribution of Russian developers into PostgreSQL is compelling. This is arguably the largest global open-source project with such a vast Russian representation. Vadim Mikheev, a software programmer from Krasnoyarsk who used to be a member of the Core team, played an important role in PostgreSQL evolution and development. He created such key core features as multi-version concurrency control (MVCC), vacuum, write-ahead log (WAL), subqueries, triggers. Vadim is not involved with the project anymore. In 2015, Oleg Bartunov, a professional astronomer and research scientist at Sternberg Astronomical Institute of Lomonosov Moscow State University, teamed up with Teodor Sigaev and Alexander Korotkov to start the Postgres Profes- sional company, which is now the main talent foundry in Russia when it comes to database system development. The main areas of their contribution are PostgreSQL local- ization (national encodings and Unicode support), full-text

6 i search, working with arrays and semi-structured data (hstore, json, jsonb), new index methods (GiST, SP-GiST, GIN and RUM, Bloom). They have also created a lot of popular extensions. PostgreSQL release cycle usually takes about a year. In this timeframe, the community receives patches with bug fixes, updates, and new features from everyone willing to contribute. Traditionally, all patches are discussed in the pgsql-hackers mailing list. If the community finds the idea useful, its implementation is correct, and the code passes a mandatory code review by other developers, the patch is included into the next release. At some point (usually in spring, about half a year before the release), code stabilization is announced: all new features get postponed till the next version, only bug fixes and improvements for the already included patches are accepted. Within the release cycle, beta versions appear. Closer to the end of the release cycle a release candidate is built, and soon a new major version of PostgreSQL is released. Major versions used to be defined by two numbers, but in 2017 it was decided to start using a single number. Thus, version 9.6 was followed by PostgreSQL 10, while the latest available version is PostgreSQL 15, which was released in October 2022. As the new version is being prepared, developers can find and fix bugs in it. The most critical fixes are backported to the previous versions. The community usually releases updates quarterly; these minor versions accumulate such fixes. For example, version 14.5 contains bug fixes for the 14.4 release, while version 15.2 provides fixes for PostgreSQL 15.1.

7 i Support PostgreSQL Global Development Group supports major re- leases for five years. Both support and development are managed through mailing lists. A correctly filed bug report has all the chances to be addressed very fast: bug fixes can be released as fast as 24 hours. Apart from the community support, 24x7 commercial sup- port for PostgreSQL is also provided by a number of compa- nies in different countries, including Postgres Professional (www.postgrespro.com). Current State PostgreSQL is one of the most popular databases. Based on the solid foundation of academic development, over several decades PostgreSQL has evolved into an enterprise- level product that is now a real alternative to commercial databases. You can see it for yourself by looking at the key features of PostgreSQL 15, which is the latest released version right now. Reliability and Stability Reliability is especially important in enterprise-level appli- cations that handle business-critical data. For this purpose, PostgreSQL provides support for hot standby servers, point- in-time recovery, different types of replication (synchronous, asynchronous, cascade).

8 i Security PostgreSQL supports secure SSL connections and provides various authentication methods, such as password authen- tication (including SCRAM), client certificates, and external authentication services (LDAP, RADIUS, PAM, Kerberos). For user management and database access control, the following features are provided: • creating and managing new users and group roles • role- and group-based access control to database objects • row-level and column-level security • SELinux support via a built-in SE-PostgreSQL functionality (Mandatory Access Control) Russian Federal Service for Technical and Export Control has certified a custom PostgreSQL version released by Postgres Professional for use in data processing systems for personal data and classified information. Conformance to the SQL Standard As the ANSI SQL standard is evolving, its support is constantly being added to PostgreSQL. This is true for all versions of the standard, from SQL-92 to the most recent SQL:2016, which standardizes JSON support. Much of this functionality is already implemented in PostgreSQL 15. In general, PostgreSQL provides a high rate of conformance to the SQL standard, supporting 170 out of 177 mandatory features and many optional ones.

9 i Transaction Support PostgreSQL provides full support for ACID properties and efficient transaction isolation based on the multi-version concurrency control (MVCC). This method allows us to avoid locking in all cases except for concurrent updates of the same row by different processes. Reading transactions never block writing ones, and writing never blocks reading. This is true even for the serializable isolation level, which is the strictest one. Using an innovative Serializable Snapshot Isolation system, this level ensures that there are no serializa- tion anomalies and guarantees that concurrent transaction execution produces the same result as sequential one. For Application Developers Application developers get a rich toolset for creating appli- cations of any type: • Support for various server programming languages: built- in PL/pgSQL (which is closely integrated with SQL), C for performance-critical tasks, Perl, Python, Tcl, as well as JavaScript, Java, etc. • APIs to access the database from applications written in virtually any language, including the standard ODBC and JDBC APIs. • A rich set of database objects that allow you to efficiently implement the logic of any complexity on the server side: tables and indexes, sequences, integrity constraints, views and materialized views, partitioning, subqueries and WITH-queries (including recursive ones), aggregate and window functions, stored functions, triggers, etc.

10 i • A flexible full-text search system that supports a variety of languages, extended with efficient index access methods. • Semi-structured data typical of NoSQL: hstore (storage of key–value pairs), xml, json (represented as text or in a more robust jsonb binary format). • Foreign Data Wrappers. This feature allows adding new data sources as external tables by the SQL/MED standard. You can use any major database as an external data source. PostgreSQL provides full support for foreign data, including write access and distributed query execution. Scalability and Performance PostgreSQL takes advantage of the modern multi-core CPU architecture. Its performance grows almost linearly as the number of cores increases. PostgreSQL can parallelize queries and some commands (such as index creation and vacuuming). In this mode, reads and joins are performed by several concurrent processes. JIT- compilation of queries can speed up operations thanks to better use of hardware resources. Each PostgreSQL version adds new parallelization features. Horizontal scaling can rely on both physical and logical replication. It allows you to build PostgreSQL-based clus- ters to achieve high performance, fault tolerance, and geo- distribution. Some examples of such systems are Citus (Citus- data), Postgres-BDR (2ndQuadrant), Multimaster (Postgres Professional), Patroni (Zalando).

11 i Query Planner PostgreSQL relies on a cost-based query planner. Using the collected statistics and taking into account both disk operations and CPU time in its mathematical models, the planner can optimize even the most complex queries. It can use all access paths and join methods available in state-of- the-art commercial database systems. Indexing PostgreSQL provides various types of indexes. Apart from tra- ditional B-trees, you can use many other access methods. • Hash, a hash-based index. Unlike B-trees, such indexes work only for equality checks, but in some cases they can prove to be more efficient and compact. • GiST, a generalized balanced search tree. This access method is used for the data that cannot be ordered. For example, R-trees that are used to index points on a plane and can serve to implement fast k-nearest neighbor (k-NN) search, or indexing overlapping intervals. • SP-GiST, a generalized non-balanced tree based on dividing the search space into non-intersecting nested partitions. For example, quad-trees for spatial data and radix trees for text strings. • GIN, a generalized inverted index used for compound multi- element values. It is mainly applied in full-text search to

12 i find documents that contain the words used in the search query. Another example is search for elements in data arrays. • RUM, an enhancement of the GIN method for full-text search. Available as an extension, this index type can speed up phrase search and return the results in the order of relevance without any additional computations. • BRIN, a compact structure that provides a trade-off between the index size and search efficiency. Such index is useful for huge clustered tables. • Bloom, an index based on the Bloom filter. Having a compact rep- resentation, this index can quickly filter out non-matching tuples, but the remaining ones have to be re-checked. Many index types can be built upon both a single column and multiple columns. Regardless of the type, you can build indexes not only on columns, but also on arbitrary expressions. It is also possible to create partial indexes for specific sets of rows. Covering indexes can speed up queries, since all the required data is retrieved from the index itself, avoiding heap access. The planner can use a bitmap scan, which allows combining several indexes together for faster access. Cross-Platform Support PostgreSQL runs both on Unix operating systems (including server and client Linux distributions, FreeBSD, Solaris, and macOS) and on Windows systems.

13 i Its portable open-source C code allows building PostgreSQL on a variety of platforms, even if there is no package sup- ported by the community. Extensibility One of the main advantages of PostgreSQL architecture is extensibility. Without changing the core system code, users can add various features, such as: • data types • functions and operators to support new data types • index and table access methods • server programming languages • foreign data wrappers • loadable extensions Full-fledged support of extensions enables you to develop new features of any complexity that can be installed on demand without changing the PostgreSQL core. For example, the following complex systems are built as extensions: • CitusDB, which implementsmassively parallel query execution and data distribution between different PostgreSQL instances (sharding). • PostGIS, one of the most popular and powerful geoinformation data processing systems.

14 i • TimescaleDB, which provides support for time-series data, including special partitioning and sharding. The standard PostgreSQL 15 distribution alone includes about fifty extensions that have proved to be useful and reliable. Availability A liberal PostgreSQL license, which is similar to BSD and MIT licenses, allows unrestricted use of PostgreSQL; you may also modify PostgreSQL code without any limitations and integrate it into other products, including commercial and closed-source software. Independence PostgreSQL does not belong to any company; it is developed by the international community, which includes developers from all over the world. It means that systems using Post- greSQL do not depend on a particular vendor, thus keeping the investment safe in any circumstances.

II What’s New in PostgreSQL 15 If you are familiar with the previous versions of PostgreSQL, this chapter can give you a sense of what has changed over the past year. It mentions only some of the updates; for the full list of changes, see the Release Notes: postgrespro.com/ docs/postgresql/15/release-15. SQL Commands After an unsuccessful attempt made as early as PostgreSQL 11, the MERGE command is finally implemented. This com- mand is defined by the SQL standard. It is more flexible and sometimes also more efficient than the previously available INSERT ... ON CONFLICT command. Should NULL values be considered distinct in integrity con- straints? There used to be only one correct answer to this question (negative), but now the NULLS [NOT] DISTINCT clause allows choosing the desired behavior. You can now provide a list of columns in the ON DELETE SET NULL clause for composite foreign keys: instead of resetting values in all the columns, simply specify the exact fields to be affected when the parent entry is deleted.

16 ii The COPY command can now take the table header in the first data row as input, as well as include it in the output. Functions The unnest and range_agg functions have been added for multidimensional data types introduced in version 14. New regular expression functions have been added to facili- tate migration: regexp_like, regexp_count, regexp_instr, and regexp_substr. Functions pg_size_pretty and pg_size_bytes have been improved to support petabyte units. Prefixes for larger units have been reserved for future enhancements. Partitioning If a trigger on a partitioned table gets renamed, triggers on table partitions are renamed automatically. The CLUSTER command is now supported for partitioned tables. Planning is now performed faster if only a few partitions are relevant to the query.

17 ii Write-Ahead Log LZ4 and ZStd compression algorithms for full page images have been added. As compared to the standard PGLZ algo- rithm, LZ4 usually consumes less resources but is just as efficient, while ZStd is more CPU-intensive but shows better compression results. The LZ4 algorithm is also supported by the pg_receivewal utility. WAL records for recovery can now be prefetched, which may speed up server restart after a failure, backup restore, and application of WAL records during replication. Continuous archiving now allows using custom modules instead of an OS command. Developers of backup solutions can now rely on this feature. A new feature that is not directly available to end users is the ability to create custom resource managers. This functionality is important for developers of both index and table access methods. Incidentally, table access methods can now be altered using the ALTER TABLE ... SET METHOD command (but sadly, there is no alternative available yet). Added the pg_walinspect extension, which enables viewing WAL records via an SQL query (in addition to the already available pg_waldump utility). Logical Replication Logical replication has been significantly improved. There is hope that such long-awaited features as replication of sequences and DDL commands will be soon implemented as well.

18 ii When creating a publication, you can now filter rows and columns to be replicated, as well as replicate all tables that belong to a particular schema (FOR ALL TABLES IN SCHEMA). Added support for prepared transactions to logical repli- cation. Subscriber processes are now executed with the privileges of the subscription owner; superuser rights are not required anymore. Subscriptions can now be stopped on conflict. Previously, the WAL receiver process was restarted every second; now the conflicting transaction can be skipped (ALTER SUBSCRIPTION ... SKIP). Backup The good-old pg_basebackup utility was refactored. You can now choose the backup location: it can be created either on the client or on the server. Following the principle of extensibility, you can create custom backup targets; for example, the new basebackup_to_shell extension passes the backup to an OS command. Data compression on the server side can now be performed by gzip, LZ4, and ZStd algorithms, which can be useful if the network throughput is low.

19 ii Security The CREATE privilege has been revoked from the public schema; this privilege used to be inherited by all users from the public pseudorole. The public schema is now owned by a new pseudorole called pg_database_owner, which im- plicitly includes the owner of the current database. PostgreSQL now provides more opportunities for delegating system administration tasks to non-privileged users. The right to perform the CHECKPOINT command is granted to the new role called pg_checkpointer. The access to statistics on backend memory contexts (pg_backend_memory_contexts and pg_shmem_allocations catalog views) is granted to the pg_read_all_stats role. You can also control access to configuration parameters (GRANT SET and GRANT ALTER SYSTEM). It is now possible to create views with the caller’s privileges: the user must have the right to access these views’ objects in this case. Regular users can no longer manage their own role mem- bership (that is, use the ADMIN OPTION) by default to add or remove members of its own role). Monitoring Cumulative statistics is now stored in shared memory; a separate statistics collector process is not used anymore, and there is no need to mount tmpfs for the files that accumulate statistics.

Statistics

Uploader

Postgres The First Experience (Pavel Luzanov, Egor Rogov, Igor Levshin) (Z-Library)

Tags

Text Preview (First 20 pages)

Registered users can read the full content for free

Comments 0

Recommended for You

Statistics

Uploader

Postgres The First Experience (Pavel Luzanov, Egor Rogov, Igor Levshin) (Z-Library)

Tags

Text Preview (First 20 pages)

Registered users can read the full content for free

Comments 0

Reply to Comment

Edit Comment

Recommended for You