Postgres Survival Guide

postgres survival guide banner

And it happened. It’s slow. It wasn’t supposed to be slow. Index time.
Something feels off. It’s executing for a while. Longer than usual. Suspiciously long. Did it always take this long? No, no way. Rollback? Did I wrap this in a transaction? Can you even rollback an index?

It went through. It did? It did. Easy.

DATABASE_PROCESSOR_100_PERCENT_SOMETHING_IS_GROWLING_AT_ME

Some query started crashing the system. Which one? How do I stop it?

I could always drop the index and revert to the pre-index state.
Good thing there’s a backup. There is a backup, right…? Yeah, there is. Automatic. Just restore it. How do I restore it? Where do I restore it? Why can’t Scaleway just let me zip my database so I can spin it up locally and poke around like caveman?

Why did adding the index make the query slower? Could I have predicted that? In the next part of the article, I’ll try to answer these questions.

Identify if you’re in trouble.

There are few ways of indentifying if you need an index. Your query is slow may be a good starting point. But is it really slow?

-- Likely provided by your DB provider
CREATE EXTENSION pg_stat_statements;

--- ACTIVE QUERIES
SELECT pid, usename, datname, state, query, query_start
FROM pg_stat_activity
WHERE state='active';

--- HEAVIEST QUERIES
--- big == bad
SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC;

-- Reset statistics. I.E. after a change that completly obsoletes the old stats.
SELECT pg_stat_statements_reset();

When your database is under heavy load, the processor might start going “wrrr.” When that happens - proceed with caution.

Pro Tip: Check the database condition before doing migrations.

Backup and restore.

Have you enabled the WAL (Write-Ahead Logging)? Enable it for your own safety. You may also double check if you need wal archiving.

-- Most likely you would have those defaults already.
-- Check WAL status
SHOW wal_level;

-- Enable WAL archiving if not already. It would let you have a point in time recovery.
ALTER SYSTEM SET wal_level = 'replica';
SELECT pg_reload_conf();

Make a backup. Are you making backups? That makes me happy. Now spin up the database. What, you won’t? Try it. Maybe you will.

If your database is big (in my world, a big database is over 10GB. (Over 100GB is massive, and anything over 1TB is a yeti – everyone knows it exists, but I’ve never seen it (sames goes to the megalodon)) then making a backup and restoring it on a new/testing environment might not be that straightforward.

Different providers, different customs. It’s worth testing the environment restoration process before everything’s on fire.

Personally, when recovering from the backup I’ve had the most trouble with roles and permissions. Sometimes things didn’t back up as I wanted, sometimes users were on different schemas, sometimes types were shared.

Pro Tip: Backup your db. Test out your backups. Restore them, see how it goes.

Vacuum

The big part of administrating the Postgres database is vacuum management.

Running vacuum is usually a good idea after making changes to your database, like adding a column or a large insert. Vacuum helps clean out dead tuples (rows that still take up space but haven’t been fully removed yet).
It also helps the query planner make better decisions by updating statistics.

-- Vacuum and analyze a specific table
VACUUM ANALYSE schema.table_name;

-- Vacuum the entire database (this one is expensive)
VACUUM ANALYZE;

Autovacuum

-- SHOW AUTOVACUUM SETTINGS
SHOW autovacuum;

SELECT name, setting FROM pg_settings WHERE name LIKE 'autovacuum%';

Check for Bloat

-- You need the extension for this; it’s probably already installed if you're using a managed DB.
CREATE EXTENSION pgstattuple;

-- SEE THE BLOAT. More dead, worse performance.
SELECT * FROM pgstattuple('neck_breaking_idx');
SELECT * FROM pgstattuple('hell_schema.nightmare_table');

Regular vacuuming keeps bloat low and things flow smoothly.

Query which I use to analyze what I’m dealing with, table sizes etc.

-- TABLE INFO, sizes, indexes, etc.
SELECT
   relname AS table_name,
   pg_size_pretty(pg_total_relation_size(relid)) AS "Total Size",
   pg_size_pretty(pg_indexes_size(relid)) AS "Index Size",
   pg_size_pretty(pg_relation_size(relid)) AS "Actual Size"
   FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

Pro Tip: From time to time revisit the autovacuum in your db. It may went rouge when you were not looking.

Who sows indexes

Adding an index will change how PostgreSQL attempts to execute queries. In most cases, it will speed things up. But in some cases, it might slow down a query.

The bigger your table, the more likely it is that the query planner will pick a different plan. The plan might not be the best, and the estimates you made locally might be off.

There are different types of indexes.

Types:

• B-tree: The default, great for most queries. • Hash: Optimized for simple equality checks. • GIN: Perfect for array values and full-text search. • GiST: Supports complex data types like geometric data. • BRIN: Efficient for large tables with ordered data. It’s very small in size (mb vs gb) compared to other indexes.

Even thought you may think that particular index is a perfect match for your needs, make sure to test out your theory in practice. Try the pgbench to see how your database actually behave. Most of the times, default would be the best, and the others are the to trick you into thinking that you have some control over query planner.

If your database handles a lot of writes, adding an index without concurrency can be dangerous (it may make the processor go wrrrr, and freeze writes).

If you are concerned about your database accessability, you may want to create index concurrently. If you’re using some sort of orm for the database migrations, be aware that the concurrent indexes won’t work in the transactions. (rust sqlx is doing that).

No table locks. You can still insert to the table, read from it. It’s specially important for larger tables and the tables that are having a lot of wrights and you cannot afford locking it for few minutes or even hours.
It’s not a magical index. It’s the same kind of index you’ll create without concurrency keyword, so it would look as a normal one when you’ll be restoring db from backup.

-- CREATE INDEX
CREATE INDEX CONCURRENTLY burning_idx ON dumpster_on_fire (column_name);

Partitioning the database by index is something you reach for when you’re know what you’re doing. I won’t be covering it in this article.

Temp tables are a great way to test your queries and migrations. Instead of running a migration on the whole table, you can run it on a subset of data. The same goes for indexes. It might not give you the full picture, but it’ll give you a hint.

-- CREATE TEMP TABLE
BEGIN;
CREATE TEMP TABLE temp_nightmare_table AS
SELECT * FROM hell_schema.nightmare_table LIMIT 100;

-- RUN MIGRATION
ALTER TABLE temp_nightmare_table ADD COLUMN new_column_name TEXT();

-- DROP TEMP TABLE
DROP TABLE temp_nightmare_table;
ROLLBACK;

Sample index which wraps two columns. Great if you actually do most of the queries by those fields.

-- Indexes on most queried fields
CREATE TABLE guests (
    id INT PRIMARY KEY,
    name TEXT,
    surname TEXT
);

CREATE INDEX idx_guests_name_surname ON guests (name, surname);

Pro Tip: Test the indexes extensively.

Reaps query plans

Postgres uses query planner to determinate how the query would be executed.

The decision is bases on the statistics.

If the correlation between columns is strong, the query planner would more likely decide to trigger index scan as it reduces the random disk reads. On another hand, lower correlaction will more likely lead to the sequential scan.

-- CORRELATION BETWEEN COLUMNS. THE CLOSER TO -1 or 1, the stronger the correlation.
SELECT CORR(mmr.leaf, ab.element_index) AS correlation_leaf_element
FROM merkle_mountain_range mmr;

Using EXPLAIN and EXPLAIN ANALYZE

EXPLAIN shows the execution plan, without running the query.
EXPLAIN ANALYZE: Executes the query and provides the performance data.

I always run them few times, in case postgres query planner decides to change its mind out of the blue.

The Secret Weapon:

-- WOMBO COMBO EXPLAIN
-- EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS, SUMMARY, format json)
SELECT * FROM schema.table_name;

-- https://www.pgmustard.com/

There are a bunch of tools online to help visualize how PostgreSQL plans and executes a query.

Tl;dr query planner analysis

Read it from the inside out.
Numbers. If the estimated row count is way off, the query planner may pick a suboptimal plan. If possible, try reducing the number of rows returned by the query by improving your WHERE clause. An index may not be necessary just yet. You probably don’t need to tweak Postgres settings. You most likely don’t need to partition the table. You very unlikely need to do sharding.
Have a loot at join strategies. Nested loops, Merge, Hash etc. It would give you more context of how Postgres (and other rdbs) are handling joins across sets.
Pg configuration affect the estimation cost for the join strategies.

Conventions and constrains

Name things consistently and avoid hyphens. If you separate names with underscores, you don’t need to use quotes.

-- pretty
SELECT some_column FROM long_schema.even_longer_table;

-- not so
SELECT "some-column" FROM "long-schema"."even-longer-table";

Foreign keys. They are not required, perse, you could make your queries as fast using only indexes with constrain. I like them mostly for the consistency, as they won’t let you delete a row which depends on other row etc.

If for whatever reasons you cannot add foreign keys, you can use constraints.

There are few constraints that I was initially unaware of like:

-- Composite Primary key over two columns
CREATE TABLE hotel (
    room_number INT,
    room_floor INT,
    PRIMARY KEY (room_number, room_floor)
);

-- More conventional primary key
-- unique on 2 columns
CREATE TABLE hotel (
	id INT PRIMARY,
    room_number INT,
    room_floor INT,
    UNIQUE (room_number, room_floor)
);

-- Sanity checking inserts
CREATE TABLE room_booking (
    id INT PRIMARY KEY,
    start_date TIMESTAMPTZ,
    end_date TIMESTAMPTZ,
    CHECK (end_date > start_date)
);

If you design the database so that invalid states are impossible to represent, you’ll have fewer problems. Or more problems depending on how often you need to make changes etc, there is no silver bullet.

Hiding business logic in db is bad, having corrupted state persisted in db is bad.

My rule of thumb is to generally:

foreign keys aren’t as important as I initially thought they are. If you have proper indexex and constrains query speed will be great. Foreign keys are great for data integrity, which may be a problem if you need to force a system in the goofy state to hotfix something.
triggers, from my experience they are were more painful than helpful.
sequences, I didn’t had to use them since I had them in very old oracle db I was maintaing. Nowdays I go with ulid for my ids. (they are lexicaly sorted, you can extract date from them so freeish created at).
Avoid CDC - Change Data Capture. I don’t belive anyone claiming that the CDC works great for them. I’m more than happy to be proven wrong. I’m highly alergic to Debezium.

Most likely you want to use timestampz for dates, not timestamp or date or unix timestamp as int. Most likely you want to use text vs varchar.

Here are a few more anti-recommendations from the PostgreSQL wiki:
https://wiki.postgresql.org/wiki/Don’t_Do_This

Transactions. Begin, Rollback, Commit.

The first thing I do when working on a database is to configure my editor (currently I’m on TablePlus but the DBBever was always with me when I needed it) to wrap everything I do in transactions. Especially when making schema changes, I run a lot of dry runs with transactions that I can roll back in one go.

BEGIN;
UPDATE hell_schema.nightmare_table SET column_name = 'new_value' WHERE column_name = 'old_value';

SELECT column_name FROM hell_schema.nightmare_table;

-- I'm not happy with results
ROLLBACK;

Fine tunning database

Your provider’s default settings may not be optimal for your database. Especially if you’re your provider.

When in doubt, use the configs provided by this great tool. https://pgtune.leopard.in.ua/

A great description of what each parameter does: https://postgresqlco.nf/doc/en/param/random_page_cost/15/

The juiciest parameters you may want to tweak are:

• random_page_cost: The estimated cost for a random disk read. Affects index scans. Lower values favor index use. • seq_page_cost: The estimated cost for a sequential disk read. Lower values favor full table scans. • shared_buffers: The amount of memory PostgreSQL uses for caching data in memory. Larger values can improve performance by reducing disk I/O. • work_mem: The amount of memory allocated for operations like sorting and hashing during query execution. Larger values can improve performance for complex queries but increase memory usage per connection.

-- For current session only, great for testing query planner
SET random_page_cost = 1.1;

-- Now run the slow query with EXPLAIN ANALYZE
-- see if the estimations are more accurate and if actual execution speed has improved

-- Back to previous settings.
RESET random_page_cost;

-- Persist across restarts
ALTER SYSTEM SET random_page_cost = 1.1;

-- Reload configs
SELECT pg_reload_conf();

Conclusion

This article serves as a memorial of my own investigation into the tangled world of database performance. Hopefully, it can save you some of the headaches I encountered along the way, or at least point you in the right direction when your database starts acting up.

Remember, optimization is an ongoing process full of trials and errors, and there’s always more to learn—but sharing knowledge makes the journey a little easier for all of us.