Postgres database and table storage and index optimizations

Optimizating database storage is always a key focus area when running databases. Lets get started with some basics.

Database storage used

Run the following query to see how much space your database currently using

SELECT datname AS database_name,
pg_size_pretty(pg_database_size(datname)) AS total_size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;

And you may get further break down here:-

Table size spaces used

Let's see how much space used between data and index

SELECT
relname AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_relation_size(relid)) AS data_size,
pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

And we may get the following outputs. We can see how much actual data stored in the data (data_size) and how much taken up by the index_size. It is important to keep index size smaller because this take up database memory as index are loaded into memory to speed up query.

The index size is ok, not great.

Take a look at this index vs data composition

You will notice that index size is 14G while data size is only 229M. This is quite alarming and a clear indication that your index is bloated and action must be taken to fix this

Here we can

1. Reindex table

REINDEX TABLE CONCURRENTLY KC_ISPN__offlineSessions

A standard REINDEX locks the table. Adding CONCURRENTLY builds a brand new, perfectly packed index in the background while your application continues to read and write. Once the new index is fully built, Postgres seamlessly swaps it with the old, bloated one and drops the old one.

Please note: This will generate WAL (database logs) because it is writing a new index, but it is the only safe way to remove bloat in production without extensions.

2. Reconfigure table for more optimized clean up. For high curning table, the vacuum threshold here is less significant.

ALTER TABLE "KC_ISPN__offlineSessions" SET ( -- Trigger vacuum after 4% of rows are dead (Aggressive to prevent bloat) autovacuum_vacuum_scale_factor = 0.04, -- Trigger analyze after 0.5% change (Keep stats fresh for query planner) autovacuum_analyze_scale_factor = 0.005, -- Base threshold (500 is fine, prevents vacuum on tiny fluctuations) autovacuum_vacuum_threshold = 500, -- THROTTLE SETTINGS (Crucial for High Churn) -- Allow vacuum to consume more 'cost' before pausing (Default is often 200) -- Increase this to let vacuum work faster during quiet periods autovacuum_vacuum_cost_limit = 2000, -- Delay in milliseconds when cost limit is reached (Default 2ms) -- Increase this slightly to smooth out I/O during busy times autovacuum_vacuum_cost_delay = 2ms );

3. Use VACUUM to clean table and Index. This is closely related to configuration above - instead of waiting for the clean up criteria to complete, we initate vacuum manually.

VACUUM is used to clean up data tuple (record that has been marked as deleted).

Please note: VACUUM generate WAL too. WAL file are chunked to 16M per file and as long as we give some room for postgres to catch up, these files are clean up automatically. In Azure portal you can see this via its metrics called "Transaction Log Used"

You can use the following sql to check size of WAL

SELECT pg_size_pretty(sum(size)) AS total_wal_size FROM pg_ls_waldir();

We should avoid using VACUUM FULL, instead we can use

BEGIN;
-- 1. Throttle I/O to prevent contention with normal queries
-- Lower limit = vacuum pauses more often
SET LOCAL vacuum_cost_limit = 50; 
-- Longer delay = vacuum sleeps longer between pauses
SET LOCAL vacuum_cost_delay = 10ms; 
-- 2. Run Standard Vacuum (Non-blocking, Low WAL)
-- Do NOT add 'FULL'
VACUUM VERBOSE "KC_ISPN__offlineSessions";
COMMIT;

Cost	How it is addresses
No Locking	Standard `VACUUM` acquires only a `Share Update Exclusive` lock. It does not block `SELECT`, `INSERT`, `UPDATE`, or `DELETE` operations.
Low I/O Impact	By lowering `vacuum_cost_limit` (default is usually 200) and increasing `vacuum_cost_delay`, you force the vacuum process to "sleep" frequently. This leaves I/O bandwidth available for your application queries.
Modest WAL	Standard `VACUUM` does not rewrite data pages. It only updates the Visibility Map and Free Space Map. This generates negligible WAL compared to `VACUUM FULL` (which logs every row move).

Search This Blog

mitzen

Postgres database and table storage and index optimizations

Comments

Popular posts from this blog

gemini cli getting file not defined error

mongosh install properly

vllm : Failed to infer device type