info@solusidb.com

Configurations in PostgreSQL for Heavy Write Workloads

Optimizing PostgreSQL and the operating system for a heavy write workload involves careful tuning of both PostgreSQL settings and OS configurations. Here’s a concise guide to the best configurations:

 

PostgreSQL Configuration

  1. Memory Settings:
    • shared_buffers: Set 25-40% of total RAM. Purpose It stores frequently accessed data and index pages to reduce the need for disk I/O, thereby improving performance.
    • work_mem: Set 16MB to 64MB (higher for complex queries). Each time a query is executed, PostgreSQL can use work_mem for operations like sorting and hashing. If the memory required for these operations exceeds the work_mem setting, PostgreSQL will spill data to disk, which can significantly degrade performance.
    • maintenance_work_mem: Set 256MB to 1GB for maintenance operations. For heavy write operations like VACUUM or CREATE INDEX, consider setting this to a few hundred MB. This setting allows PostgreSQL to perform maintenance tasks more efficiently by using more memory for these operations, reducing the need for disk I/O and speeding up the execution time of tasks like index creation and table vacuuming.
  2. Write-Ahead Logging (WAL):
    • wal_level: Set to minimal if replication is not needed. It controls the level of detail in the WAL, which affects the ability to support replication and point-in-time recovery (PITR). Set to minimal if replication is not needed.
    • synchronous_commit: Set to off to improve write performance (some data loss possible on crash).  This setting dictates whether a transaction must wait for the WAL to be flushed to disk before the transaction is considered committed.
    • commit_delay: Set to 10-20 ms to allow batching of commits. It specifies a delay (in microseconds) that PostgreSQL will wait before flushing the WAL (Write-Ahead Log) to disk after a commit operation. This can allow multiple transactions to be batched together, potentially improving efficiency.
    • wal_buffers: Increase to 16MB or more, depending on workload. Purpose for provides a buffer area for WAL data to be stored in memory temporarily. This helps reduce the frequency of disk writes, improving performance during heavy write operations.
  3. Checkpoints:
    • checkpoint_timeout: Increase to 15-30 minutes. This setting defines the maximum time interval (in seconds) between two consecutive checkpoints. A checkpoint is crucial for database recovery and ensures that data is safely written to disk.
    • max_wal_size: Increase to 2GB or more.  This setting controls the maximum amount of WAL data that can be generated before PostgreSQL initiates a checkpoint. A larger max_wal_size allows more WAL data to accumulate, which can lead to fewer checkpoints and potentially improve performance during heavy write operations.
    • checkpoint_completion_target: Set to 0.7 or higher. It defines the fraction of the checkpoint_timeout that PostgreSQL aims to use for the checkpoint process. A higher value indicates that PostgreSQL will spread the I/O generated by the checkpoint over a longer period, reducing I/O spikes.
  4. Autovacuum:
    • Ensure it runs effectively. Adjust:
      • autovacuum_vacuum_cost_limit: Increase to 2000 or more. This setting helps regulate the resource usage of the autovacuum process by limiting how much I/O it can perform during each run. This is important for balancing database maintenance with the performance of normal database operations.
      • autovacuum_naptime: Set to 1 minute. This setting controls the frequency of autovacuum operations, impacting how quickly the database can reclaim space and update statistics for query planning.
  5. Connection Settings:
    • max_connections: Set according to your application needs, but monitor RAM usage. This setting limits the number of simultaneous client connections to the database. It includes connections from applications, users, and background processes.
  6. Other Important Parameters:
    • effective_cache_size: Set to about 75% of total RAM. It gives the query planner an estimate of how much memory is available for the database to cache disk pages, including both the operating system’s file system cache and PostgreSQL’s own shared buffers.
    • default_statistics_target: Set to 100 for better query planning. This setting specifies the number of sample values that PostgreSQL will collect for each column when gathering statistics. Higher values lead to more detailed statistics, which can improve the accuracy of query planning and execution.
    • Max_files_per_process: used to control the maximum number of file descriptors that can be opened by a single PostgreSQL backend process. The default value for max_files_per_process is usually set to 1000, but this can vary based on the operating system and PostgreSQL version

Operating System Configuration

  1. Filesystem:
    • Use XFS or EXT4 with the noatime mount option to reduce write overhead.
  2. I/O Scheduler:
    • Use the deadline or noop I/O scheduler for improved performance under heavy write loads.
  3. Swappiness:
    • Set vm.swappiness to a low value (e.g., 10) to minimize swapping.
  4. Disk Configuration:
    • Use SSDs for data directories and WAL for lower latency.
    • Consider RAID 10 for redundancy and performance.
  5. Kernel Parameters:
    • Adjust fs.file-max and fs.nr_open to accommodate more open files.
    • Example:
      echo "fs.file-max = 100000" >> /etc/sysctl.conf
      sysctl -p
  6. Memory Management:
    • Ensure adequate RAM and adjust limits (ulimit) appropriately.
  7. Networking:
    • Optimize network settings if using replication or high traffic.
  8. Open Files
    1. Set  65536.
      You can set the limits in the /etc/security/limits.conf file:

      postgres soft nofile 65536 
      postgres hard nofile 65536

      You may also need to adjust the system-wide limit in /etc/sysctl.conf or with the ulimit command.

      To check the current limits for open files, you can use: ulimit – n

      Open files refer to any files that are currently in use by the PostgreSQL server, including data files, WAL (Write-Ahead Logging) files, configuration files, and temporary files. Each connection to the database also consumes a file descriptor

Monitoring and Tuning

  • Use pg_stat_activity, pg_stat_statements, and tools like top, iostat, or vmstat for monitoring.
  • Benchmark using pgbench or similar tools to simulate workloads and adjust configurations based on results.
  • Use PMM for dashboard monitoring database.

Conclusion

These configurations provide a solid starting point for optimizing PostgreSQL and the OS for heavy write workloads. Always monitor performance and adjust settings based on your specific workload characteristics and system resources.