Identifying The Hottest Tables (PostgreSQL)

In recent versions of PostgreSQL there is a magical view called pg_stat_user_tables which provides per table information on usage; there is one row per table and eight counters per row.
relname - The name of the table.
seq_scan - The number of sequential scans that have been performed on the table. A sequential scan is a read of the table from beginning to end, either because the table is very small or no indexes were available that could satisfy the filter criteria in an efficient way. Sequential scans are probably the most expensive operation the database server performs, some are however unavoidable. If proper indexing cannot resolve the need to sequentially scan a table it is imperative that the PostrgeSQL configuration provide enough resources to maintain a high cache rate.
seq_tup_read - The number of rows processed through sequential scans. This is not the number of records returned to the applications as results but the number of records processed in order to create the result set, which is probably a significant subset of this number. For example, if a query returns ten records but requires a sequential scan of the table then this value will increase by the number of records in the table, not by ten.
idx_scans - The number of indexes scans of the tables.
idx_tup_fetch - The number of rows processes through indexed scans. As with seq_tup_read this is not the count of records returned as the results of queries but those evaluated for queries due to index entries.
seq_tup_read - The number of records processed in order to create the result set of a query, not the number of records returned to the applications.
n_tup_ins - The number of rows inserted into the table.
n_tup_upd - The number of rows updated.
n_tup_del - The number of rows deleted.
Using this view an administrator can isolate the busiest tables in the database.
SELECT relname AS table_name,
seq_tup_read, idx_tup_fetch
FROM pg_stat_user_tables
WHERE (seq_tup_read + idx_tup_fetch) > 0
Text 1: Query to return the ten hottest tables with their sequential and index tuple fetch values.
These results will reveal both table usage and the effectiveness of your indexes.  If you have lots of sequential scans occurring then the query engine isn't finding indexes that match the queries being performed.
     table_name                           seq_tup_read      idx_tup_fetch
doc                                       1,423,407,729,074    349,028,985,971
job_history                                        71,378,301     4,213,364,118
job_history_info                               74,454,363      4,207,594,850
date_company_assignment           31,059,671      1,305,469,897
enterprise                                    3,551,311,871     1,083,015,878
date_x                                               12,884,498        982,418,723
object_acl                                  15,942,621,939        137,179,721
job                                               39,956,712,914          46,912,825
project_info                                 1,709,329,011                         23
team                                           1,141,035,688                            0
Text 2: Example results.
In these example results it is apparent that the table doc is one of the hottest objects and while many records are being identified using index entries there is also a very large number of sequential processes occurring.  This may be because either the indexes do not match the queries being performed or the cardinality of the indexed values is too low. Now we know where to look. 
So don't grope about speculating about how to improve database performance or scalability - ask where to look, PostgreSQL wants to help you.  Much more information can be found at the PostgreSQL stats monitoring documentation.

No comments:

Post a Comment