Monday, January 8, 2018

Exploring SP-GiST and BRIN indexes visually using pg_hexedit

Support for both BRIN and SP-GiST access methods was recently added to pg_hexedit, the experimental hex editor framework for PostgreSQL relation files. These were the final access methods among the standard Postgres index access methods that required support.

SP-GiST (Space-Partitioned GiST)



Beginning of an SP-GiST leaf page

SP-GiST is unique among index access methods whose index structure is tree-like, in that it supports tree structures that are unbalanced. SP-GiST operator classes exist that support k-d trees, quadtrees, and suffix trees. These structures are traditionally only suited to a fully in-memory representation, with dynamically-allocated nodes that contain a small number of simple pointers (byte addresses) pointing to other nodes.

SP-GiST presents a generalized interface through which all of these space-partitioned trees can be constructed for a given datatype, in a way that minimizes disk seeks (PDF) and works well with block-orientated storage. Essentially, SP-GiST maps tree nodes onto disk blocks in an adaptive fashion, rather than simply having a block directly correspond to a tree node, as happens with other access methods.

There are particularly intricate data structures needed to support all of this. Space utilization can be an issue with SP-GiST indexes, though that's probably very workload dependent. This is something that pg_hexedit can be effective at representing visually.

SP-GiST is a good example of the PostgreSQL community implementing a concept that comes directly from state of the art database research (PDF). I suspect that we have yet to fully realize the benefit of SP-GiST for specific application domains, due to a simple lack of awareness among users and potential users that work in those domains. Perhaps this enhancement can contribute in some small way towards a better understanding of what is possible.

BRIN (Block Range Index)


BRIN "revmap" page
The structure of BRIN indexes is not at all tree-like. BRIN works by summarizing the locations of ranges of values within the underlying indexed table, and exploiting an underlying natural ordering (e.g., a date column on a large append-only table with historic sales records). BRIN indexes are often very small, even when they index a very large underlying table.

BRIN summary page
The on-disk representation is fairly simple. One novel aspect of BRIN's on-disk representation is that it's the only index access method that updates index tuples in-place (this is possible in part because it doesn't have one index tuple for every heap tuple/HOT chain from the indexed table, another unique property). These in-place updates can happen when a range in a summary page changes. Visibility into how often this happens in the real world may prove useful.

No comments:

Post a Comment