--- /dev/null
+
+
+
+
Writing A TABLESAMPLE Sampling Method
+
+
+
+
+ The TABLESAMPLE clause implementation in
+ PostgreSQL supports custom sampling methods.
+ These methods determine which sample of the table is returned when the
+ TABLESAMPLE clause is used.
+
+
+
+
Tablesample Method Functions
+
+ A tablesample method must provide the following set of functions:
+
+
+void
+tsm_init (TableSampleDesc *desc,
+ uint32 seed, ...);
+
+ Initialize the tablesample scan. The function is called at the beginning
+ of each relation scan.
+
+ The first two parameters are required. Any additional parameters you
+ declare determine the arguments the user must supply to the method in the
+ TABLESAMPLE clause of the query itself.
+ For example, if your function declares an additional float4 parameter
+ named pct, the user has to call the tablesample method with an
+ expression that evaluates to (or can be coerced to) float4.
+ This definition:
+void
+tsm_init (TableSampleDesc *desc,
+ uint32 seed, float4 pct);
+
+leads to an SQL call like this:
+... TABLESAMPLE yourmethod(0.5) ...
+
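+ The following is a minimal sketch of an initialization function with one
+ extra parameter, not a definitive implementation. The method name
+ mysampler and the MySamplerState structure are hypothetical, and the
+ typedefs at the top are simplified stand-ins so that the sketch compiles
+ outside the PostgreSQL source tree. A real method would include the server
+ headers and allocate with palloc instead of malloc.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins (assumptions), not the real server definitions. */
typedef uint32_t uint32;
typedef float float4;

typedef struct TableSampleDesc {
    void *heapScan;   /* stand-in for HeapScanDesc */
    void *tupDesc;    /* stand-in for TupleDesc */
    void *tsmdata;    /* private state of the sampling method */
} TableSampleDesc;

/* Hypothetical per-scan state kept by the method. */
typedef struct MySamplerState {
    uint32 seed;      /* seed passed in by the executor */
    float4 pct;       /* user-supplied argument from the TABLESAMPLE clause */
} MySamplerState;

/* Init function: the two required parameters plus one extra float4. */
void
mysampler_init(TableSampleDesc *desc, uint32 seed, float4 pct)
{
    /* malloc is used only to keep the sketch standalone. */
    MySamplerState *state = malloc(sizeof(MySamplerState));

    if (state == NULL)
        abort();

    state->seed = seed;
    state->pct = pct;

    /* Store the state where the other method functions can find it. */
    desc->tsmdata = state;
}
```

+ With this sketch, a query using TABLESAMPLE mysampler(0.5) would be
+ expected to reach mysampler_init with pct set to 0.5.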
+
+
+BlockNumber
+tsm_nextblock (TableSampleDesc *desc);
+
+ Returns the block number of the next page to be scanned. InvalidBlockNumber
+ should be returned when the sampling has reached the end of the relation.
+
+
+OffsetNumber
+tsm_nexttuple (TableSampleDesc *desc, BlockNumber blockno,
+ OffsetNumber maxoffset);
+
+ Returns the offset of the next tuple on the current page. InvalidOffsetNumber
+ should be returned when the sampling has reached the end of the page.
+
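+ As an illustration of how tsm_nextblock and tsm_nexttuple cooperate, here
+ is a minimal sketch of a method that visits every block in order and
+ returns every step-th tuple on each page. The stride_ names and the
+ StrideState structure are hypothetical. The typedefs and constants are
+ simplified stand-ins for the server definitions, and the sketch assumes
+ that nblocks was filled in by the init function.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-ins (assumptions), not the real server definitions. */
typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;
#define InvalidBlockNumber  ((BlockNumber) 0xFFFFFFFF)
#define InvalidOffsetNumber ((OffsetNumber) 0)
#define FirstOffsetNumber   ((OffsetNumber) 1)

typedef struct TableSampleDesc {
    void *heapScan;   /* stand-in for HeapScanDesc */
    void *tupDesc;    /* stand-in for TupleDesc */
    void *tsmdata;    /* private state of the sampling method */
} TableSampleDesc;

/* Hypothetical state: walk all blocks, return every step-th tuple. */
typedef struct StrideState {
    BlockNumber  nblocks;     /* number of blocks in the relation */
    BlockNumber  nextblock;   /* next block to hand out */
    OffsetNumber lastoffset;  /* last offset returned on the current page */
    OffsetNumber step;        /* distance between returned tuples */
} StrideState;

BlockNumber
stride_nextblock(TableSampleDesc *desc)
{
    StrideState *state = desc->tsmdata;

    /* End of relation: tell the caller there is nothing more to scan. */
    if (state->nextblock >= state->nblocks)
        return InvalidBlockNumber;

    /* Starting a new page, so forget the position on the previous one. */
    state->lastoffset = InvalidOffsetNumber;
    return state->nextblock++;
}

OffsetNumber
stride_nexttuple(TableSampleDesc *desc, BlockNumber blockno,
                 OffsetNumber maxoffset)
{
    StrideState *state = desc->tsmdata;
    unsigned int next;

    (void) blockno;           /* not needed by this simple method */

    if (state->lastoffset == InvalidOffsetNumber)
        next = FirstOffsetNumber;
    else
        next = (unsigned int) state->lastoffset + state->step;

    /* End of page: the caller is expected to ask for the next block. */
    if (next > maxoffset)
        return InvalidOffsetNumber;

    state->lastoffset = (OffsetNumber) next;
    return state->lastoffset;
}
```

+ The method keeps all of its position information in its own state, so the
+ two functions only need the descriptor to find out where they left off.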
+
+void
+tsm_end (TableSampleDesc *desc);
+
+ Called when the scan has finished. Clean up any leftover state.
+
+
+void
+tsm_reset (TableSampleDesc *desc);
+
+ Called when the scan needs to rescan the relation. Reset any tablesample
+ method state.
+
+
+void
+tsm_cost (PlannerInfo *root, Path *path, RelOptInfo *baserel,
+ List *args, BlockNumber *pages, double *tuples);
+
+ This function is used by the optimizer to choose the best plan and is also
+ used for the output of EXPLAIN.
+
+
+ There is one more function that a tablesample method can implement to gain
+ finer-grained control over sampling. This function is optional:
+
+
+bool
+tsm_examinetuple (TableSampleDesc *desc, BlockNumber blockno,
+ HeapTuple tuple, bool visible);
+
+ Enables the sampling method to examine the contents of a tuple
+ (for example, to collect some internal statistics). The return value of this
+ function determines whether the tuple is returned to the client.
+ Note that this function also receives invisible tuples, but it must not
+ return true for such a tuple (if it does,
+ PostgreSQL will raise an error).
+
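+ A minimal sketch of such a function is shown here, assuming a hypothetical
+ method that only counts the tuples it is shown. The counting_ name and the
+ CountState structure are illustrative, and the typedefs are simplified
+ stand-ins for the server definitions. The important part is that the
+ function never returns true for an invisible tuple.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-ins (assumptions), not the real server definitions. */
typedef uint32_t BlockNumber;
typedef void *HeapTuple;

typedef struct TableSampleDesc {
    void *heapScan;   /* stand-in for HeapScanDesc */
    void *tupDesc;    /* stand-in for TupleDesc */
    void *tsmdata;    /* private state of the sampling method */
} TableSampleDesc;

/* Hypothetical internal statistics collected during the scan. */
typedef struct CountState {
    unsigned long seen;       /* all tuples examined */
    unsigned long invisible;  /* tuples that were not visible */
} CountState;

bool
counting_examinetuple(TableSampleDesc *desc, BlockNumber blockno,
                      HeapTuple tuple, bool visible)
{
    CountState *state = desc->tsmdata;

    (void) blockno;   /* unused in this sketch */
    (void) tuple;     /* a real method could inspect the tuple contents */

    state->seen++;

    /* Invisible tuples may be counted but must never be returned. */
    if (!visible)
    {
        state->invisible++;
        return false;
    }

    /* Visible tuples are all accepted by this simple method. */
    return true;
}
```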
+
+ As shown above, most of the tablesample method functions receive a
+ TableSampleDesc as their first parameter. This structure holds the
+ state of the current scan and also provides storage for the tablesample
+ method's state. It is defined as follows:
+typedef struct TableSampleDesc {
+ HeapScanDesc heapScan;
+ TupleDesc tupDesc;
+
+ void *tsmdata;
+} TableSampleDesc;
+
+ Here, heapScan is the descriptor of the physical table scan;
+ table size information can be obtained from it. tupDesc
+ is the tuple descriptor of the tuples returned by the scan and passed
+ to the tsm_examinetuple() function. tsmdata
+ can be used by the tablesample method itself to store any state it might
+ need during the scan. If the method uses it, the memory should be freed
+ with pfree in the tsm_end() function.
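+ The handling of tsmdata in the reset and end functions might look like the
+ following sketch. The demo_ names and the ScanState structure are
+ hypothetical, the typedefs are simplified stand-ins for the server
+ definitions, and free is used in place of pfree only so that the sketch
+ compiles outside the server.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins (assumptions), not the real server definitions. */
typedef uint32_t BlockNumber;

typedef struct TableSampleDesc {
    void *heapScan;   /* stand-in for HeapScanDesc */
    void *tupDesc;    /* stand-in for TupleDesc */
    void *tsmdata;    /* private state of the sampling method */
} TableSampleDesc;

/* Hypothetical per-scan state, allocated by the init function. */
typedef struct ScanState {
    BlockNumber nextblock;    /* next block to hand out */
} ScanState;

/* Reset: rewind the method's own state so the scan can start over. */
void
demo_reset(TableSampleDesc *desc)
{
    ScanState *state = desc->tsmdata;

    state->nextblock = 0;
}

/* End: release the state stored in tsmdata. */
void
demo_end(TableSampleDesc *desc)
{
    free(desc->tsmdata);      /* pfree() inside the server */
    desc->tsmdata = NULL;     /* avoid leaving a dangling pointer */
}
```

+ Keeping all per-scan state behind tsmdata means that the reset and end
+ functions need nothing beyond the descriptor they are given.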
+
+
+
+