The original coding read tuples from workers in round-robin fashion,
but performance testing showed that it works much better to read enough
to empty one queue before moving on to the next. I believe the
reason for this is that, with the old approach, we could easily wake
up a worker repeatedly to write only one new tuple into the shm_mq
each time. With this approach, by the time the process gets scheduled,
it has a decent chance of being able to fill the entire buffer in
one go.
Patch by me. Dilip Kumar helped with performance testing.
continue;
}
- /* Advance nextreader pointer in round-robin fashion. */
- gatherstate->nextreader =
- (gatherstate->nextreader + 1) % gatherstate->nreaders;
-
/* If we got a tuple, return it. */
if (tup)
return tup;
+ /*
+ * Advance nextreader pointer in round-robin fashion. Note that we
+ * only reach this code if we weren't able to get a tuple from the
+ * current worker. We used to advance the nextreader pointer after
+ * every tuple, but it turns out to be much more efficient to keep
+ * reading from the same queue until that would require blocking.
+ */
+ gatherstate->nextreader =
+ (gatherstate->nextreader + 1) % gatherstate->nreaders;
+
/* Have we visited every TupleQueueReader? */
if (gatherstate->nextreader == waitpos)
{