Posted to dev@drill.apache.org by Paul Rogers <pr...@mapr.com> on 2017/10/26 17:14:13 UTC

Capturing in-flight batches

Hi All,

Yesterday, in a conversation, Salim mentioned it would be handy to be able to capture and replay in-flight batches in a Drill query in order to diagnose problems. As it turns out, we have most of the pieces readily available; we just need someone to assemble them.

First, we have the IteratorValidatorBatchIterator class, which sits on top of each operator and validates that operator’s state. A while back we extended it to validate vector internals as well, in order to catch a few cases of offset vector corruption. This class could be extended further to capture in-flight batches for selected operators.

Second, we have the VectorAccessibleSerializable class (and the recently added VectorSerializer wrapper class) that writes batches to, and reads batches from, disk. This class is the foundation of our spilling support.

Third, we have the EasyFormatPlugin class that lets us easily create a new disk-based reader.

Combine the three and we have capture and replay: the validator writes each batch it sees to disk using the vector serializer, and a new easy format plugin reads those files back using the same serializer.
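
To make the shape of the idea concrete, here is a minimal, self-contained Java sketch of the capture-and-replay pattern. It deliberately does not use any Drill classes: Batch, CapturingIterator, writeBatch, readBatch and replay are all hypothetical names invented for illustration, and the file layout (a row count followed by int values) is an assumption, not the format VectorAccessibleSerializable writes. In a real implementation the wrapper role would presumably fall to IteratorValidatorBatchIterator, the write/read pair to the vector serializer, and the replay side to an EasyFormatPlugin reader.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

final class BatchCaptureSketch {

    /** Hypothetical stand-in for a record batch: a single int column. */
    static final class Batch {
        final int[] values;
        Batch(int... values) { this.values = values; }
    }

    /** Writes one batch as a row count followed by the values (assumed layout). */
    static void writeBatch(DataOutputStream out, Batch batch) throws IOException {
        out.writeInt(batch.values.length);
        for (int v : batch.values) {
            out.writeInt(v);
        }
    }

    /** Reads one batch, or returns null when the file has no more batches. */
    static Batch readBatch(DataInputStream in) throws IOException {
        int count;
        try {
            count = in.readInt();
        } catch (EOFException e) {
            return null; // end of the capture file
        }
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = in.readInt();
        }
        return new Batch(values);
    }

    /** Passes batches through unchanged while copying each one to a file. */
    static final class CapturingIterator implements Iterator<Batch>, Closeable {
        private final Iterator<Batch> upstream;
        private final DataOutputStream out;

        CapturingIterator(Iterator<Batch> upstream, Path file) throws IOException {
            this.upstream = upstream;
            this.out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)));
        }

        @Override public boolean hasNext() { return upstream.hasNext(); }

        @Override public Batch next() {
            Batch batch = upstream.next();
            try {
                writeBatch(out, batch); // capture before handing downstream
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return batch;
        }

        @Override public void close() throws IOException { out.close(); }
    }

    /** Replay side: reads every captured batch back from the file. */
    static List<Batch> replay(Path file) throws IOException {
        List<Batch> batches = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            for (Batch b = readBatch(in); b != null; b = readBatch(in)) {
                batches.add(b);
            }
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("captured-batches", ".bin");
        List<Batch> source = Arrays.asList(new Batch(1, 2, 3), new Batch(4, 5));
        try (CapturingIterator it = new CapturingIterator(source.iterator(), file)) {
            while (it.hasNext()) {
                it.next(); // a downstream operator would consume the batch here
            }
        }
        for (Batch b : replay(file)) {
            System.out.println(Arrays.toString(b.values));
        }
        Files.delete(file);
    }
}
```

The point of the wrapper design is that the operator under test does not change: capture is a pass-through layer that can be switched on for selected operators, and the replay reader can then feed the saved batches into a query on their own.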

The good news is that most of these classes have been around since the early days, so any technique built using them should work for any older version of Drill we need to debug. (Though, of course, we’d have to rebuild that old version to include the batch intercept code…)

Thanks,

- Paul