Posted to user@drill.apache.org by Stefán Baxter <st...@activitystream.com> on 2015/11/21 00:26:27 UTC

Searching in Arrays - Parquet - REPEATED_CONTAINS

Hi,

I'm trying to use an array in Parquet to store a list of IDs (a 1:* scenario)
instead of putting each ID in a separate field (the array contains 1-10 values).

This requires me to use REPEATED_CONTAINS to search for these values.

I was expecting a performance penalty, but it turns out that searching with
REPEATED_CONTAINS is 20 times slower than looking for a single value.

My guess is that it has to do with scan optimization and with regular expressions
being used for comparison, but I wonder whether that is the case or whether there
are some tricks available to speed this up.

Any suggestions?

Regards,
 -Stefán