Posted to user@drill.apache.org by Stefán Baxter <st...@activitystream.com> on 2015/11/21 00:26:27 UTC
Searching in Arrays - Parquet - REPEATED_CONTAINS
Hi,
I'm trying to use an array in Parquet to store a list of IDs (a 1:* scenario)
as opposed to putting each ID in a separate field (the array contains 1-10 values).
This requires me to use REPEATED_CONTAINS to search for these values.
I was expecting a performance penalty, but it turns out that searching with
REPEATED_CONTAINS is 20x slower than looking for a single value.
My guess is that it has to do with scan optimization and regular expressions
being used for comparison, but I wonder whether that is so and whether there
are any tricks available to speed this up.
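For illustration only, the two kinds of lookup being compared can be modelled
in a few lines of Python. This is a rough in-memory sketch, not Drill's
implementation: the rows, the field names `id` and `ids`, and both helper
functions are hypothetical, and the use of a regular expression in the array
case simply mirrors the guess above rather than confirmed behaviour.

```python
import re

# Hypothetical rows: `id` is a single-value field, `ids` is the array field.
rows = [
    {"id": "a1", "ids": ["a1", "b2", "c3"]},
    {"id": "b2", "ids": ["d4"]},
    {"id": "c3", "ids": ["c3", "a1"]},
]


def match_single(rows, value):
    """Single-value lookup: one equality comparison per row."""
    return [r for r in rows if r["id"] == value]


def match_repeated(rows, pattern):
    """Array lookup: up to len(ids) pattern matches per row.

    Assumption: elements are compared with a regular expression,
    which is the guess made in this message, not a confirmed fact.
    """
    compiled = re.compile(pattern)
    return [r for r in rows if any(compiled.fullmatch(v) for v in r["ids"])]


single_hits = match_single(rows, "a1")
array_hits = match_repeated(rows, "a1")
```

In this model the array lookup does more work per row (several pattern
matches instead of one equality test), which would account for some slowdown
but says nothing about how much of the 20x comes from scan optimization.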
Any suggestions?
Regards,
-Stefán