
Posted to user@hive.apache.org by cobby cohen <qu...@yahoo.com> on 2015/03/12 21:39:30 UTC

filter on bucketed column

Bucketed columns seem great, but I don't understand why they are used only to optimize joins and not WHERE clauses (filters). I have a huge table (billions of records) that includes a field with medium cardinality (~100,000 distinct values). Users usually filter on that field (at least). Partitioning and full table scans are both inefficient here. Hash partitioning, or bucketing, seems to be the way to go. I saw HIVE-5831, but it seems the solution is not going into trunk for some reason.

Any comments?

Thanks.
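To make the question concrete, here is a minimal sketch of the "bucket pruning" being asked about: rows are hashed on the filter column into a fixed number of buckets, and an equality filter on that column then scans only the single bucket the value hashes to. This is an illustration under stated assumptions, not Hive's implementation. CRC32 stands in for Hive's own bucketing hash, the bucket count of 32 is arbitrary, in-memory lists stand in for bucket files, and the names `bucket_of`, `write_bucketed`, `filter_with_pruning` and `filter_key` are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 32  # hypothetical bucket count (like CLUSTERED BY ... INTO 32 BUCKETS)


def bucket_of(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Deterministic hash; CRC32 is an assumption standing in for Hive's real hash.
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def write_bucketed(rows, key):
    # "Write" the table: every row lands in the bucket its key hashes to.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[key])].append(row)
    return buckets


def filter_with_pruning(buckets, key, value):
    # An equality filter can only match rows in one bucket, so only that bucket is read.
    scanned = buckets.get(bucket_of(value), [])
    matches = [row for row in scanned if row[key] == value]
    return matches, len(scanned)


if __name__ == "__main__":
    # Hypothetical table: 10,000 rows, 1,000 distinct values of the filter column.
    rows = [{"filter_key": f"v{i % 1000}", "amount": i} for i in range(10_000)]
    buckets = write_bucketed(rows, "filter_key")
    matches, scanned = filter_with_pruning(buckets, "filter_key", "v42")
    print(f"{len(matches)} matches, {scanned} of {len(rows)} rows scanned")
```

The saving only applies to equality (or IN-list) predicates on the bucketing column; a range filter such as `>` cannot be mapped to a subset of buckets, because hashing does not preserve order.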