Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/08/23 13:16:00 UTC

[GitHub] nishantmonu51 opened a new pull request #6222: Add ability to pass in Bloom filter from Hive Queries

URL: https://github.com/apache/incubator-druid/pull/6222
 
 
   This PR adds a BloomDimFilter which can be used by Apache Hive to pass in BloomFilters. 
   
   Use Case - 
   We have a fact table in Druid and slowly changing dimension/lookup tables in Apache Hive, and need to join those tables. 
   For example, consider the SSB benchmark, where the lineorder table is stored in Druid and the parts table is in Hive, and the following query from the benchmark:
   ```sql
   select sum(total_revenue) from druid.ssb_lineorder_100, hive.ssb_lineorder_100 WHERE lo_partkey = p_partkey and p_category = 'MFGR#14';
   ```
   In the above query, Hive can scan the parts table and create a bloom filter over the possible values of p_partkey where p_category = 'MFGR#14'. This bloom filter can then be pushed down to Druid, reducing the data that needs to be scanned and transferred between Druid and Hive. 
   Since a BloomFilter is a probabilistic data structure and can return false positives, Hive will still need to do exact filtering while processing the join. 
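
The flow described above can be sketched roughly as follows. This is a minimal illustration only: `SimpleBloomFilter`, `pushDown`, and `exactRecheck` are hypothetical names invented for this sketch, and the hashing scheme is an arbitrary choice. It is not the `BloomDimFilter` added by this PR, nor the Bloom filter implementation Hive uses. It only shows why the filter can shrink the data sent back while still requiring an exact check on the Hive side.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Set;

// Hypothetical, minimal Bloom filter for illustration; not Druid's or Hive's class.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, used here only as a second base hash.
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(byte[] data, int i) {
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String value) {
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(data, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(String value) {
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(data, i))) {
                return false;
            }
        }
        return true;
    }

    // Fact-table side of the scenario: keep only keys that pass the filter.
    static List<String> pushDown(List<String> factKeys, SimpleBloomFilter filter) {
        List<String> candidates = new ArrayList<>();
        for (String key : factKeys) {
            if (filter.mightContain(key)) {
                candidates.add(key);
            }
        }
        return candidates;
    }

    // Join side of the scenario: drop false positives with an exact membership check.
    static List<String> exactRecheck(List<String> candidates, Set<String> exactKeys) {
        List<String> joined = new ArrayList<>();
        for (String key : candidates) {
            if (exactKeys.contains(key)) {
                joined.add(key);
            }
        }
        return joined;
    }
}
```

In this sketch the filter never drops a key that was added, so the candidate list is always a superset of the true join result; the exact recheck then removes whatever false positives slipped through.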
