Posted to dev@spark.apache.org by BaiRan <li...@icloud.com> on 2016/06/22 02:45:49 UTC

Question about Bloom Filter in Spark 2.0

Hi all,

I have a question about the bloom filter implementation in the SPARK-12818 issue. If I have an ORC file with bloom filter metadata, how can I utilise it from Spark SQL?
Thanks.

Best,
Ran

Re: Question about Bloom Filter in Spark 2.0

Posted by Jörn Franke <jo...@gmail.com>.
You should look at it at both levels: there is one bloom filter for ORC data and one for data in memory.

It is already a good step towards integrating the storage format with the in-memory representation for columnar data.
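For readers landing on this thread later, a minimal sketch of the file-level side. The option keys below are the standard ORC writer properties (not a Spark-specific API), and whether Spark's ORC reader actually consults the resulting bloom filters depends on the Spark/Hive ORC version in use, so treat this as illustrative; the path and column names are made up:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("orc-bloom-sketch").getOrCreate()
import spark.implicits._

val df = (1L to 1000L).toDF("id")

// Ask the ORC writer to record bloom filters for column "id";
// fpp is the target false-positive probability.
df.write
  .option("orc.bloom.filter.columns", "id")
  .option("orc.bloom.filter.fpp", "0.05")
  .orc("/tmp/orc_with_bloom")

// Enable ORC predicate pushdown so eligible filters reach the reader.
spark.conf.set("spark.sql.orc.filterPushdown", "true")

// Point lookups like this are the case bloom filters can help with,
// if the reader honours them when skipping stripes/row groups.
val hits = spark.read.orc("/tmp/orc_with_bloom").filter($"id" === 7L)
```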

> On 22 Jun 2016, at 14:01, BaiRan <li...@icloud.com> wrote:
> 
> After building a bloom filter on existing data, does the Spark engine utilise it during query processing?
> Is there any plan for predicate pushdown using bloom filters in ORC / Parquet?
> 
> Thanks
> Ran
>> On 22 Jun, 2016, at 10:48 am, Reynold Xin <rx...@databricks.com> wrote:
>> 
>> SPARK-12818 is about building a bloom filter on existing data. It has nothing to do with the ORC bloom filter, which can be used to do predicate pushdown.
>> 
>> 
>>> On Tue, Jun 21, 2016 at 7:45 PM, BaiRan <li...@icloud.com> wrote:
>>> Hi all,
>>> 
>>> I have a question about the bloom filter implementation in the SPARK-12818 issue. If I have an ORC file with bloom filter metadata, how can I utilise it from Spark SQL?
>>> Thanks.
>>> 
>>> Best,
>>> Ran
> 

Re: Question about Bloom Filter in Spark 2.0

Posted by BaiRan <li...@icloud.com>.
After building a bloom filter on existing data, does the Spark engine utilise it during query processing?
Is there any plan for predicate pushdown using bloom filters in ORC / Parquet?

Thanks
Ran
> On 22 Jun, 2016, at 10:48 am, Reynold Xin <rx...@databricks.com> wrote:
> 
> SPARK-12818 is about building a bloom filter on existing data. It has nothing to do with the ORC bloom filter, which can be used to do predicate pushdown.
> 
> 
> On Tue, Jun 21, 2016 at 7:45 PM, BaiRan <lizbai@icloud.com> wrote:
> Hi all,
> 
> I have a question about the bloom filter implementation in the SPARK-12818 issue. If I have an ORC file with bloom filter metadata, how can I utilise it from Spark SQL?
> Thanks.
> 
> Best,
> Ran
> 


Re: Question about Bloom Filter in Spark 2.0

Posted by Reynold Xin <rx...@databricks.com>.
SPARK-12818 is about building a bloom filter on existing data. It has
nothing to do with the ORC bloom filter, which can be used to do predicate
pushdown.
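
To make the distinction concrete: SPARK-12818 added a bloom filter builder to the DataFrame stat functions. A minimal sketch against the Spark 2.0 DataFrameStatFunctions API (the SparkSession setup and column name are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.sketch.BloomFilter

val spark = SparkSession.builder().appName("bloom-sketch").getOrCreate()
import spark.implicits._

val df = (1L to 1000L).toDF("id")

// Build a bloom filter over an existing column; expected item count
// and target false-positive probability are tuning parameters.
val bf: BloomFilter = df.stat.bloomFilter("id", 1000L, 0.03)

bf.mightContain(42L)  // true for any inserted value
bf.mightContain(-1L)  // usually false, but false positives are possible
```

This filter lives application-side after the scan; as noted above, it is unrelated to the bloom filter stored in ORC file metadata, which a format-aware reader could use for pushdown.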


On Tue, Jun 21, 2016 at 7:45 PM, BaiRan <li...@icloud.com> wrote:

> Hi all,
>
> I have a question about the bloom filter implementation in the SPARK-12818
> issue. If I have an ORC file with bloom filter metadata, how can I utilise
> it from Spark SQL?
> Thanks.
>
> Best,
> Ran
>