Posted to dev@spark.apache.org by BaiRan <li...@icloud.com> on 2016/06/22 02:45:49 UTC
Question about Bloom Filter in Spark 2.0
Hi all,
I have a question about the bloom filter implementation in the SPARK-12818 issue. If I have an ORC file with bloom filter metadata, how can I utilise it from Spark SQL?
Thanks.
Best,
Ran
Re: Question about Bloom Filter in Spark 2.0
Posted by Jörn Franke <jo...@gmail.com>.
You should look at it at both levels: there is one bloom filter for the ORC data on disk and one for the data in memory.
This is already a good step towards integrating the storage format with the in-memory representation for columnar data.
> On 22 Jun 2016, at 14:01, BaiRan <li...@icloud.com> wrote:
>
> After building a bloom filter on existing data, does the Spark engine utilise the bloom filter during query processing?
> Is there any plan for predicate pushdown using the bloom filters in ORC / Parquet?
>
> Thanks
> Ran
>> On 22 Jun, 2016, at 10:48 am, Reynold Xin <rx...@databricks.com> wrote:
>>
>> SPARK-12818 is about building a bloom filter on existing data. It has nothing to do with the ORC bloom filter, which can be used to do predicate pushdown.
>>
>>
>>> On Tue, Jun 21, 2016 at 7:45 PM, BaiRan <li...@icloud.com> wrote:
>>> Hi all,
>>>
>>> I have a question about the bloom filter implementation in the SPARK-12818 issue. If I have an ORC file with bloom filter metadata, how can I utilise it from Spark SQL?
>>> Thanks.
>>>
>>> Best,
>>> Ran
>
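The core mechanics behind both levels are the same. As an illustrative sketch only (this is plain Scala, not Spark's actual `org.apache.spark.util.sketch.BloomFilter` from SPARK-12818), a minimal bloom filter hashes each item to several bit positions; membership tests can return false positives but never false negatives:

```scala
import scala.util.hashing.MurmurHash3

// Minimal illustrative bloom filter. The class name and structure are
// hypothetical; Spark's real implementation lives in the spark-sketch module.
class SimpleBloomFilter(numBits: Int, numHashes: Int) {
  private val bits = new Array[Boolean](numBits)

  // Derive k bit positions from two base hashes (standard double hashing).
  private def positions(item: String): Seq[Int] = {
    val h1 = MurmurHash3.stringHash(item, 0)
    val h2 = MurmurHash3.stringHash(item, h1)
    (0 until numHashes).map { i =>
      val m = (h1.toLong + i.toLong * h2.toLong) % numBits
      (if (m < 0) m + numBits else m).toInt
    }
  }

  // Adding an item sets all of its bit positions.
  def put(item: String): Unit = positions(item).foreach(i => bits(i) = true)

  // True if every bit position is set: possible false positives, no false negatives.
  def mightContain(item: String): Boolean = positions(item).forall(i => bits(i))
}
```

ORC stores such bit sets per stripe so a reader can skip whole stripes; the in-memory variant answers membership queries against data Spark has already scanned.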
Re: Question about Bloom Filter in Spark 2.0
Posted by BaiRan <li...@icloud.com>.
After building a bloom filter on existing data, does the Spark engine utilise the bloom filter during query processing?
Is there any plan for predicate pushdown using the bloom filters in ORC / Parquet?
Thanks
Ran
> On 22 Jun, 2016, at 10:48 am, Reynold Xin <rx...@databricks.com> wrote:
>
> SPARK-12818 is about building a bloom filter on existing data. It has nothing to do with the ORC bloom filter, which can be used to do predicate pushdown.
>
>
> On Tue, Jun 21, 2016 at 7:45 PM, BaiRan <lizbai@icloud.com> wrote:
> Hi all,
>
> I have a question about the bloom filter implementation in the SPARK-12818 issue. If I have an ORC file with bloom filter metadata, how can I utilise it from Spark SQL?
> Thanks.
>
> Best,
> Ran
>
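On the ORC side, a configuration sketch of what enabling pushdown looks like (paths and the column name are hypothetical; `spark.sql.orc.filterPushdown` is the relevant setting, and it is off by default in Spark 2.0):

```scala
// Assumes an existing SparkSession named `spark`.
import org.apache.spark.sql.functions.col

// With filter pushdown enabled, Spark hands predicates to the ORC reader,
// which can consult the file's min/max and bloom filter indexes to skip
// stripes instead of scanning every row.
spark.conf.set("spark.sql.orc.filterPushdown", "true")

val hits = spark.read.orc("/path/to/table").filter(col("user_id") === "abc123")
```

Whether the bloom filter index is actually consulted depends on the ORC reader; the pushdown setting only controls whether Spark forwards the predicate at all.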
Re: Question about Bloom Filter in Spark 2.0
Posted by Reynold Xin <rx...@databricks.com>.
SPARK-12818 is about building a bloom filter on existing data. It has
nothing to do with the ORC bloom filter, which can be used to do predicate
pushdown.
On Tue, Jun 21, 2016 at 7:45 PM, BaiRan <li...@icloud.com> wrote:
> Hi all,
>
> I have a question about the bloom filter implementation in the SPARK-12818
> issue. If I have an ORC file with bloom filter metadata, how can I utilise
> it from Spark SQL?
> Thanks.
>
> Best,
> Ran
>
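For the SPARK-12818 side, a usage sketch of the API it added in Spark 2.0: `DataFrame.stat.bloomFilter` scans the existing data and returns an in-memory filter. The dataset path and column name below are hypothetical:

```scala
// Assumes an existing SparkSession named `spark`.
import org.apache.spark.util.sketch.BloomFilter

val users = spark.read.orc("/path/to/users")

// Build a bloom filter over the user_id column, sized for roughly
// one million distinct items with a 3% false positive rate.
val bf: BloomFilter = users.stat.bloomFilter("user_id", 1000000L, 0.03)

// Membership queries may yield false positives, never false negatives.
val maybePresent = bf.mightContain("abc123")
```

This filter is built by Spark at query time and is independent of any bloom filter metadata stored inside the ORC file itself, which is exactly the distinction drawn above.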