Posted to dev@spark.apache.org by Nishadi Kirielle <nd...@gmail.com> on 2016/06/29 17:49:44 UTC

Bitmap Indexing to increase OLAP query performance

Hi All,

I am a CSE undergraduate, and for our final-year project we plan to
construct a cluster-based, bit-oriented analytic platform (storage engine)
that provides fast query performance for OLAP workloads by using novel
bitmap indexing techniques where appropriate.

For that we plan to use Spark SQL. We will need to implement a way to
cache the bitmap indexes and to incorporate bitmap indexing at the Catalyst
optimizer level where possible.

I would greatly appreciate your feedback on the proposed approach.

Thank you & Regards

Nishadi Kirielle
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka
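For readers unfamiliar with the technique: a basic (equality-encoded) bitmap index keeps one bit vector per distinct column value, with bit i set when row i holds that value, so conjunctive and disjunctive predicates reduce to bitwise AND/OR. The following is a minimal, uncompressed sketch in plain Python, using ints as bit vectors; the function and column names are hypothetical, and it does not touch Spark SQL or Catalyst.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Equality-encoded bitmap index: one bit vector (a Python int) per distinct value.

    Bit i of the vector for value v is set when row i holds v.
    """
    bitmaps = defaultdict(int)
    for row_id, value in enumerate(column):
        bitmaps[value] |= 1 << row_id
    return dict(bitmaps)


def rows_from_bitmap(bitmap):
    """Decode a bit vector into the sorted list of row ids whose bit is set."""
    rows, row_id = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Toy fact table with two low-cardinality dimension columns.
region = ["EU", "US", "EU", "APAC", "US", "EU"]
product = ["A", "B", "B", "A", "A", "A"]

region_idx = build_bitmap_index(region)
product_idx = build_bitmap_index(product)

# WHERE region = 'EU' AND product = 'A'  ->  a single bitwise AND
eu_and_a = region_idx["EU"] & product_idx["A"]
# WHERE region IN ('US', 'APAC')  ->  a single bitwise OR
us_or_apac = region_idx["US"] | region_idx["APAC"]

print(rows_from_bitmap(eu_and_a))    # [0, 5]
print(rows_from_bitmap(us_or_apac))  # [1, 3, 4]
```

Uncompressed, this costs one bit per row per distinct value, so memory grows with column cardinality; production engines typically compress the vectors (for example with WAH or Roaring bitmaps).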

Re: Bitmap Indexing to increase OLAP query performance

Posted by Michael Allman <mi...@videoamp.com>.
Hi Nishadi,

I have not seen Bloom filters in Spark. They are mentioned as part of the ORC file format, but I don't know whether Spark uses them: https://orc.apache.org/docs/spec-index.html. Parquet stores block-level min/max values, null counts, etc. for leaf columns in its metadata. I don't believe Spark uses those directly either, though the underlying column reader may. See https://github.com/apache/parquet-mr/tree/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata and https://github.com/apache/parquet-mr/tree/master/parquet-column/src/main/java/org/apache/parquet/column/statistics.

Michael


> On Jun 29, 2016, at 11:27 PM, Nishadi Kirielle <nd...@gmail.com> wrote:
> [...]
> 
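Michael mentions Parquet's block-level min/max statistics. The idea behind them (often called zone maps or data skipping) can be sketched as follows: keep the minimum and maximum of each block and skip any block whose range cannot satisfy the predicate. This is a plain-Python illustration with hypothetical names and an arbitrary block size; it does not use the parquet-mr statistics classes linked above.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Per-block summary, similar in spirit to a row group's min/max statistics."""
    start: int      # index of the first row in the block
    end: int        # one past the last row in the block
    min_value: int
    max_value: int


def build_block_stats(values, block_size):
    """Split the column into fixed-size blocks and record each block's min and max."""
    stats = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        stats.append(BlockStats(start, start + len(block), min(block), max(block)))
    return stats


def scan_between(values, stats, low, high):
    """Rows with low <= value <= high, reading only blocks whose [min, max] overlaps."""
    matches, blocks_read = [], 0
    for s in stats:
        if s.max_value < low or s.min_value > high:
            continue  # the whole block is skipped without reading its rows
        blocks_read += 1
        matches.extend(i for i in range(s.start, s.end) if low <= values[i] <= high)
    return matches, blocks_read


# Data that is roughly sorted on the filter column skips best.
amounts = [3, 5, 8, 10, 12, 15, 21, 22, 30, 31, 35, 40]
stats = build_block_stats(amounts, block_size=4)
rows, blocks_read = scan_between(amounts, stats, low=14, high=22)
print(rows, blocks_read)  # [5, 6, 7] 1
```

Pruning of this kind depends on data layout: if values are spread randomly across blocks, nearly every block's range overlaps the predicate and little is skipped.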


Re: Bitmap Indexing to increase OLAP query performance

Posted by Nishadi Kirielle <nd...@gmail.com>.
Thank you for the response.
Could you please explain why bitmap indexes are not appropriate for
big data?
Rather than using traditional bitmap indexing techniques, we are
planning to implement a combination of novel bitmap indexing techniques
such as bit-sliced indexes and projection indexes.
Also, could you tell me whether Bloom filters have already been
implemented in Spark?

Thank you

On Thu, Jun 30, 2016 at 12:51 AM, Jörn Franke <jo...@gmail.com> wrote:

> [...]
>
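Nishadi mentions bit-sliced and projection indexes. A projection index is essentially the column values stored in row-id order. A bit-sliced index stores a numeric column as one bitmap per binary digit: slice k has bit i set when bit k of row i's value is 1. An aggregate such as SUM over a filtered row set then needs only ANDs and popcounts, since SUM = sum over k of 2^k * popcount(slice_k AND filter). The sketch below assumes a non-empty column of non-negative integers and uses Python ints as uncompressed bit vectors; all names are hypothetical.

```python
def build_bit_slices(values):
    """Bit-sliced index: slices[k] has bit i set iff bit k of values[i] is 1."""
    width = max(values).bit_length()
    slices = [0] * width
    for row_id, value in enumerate(values):
        for k in range(width):
            if (value >> k) & 1:
                slices[k] |= 1 << row_id
    return slices


def bsi_sum(slices, row_filter):
    """SUM(value) over the rows selected by row_filter, using only ANDs and popcounts."""
    return sum((1 << k) * bin(s & row_filter).count("1") for k, s in enumerate(slices))


# Projection index: the column values in row-id order.
sales = [5, 3, 7, 2, 6]
slices = build_bit_slices(sales)

# Filter bitmap selecting rows 0, 2 and 4, e.g. the result of a bitmap-index predicate.
row_filter = (1 << 0) | (1 << 2) | (1 << 4)
print(bsi_sum(slices, row_filter))  # 18  (5 + 7 + 6)
```

The appeal of this layout is that the filter bitmap produced by an ordinary bitmap index can be fed straight into the aggregate without materializing the matching rows.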

Re: Bitmap Indexing to increase OLAP query performance

Posted by Jörn Franke <jo...@gmail.com>.
Is this traditional bitmap indexing? I would not recommend it for big data. In-memory Bloom filters and min/max indexes look more appropriate. If you do want to use bitmap indexes, you would have to implement them as you describe. However, bitmap indexes may consume a lot of memory, so I am not sure that simply caching them in memory is desirable.

> On 29 Jun 2016, at 19:49, Nishadi Kirielle <nd...@gmail.com> wrote:
> [...]
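Jörn suggests Bloom filters as a lighter-weight alternative. A Bloom filter answers "might this block contain value x?" with no false negatives and a tunable false-positive rate p; for n keys the usual sizing is m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The following is a minimal plain-Python sketch using double hashing over a single SHA-256 digest; the class and parameter names are hypothetical, and this is not the ORC or Spark implementation.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: k bit positions per key, derived by double hashing."""

    def __init__(self, expected_items, false_positive_rate):
        n, p = expected_items, false_positive_rate
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.num_bits = max(1, math.ceil(-n * math.log(p) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # position_i = (h1 + i * h2) mod m, with h1 and h2 taken from one SHA-256 digest.
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# One filter per data block, e.g. over the customer ids stored in that block.
block_filter = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for customer_id in range(0, 2000, 2):  # even ids only
    block_filter.add(customer_id)

print(block_filter.might_contain(42))  # True: added keys are never reported absent
print(block_filter.num_bits, block_filter.num_hashes)
```

For scale: an uncompressed equality-encoded bitmap index on a 1,000,000-row column with 1,000 distinct values needs 10^9 bits (125 MB), while a Bloom filter over 1,000,000 keys at p = 0.01 needs about 9.6 million bits (about 1.2 MB). The two are not equivalent, though: a Bloom filter only says which blocks may be skipped, not which rows match, and real bitmap indexes are usually compressed.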
