Posted to dev@orc.apache.org by Praveen Krishna <pr...@tutanota.com> on 2019/06/06 11:14:42 UTC

Pluggable index for ORC

Hi all, 

       Can we have a custom stream (a custom StreamKind enum value) that holds an index for some columns? We are trying to index a VARCHAR column in an ORC file using Lucene and store the index as part of that column. A VARCHAR column currently has three streams:
1. Present stream 
2. Data stream 
3. Length stream 

      It would be better if we could have an Index stream, or a StreamKind that represents an index chunk, so that in the future an index for some columns can be computed and stored as part of that column.
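
To make the request concrete, here is a minimal sketch of a VARCHAR column carrying an extra index stream next to its three existing ones. It is only a model of the idea, not ORC's reader or writer API: the class ColumnStreams, the nested Stream type, the INDEX kind and the varcharStreams helper are all hypothetical names, and the index payload is treated as opaque bytes (for example a serialized Lucene index).

```java
import java.util.ArrayList;
import java.util.List;

class ColumnStreams {
    // Hypothetical stream kinds. PRESENT, DATA and LENGTH mirror the three
    // streams listed above; INDEX is the proposed addition and is an
    // assumption of this sketch, not an existing ORC stream kind.
    enum StreamKind { PRESENT, DATA, LENGTH, INDEX }

    // One stream: its kind, the column it belongs to, and its raw bytes.
    static final class Stream {
        final StreamKind kind;
        final int columnId;
        final byte[] payload;

        Stream(StreamKind kind, int columnId, byte[] payload) {
            this.kind = kind;
            this.columnId = columnId;
            this.payload = payload;
        }
    }

    // Builds the stream list for one VARCHAR column. The index stream is
    // optional: passing null yields only the three regular streams.
    static List<Stream> varcharStreams(int columnId, byte[] present, byte[] data,
                                       byte[] length, byte[] indexBytes) {
        List<Stream> streams = new ArrayList<>();
        streams.add(new Stream(StreamKind.PRESENT, columnId, present));
        streams.add(new Stream(StreamKind.DATA, columnId, data));
        streams.add(new Stream(StreamKind.LENGTH, columnId, length));
        if (indexBytes != null) {
            streams.add(new Stream(StreamKind.INDEX, columnId, indexBytes));
        }
        return streams;
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        List<Stream> streams =
            varcharStreams(1, empty, empty, empty, new byte[] {42});
        for (Stream s : streams) {
            System.out.println("column " + s.columnId + ": " + s.kind
                + " (" + s.payload.length + " bytes)");
        }
    }
}
```

Keeping the index as one more stream of the same column means it travels with the column data, which is the point of the proposal.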

Regards, 
Praveen Krishna D

Re: Pluggable index for ORC

Posted by Dain Sundstrom <da...@iq80.com>.
It would be nice if there were some reserved space in the enums for experiments like this.
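
As a rough illustration of what reserved space could look like, the sketch below sets aside a numeric range of stream kind ids for experiments and lets a reader skip ids in that range instead of failing. The range 1000 to 1999, the class ReservedStreamKinds and the classify method are assumptions made up for this example; nothing here is defined by ORC.

```java
class ReservedStreamKinds {
    // Hypothetical bounds of an id range set aside for experiments.
    static final int EXPERIMENTAL_MIN = 1000;
    static final int EXPERIMENTAL_MAX = 1999;

    // What a reader does with a stream kind id it encounters.
    enum Action { READ, SKIP, FAIL }

    static boolean isExperimental(int kindId) {
        return kindId >= EXPERIMENTAL_MIN && kindId <= EXPERIMENTAL_MAX;
    }

    // highestKnownStandardId is the largest standard id this reader
    // understands; standard ids are assumed to start at 0.
    static Action classify(int kindId, int highestKnownStandardId) {
        if (kindId >= 0 && kindId <= highestKnownStandardId) {
            return Action.READ;   // a standard kind the reader knows
        }
        if (isExperimental(kindId)) {
            return Action.SKIP;   // experimental stream: safe to ignore
        }
        return Action.FAIL;       // unknown and outside the reserved range
    }

    public static void main(String[] args) {
        System.out.println(classify(2, 8));
        System.out.println(classify(1500, 8));
        System.out.println(classify(42, 8));
    }
}
```

With a rule like this, a file that carries an experimental index stream stays readable by readers that know nothing about the experiment.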

-dain

----
Dain Sundstrom
Co-founder @ Presto Software Foundation, Co-creator of Presto (https://prestosql.io)
