Posted to user@drill.apache.org by Cristian Espinoza <cr...@u-planner.com> on 2014/09/03 18:09:13 UTC

Querying protocol buffers files with Drill

Hi,

I'm evaluating Drill and so far it looks great. My idea is to use it to
query some protocol buffers files directly, so they appear to the rest of
my JEE app as a datasource. But I've been unable to find any information in
the documentation about the proper way to register the file system,
specifically the format I have to use. The docs present examples for the
CSV, JSON and Parquet formats, but none for protobuf.

Is this possible to do? According to Drill's description it may be.

Many thanks in advance,

Cristián Espinoza

Re: Querying protocol buffers files with Drill

Posted by Cristian Espinoza <cr...@u-planner.com>.
Many thanks to Yash and Jason for your answers about this. I will explore
two alternatives for now:

   - Using Hive, as proposed by Jason. I'll read more on elephant-bird to
   see how to import protobuf data into Hive.
   - I'm also reading about ways to convert protobuf files to Parquet, a
   format Drill is able to use as a datasource. I believe I can do this using
   parquet-mr (https://github.com/Parquet/parquet-mr).

Cristian



Re: Querying protocol buffers files with Drill

Posted by Jason Altekruse <al...@gmail.com>.
Hi Cristian,

While we do not have a native protobuf reader for Drill, we do support Hive
SerDes as an input format. This will not be the fastest way to get your
data into the Drill engine, but it should be less coding than writing a
record reader for Drill.

If you need performance and are up for learning a bit more about Drill, we
would certainly welcome a contribution of a protobuf reader and would be
happy to help you get started.

-Jason Altekruse



Re: Querying protocol buffers files with Drill

Posted by Yash Sharma <ya...@gmail.com>.
Hey Cristian, currently we do not have protobuf readers in Drill. However,
it would be possible to add new readers to Drill by creating new
RecordReaders.

Yash.
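
As a rough illustration of the first thing a protobuf reader would have to
do, here is a minimal sketch in plain Java. It assumes the files hold a
sequence of length-delimited messages, that is, each message preceded by
its size as a base-128 varint, which is the framing protobuf's
writeDelimitedTo produces. The thread does not say how the files are laid
out, so that is an assumption. DelimitedRecordSplitter and its methods are
hypothetical names, not part of Drill or protobuf, and the sketch does not
use Drill's RecordReader interface; it only splits the stream into raw
message payloads that a generated protobuf class could then parse.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Drill class): splits a stream of length-delimited records. */
public class DelimitedRecordSplitter {

    /** Reads a base-128 varint length prefix; returns -1 on a clean end of stream. */
    public static long readLength(InputStream in) throws IOException {
        long result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b == -1) {
                if (shift == 0) {
                    return -1; // no more records
                }
                throw new EOFException("stream ended inside a length prefix");
            }
            result |= (long) (b & 0x7F) << shift; // low 7 bits carry data
            if ((b & 0x80) == 0) {
                return result; // high bit clear marks the last prefix byte
            }
        }
        throw new IOException("length prefix is longer than 5 bytes");
    }

    /** Writes a non-negative length as a varint; used here only to build sample input. */
    public static void writeLength(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Returns the raw payload of every record in the stream, in order. */
    public static List<byte[]> readAll(InputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        long length;
        while ((length = readLength(in)) != -1) {
            if (length > Integer.MAX_VALUE) {
                throw new IOException("record too large: " + length);
            }
            byte[] payload = new byte[(int) length];
            int off = 0;
            while (off < payload.length) {
                int n = in.read(payload, off, payload.length - off);
                if (n == -1) {
                    throw new EOFException("stream ended inside a record");
                }
                off += n;
            }
            records.add(payload);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in payloads; real input would be serialized protobuf messages.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (String s : new String[] {"first", "second record"}) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            writeLength(file, bytes.length);
            file.write(bytes, 0, bytes.length);
        }
        for (byte[] record : readAll(new ByteArrayInputStream(file.toByteArray()))) {
            System.out.println(record.length + " bytes");
        }
    }
}
```

A real reader would still have to turn each payload into Drill's columnar
vectors, which is the larger part of the work and is not attempted here.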
