Posted to user@drill.apache.org by Bob Rudis <bo...@rud.is> on 2017/01/17 03:05:31 UTC

WARC files

Hey folks,

Does anyone know if there have been UDFs made to enable working with
WARC files in Drill?

WARC: http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml

thx,

-Bob

Re: WARC files

Posted by rahul challapalli <ch...@gmail.com>.
I believe what you need is a format plugin.

Once you manage to read a file and populate Drill's internal data
structures (value vectors), the format of the file no longer comes into
the picture. From there on you can use any SQL operators (filter, join,
etc.) or UDFs.
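
As a rough illustration of that reading step, here is a minimal sketch in
plain Python (not Drill code, and not a format plugin) of turning WARC
records into rows. It assumes an uncompressed WARC stream where each record
is a version line, "Name: value" header lines, a blank line, and then
Content-Length bytes of payload. The function name read_warc_records, the
sample data, and the chosen columns are all made up for the example.

```python
import io


def read_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        if line in (b"\r\n", b"\n"):
            continue  # skip the blank lines that separate records
        version = line.strip().decode("utf-8")
        if not version.startswith("WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")
        headers = {"version": version}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Assumption: every record carries a Content-Length header.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# Hypothetical two-record sample, built by hand for this sketch.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/about\r\n"
    b"Content-Length: 3\r\n"
    b"\r\n"
    b"abc"
    b"\r\n\r\n"
)

# Each record becomes one row; a reader would expose these as columns.
rows = [
    {
        "type": headers["WARC-Type"],
        "uri": headers["WARC-Target-URI"],
        "length": len(payload),
    }
    for headers, payload in read_warc_records(io.BytesIO(sample))
]
```

Real WARC files are often gzip-compressed, which this sketch does not
handle. A format plugin would do the equivalent parsing in Java and write
the fields into value vectors instead of Python dicts.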

To my knowledge there is no format plugin available for Drill to read WARC
files. However, if Hive supports reading WARC files, then you can use Drill
to query them through the Hive plugin for better query runtimes.

- Rahul

On Mon, Jan 16, 2017 at 7:05 PM, Bob Rudis <bo...@rud.is> wrote:

> Hey folks,
>
> Does anyone know if there have been UDFs made to enable working with
> WARC files in Drill?
>
> WARC: http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml
>
> thx,
>
> -Bob
>