Posted to dev@drill.apache.org by Jim Scott <js...@maprtech.com> on 2015/06/18 20:37:16 UTC

regex format

I recall that the topic of supporting a regex format record reader came up
at some point, and I thought Jacques said that he had this built into the
test framework and that it should eventually find its way into a release.

The closest ticket I can find is DRILL-739, and I don't think it quite
covers the request.

Just wondering if there is any status on this.

To clarify, the idea is a file format for which you could define a regular
expression, so that text files are parsed against that regex as they are
loaded. Effectively, the capture groups of the regular expression would
become columns[n] for each record.
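
A minimal sketch of that idea, written in Python rather than as a Drill
record reader. The pattern, the function name parse_lines, and the sample
lines are all hypothetical and only illustrate how capture groups could map
to columns[n]; none of this is Drill code.

```python
import re

# Hypothetical pattern: three capture groups for date, level, and message.
LINE_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2}) (\w+) (.*)")


def parse_lines(lines, pattern=LINE_PATTERN):
    """Yield one record per matching line.

    Each record is the list of the pattern's capture groups, which plays
    the role of columns[n] in the proposal.
    """
    for line in lines:
        match = pattern.fullmatch(line.rstrip("\n"))
        if match is None:
            # Skipping non-matching lines is a choice made for this sketch.
            continue
        yield list(match.groups())


# Made-up sample input for illustration only.
sample = [
    "2015-06-18 INFO Drillbit started\n",
    "not a log line\n",
    "2015-06-18 WARN Slow query detected\n",
]
records = list(parse_lines(sample))
for columns in records:
    print(columns[0], columns[1], columns[2])
```

How lines that do not match should be handled (skipped, reported, or
returned as a single raw column) is left open here; the sketch simply skips
them.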

Re: regex format

Posted by Ted Dunning <te...@gmail.com>.
Yes.  It would.  Or the author could host source for a GPL drill add-on.



On Thu, Jun 18, 2015 at 6:00 PM, Matthew Burgess <ma...@gmail.com> wrote:

> The logparser license is GPLv3, I'm guessing Drill would need a dual-license
> from the author?

Re: regex format

Posted by Matthew Burgess <ma...@gmail.com>.
The logparser license is GPLv3, I'm guessing Drill would need a dual-license
from the author?

On Thursday, June 18, 2015 at 8:32 PM, Ted Dunning <te...@gmail.com> wrote:

> The most common use of a regex parser in my experience is to parse log
> files.  A better way to parse log files that use CLF format specifiers is
> with the logparser package.
>
> See https://github.com/nielsbasjes/logparser
>
> Should the efforts be focused there?



Re: regex format

Posted by Ted Dunning <te...@gmail.com>.
The most common use of a regex parser in my experience is to parse log
files.  A better way to parse log files that use CLF format specifiers is
with the logparser package.

See https://github.com/nielsbasjes/logparser

Should the efforts be focused there?
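
To illustrate what working from format specifiers means, here is a rough
Python sketch that turns an Apache-style format string into a regex. It is
not the logparser API and does not use that package; the specifier table
covers only a small hand-picked subset, and the names SPECIFIER_PATTERNS and
format_to_regex are hypothetical.

```python
import re

# Regex fragment for each specifier handled by this sketch (a small subset).
SPECIFIER_PATTERNS = {
    "%h": r"(\S+)",         # remote host
    "%l": r"(\S+)",         # remote logname
    "%u": r"(\S+)",         # remote user
    "%t": r"\[([^\]]+)\]",  # request time, written inside square brackets
    "%r": r"(.*?)",         # first line of the request
    "%>s": r"(\d{3})",      # final status code
    "%b": r"(\d+|-)",       # response size in bytes, or "-"
}


def format_to_regex(log_format):
    """Translate a format string into a compiled regex.

    Each known specifier becomes one capture group; everything else in the
    format string is matched literally.
    """
    # Try longer specifiers first so that "%>s" is matched as one token.
    specifiers = sorted(SPECIFIER_PATTERNS, key=len, reverse=True)
    tokenizer = re.compile("|".join(re.escape(s) for s in specifiers))
    parts = []
    position = 0
    for found in tokenizer.finditer(log_format):
        parts.append(re.escape(log_format[position:found.start()]))
        parts.append(SPECIFIER_PATTERNS[found.group()])
        position = found.end()
    parts.append(re.escape(log_format[position:]))
    return re.compile("".join(parts))


# Common Log Format expressed with Apache LogFormat specifiers.
CLF = '%h %l %u %t "%r" %>s %b'
line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
match = format_to_regex(CLF).fullmatch(line)
columns = list(match.groups())
print(columns)
```

The point of the sketch is that the user supplies the format string they
already have in their server configuration instead of hand-writing a regex.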




Re: regex format

Posted by Jacques Nadeau <ja...@apache.org>.
I have some pieces but I don't think there was a Jira out for it.  The
proposal seems good but I'm not sure what the right way to manage
configuration is.  My thought is that it should probably be based on a UDTF,
but we don't have that facility yet.  I would think we should first put
something together that describes how those should work in Drill.