Posted to dev@drill.apache.org by Jim Scott <js...@maprtech.com> on 2015/10/30 16:48:25 UTC

DRILL-3423

Jacques, or anyone else that can answer...

The httpd log parser can parse out the individual fields in a query string
when the field name is specified with an asterisk.

I would like to enable this capability within the plugin, but I want to
verify that it will not cause any weird effects in certain situations.
For example, when parsing the field
HTTP.URI:request.firstline.uri.query.<PARAM_NAME>, where PARAM_NAME is
given as an asterisk, the parser will pull out each parameter name for
Drill to work with. In the code, if I come across a field like this and
add a new column of data, will this cause any issues?

To be more specific: if one line in one file has a unique PARAM_NAME and it
gets added at some arbitrary point, will Drill be OK with this? I would
normally assume YES; however, I don't want to go back in later and fix this
if the answer is NO :-)
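
To make the scenario concrete, here is a rough Python sketch (not the
plugin's code, and not the log parser's API) of how a wildcard on query
parameters turns into columns that can first appear at an arbitrary line.
The column prefix and both helper functions are assumptions made up for
illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical column prefix, borrowed from the field path in the message above.
PREFIX = "request.firstline.uri.query."


def expand_query_params(uri):
    """Return one column per query-string parameter found in `uri`."""
    query = urlsplit(uri).query
    return {
        PREFIX + name: value
        for name, value in parse_qsl(query, keep_blank_values=True)
    }


def columns_seen(uris):
    """Yield (line_number, new_columns) whenever a line introduces columns."""
    known = set()
    for line_number, uri in enumerate(uris, start=1):
        new = sorted(set(expand_query_params(uri)) - known)
        if new:
            known.update(new)
            yield line_number, new
```

Feeding it a list of request URIs shows which line first introduces each
parameter, which is exactly the point at which a new column would have to
be added.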

Thanks!
Jim

Re: DRILL-3423

Posted by Jacques Nadeau <ja...@dremio.com>.
Hey Jim,

The answer is it depends...  :)

Drill is fundamentally designed to handle schema changes. However, much of
the work around this is not finished. As such, if an event like this occurs
after a record batch boundary, you'll currently have issues. A batch is
generally ~4000 records, so whether you have issues depends on how likely
you are to see all known patterns within a ~4000 record window. If you are
only likely to see a subset in that window, you'll probably have issues. If
you expect to see all combinations in that window, you'll probably be fine.
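
As a rough way to check a given log file against that rule of thumb, the
sketch below is a toy model in Python, not Drill code. It splits records
into fixed-size batches and reports which batches have a different set of
columns than the batch before them. The batch size of 4000 is only the
approximate figure mentioned above, and all function names are
hypothetical.

```python
from itertools import islice

# Approximate figure from the message above; not an exact Drill constant.
BATCH_SIZE = 4000


def batches(records, size=BATCH_SIZE):
    """Yield consecutive lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def batch_schema(batch):
    """Return the set of column names used anywhere in the batch."""
    columns = set()
    for record in batch:
        columns.update(record)
    return frozenset(columns)


def schema_changes(records, size=BATCH_SIZE):
    """Return indexes of batches whose columns differ from the previous batch."""
    changes = []
    previous = None
    for index, batch in enumerate(batches(records, size)):
        schema = batch_schema(batch)
        if previous is not None and schema != previous:
            changes.append(index)
        previous = schema
    return changes
```

An empty result suggests every window saw the same set of parameters. A
non-empty result marks the batches where a schema change would show up in
this simplified model. It says nothing about which Drill operators would
tolerate the change.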

A number of people are actively working on solving this issue. For example,
Amit is currently working on resolving schema change issues in the join
operators. Within a couple of months, we hope that these issues will be
resolved.

Hope that helps.
Jacques

--
Jacques Nadeau
CTO and Co-Founder, Dremio
