Posted to user@drill.apache.org by John Omernik <jo...@omernik.com> on 2016/05/29 11:55:11 UTC

Clarification on Drill Options

Hey all, when looking at the Drill options, and specifically while trying
to understand the Parquet options, I realized that the way the options are
named was raising questions for me. What do I mean?
Consider:

+--------------------------------------------+
|                    name                    |
+--------------------------------------------+
| store.parquet.block-size                   |
| store.parquet.compression                  |
| store.parquet.dictionary.page-size         |
| store.parquet.enable_dictionary_encoding   |
| store.parquet.page-size                    |
| store.parquet.use_new_reader               |
| store.parquet.vector_fill_check_threshold  |
| store.parquet.vector_fill_threshold        |
+--------------------------------------------+



So I will drop the "store.parquet" prefix when referring to them here:


use_new_reader - This seems fairly obviously an "on read" option and
(maybe?) also affects the Parquet writer, yet "enable_dictionary_encoding"
is likely ONLY an on-write option.... correct? I mean, if the Parquet file
was written somewhere else with dictionary encoding, Drill will still read
it fine regardless of this setting. The same goes for compression: if the
Parquet file was created with gzip and this setting is snappy, Drill will
still read it. Block size is the same. Thus, those seem to be "writer"
settings rather than reader settings.


So what about the vector settings? Write or read (or both?) For JSON there
is the store.json.writer.uglify setting, which is obviously writer-focused,
but for other settings, knowing what a setting applies to (on write, on
read, both, or neither) could be very useful for troubleshooting and for
knowing which settings to play with.


Now, renaming these settings as they stand is not something I'd recommend:
even in my test clusters I have scripts that alter them for specific ETLs,
and I would hate to have things break. But how hard would it be to add a
string column to sys.options, something like "applies_to", with values such
as write, read, both, neither, or n/a? I think this could be valuable for
users and administrators of Drill.
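To make the proposed column concrete, here is a minimal Python sketch that models an "applies_to" column as a plain in-memory table, assuming the value set suggested above (write, read, both, neither, n/a). It is not Drill's sys.options implementation: the AppliesTo enum, the OptionRow class and the options_applying_to helper are hypothetical, and the classifications are only the tentative guesses from this message, not anything Drill reports.

```python
from dataclasses import dataclass
from enum import Enum


class AppliesTo(Enum):
    """Hypothetical values for the proposed applies_to column."""

    READ = "read"
    WRITE = "write"
    BOTH = "both"
    NEITHER = "neither"
    NA = "n/a"


@dataclass(frozen=True)
class OptionRow:
    """One row of a hypothetical sys.options table extended with applies_to."""

    name: str
    applies_to: AppliesTo


# Tentative classifications: only the guesses made in this thread, not
# anything Drill reports. use_new_reader is marked READ even though the
# thread wonders whether it also touches the writer, and the vector_fill_*
# options are left out because their scope is exactly the open question.
OPTIONS = [
    OptionRow("store.parquet.use_new_reader", AppliesTo.READ),
    OptionRow("store.parquet.enable_dictionary_encoding", AppliesTo.WRITE),
    OptionRow("store.parquet.compression", AppliesTo.WRITE),
    OptionRow("store.parquet.block-size", AppliesTo.WRITE),
    OptionRow("store.json.writer.uglify", AppliesTo.WRITE),
]


def options_applying_to(rows, phase):
    """Return the names of options relevant to `phase` (READ or WRITE).

    Rows marked BOTH are included for either phase.
    """
    matches = []
    for row in rows:
        if row.applies_to is phase:
            matches.append(row.name)
        elif row.applies_to is AppliesTo.BOTH and phase in (AppliesTo.READ, AppliesTo.WRITE):
            matches.append(row.name)
    return matches


if __name__ == "__main__":
    # Troubleshooting a read problem: only the read-side options are candidates.
    print(options_applying_to(OPTIONS, AppliesTo.READ))
```

With these guesses, asking for AppliesTo.WRITE returns the four writer settings (dictionary encoding, compression, block size and uglify), which is the split argued for above; filtering this way is what would narrow down "which settings to play with" when troubleshooting.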


One other note: in addition to applies_to, would it be horrifically
difficult to add a "description" field for options? Self-documenting
settings sure would be handy.... :)
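A minimal Python sketch of what a self-documenting lookup could look like, assuming each option row carried an optional "description" string. OptionRow, describe and the sample wording are hypothetical illustrations, not Drill's sys.options schema or Drill's own documentation text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OptionRow:
    """A hypothetical sys.options row carrying the proposed description field."""

    name: str
    description: Optional[str] = None


def describe(rows, name):
    """Return a one-line summary of option `name`, falling back when undocumented."""
    for row in rows:
        if row.name == name:
            return f"{row.name}: {row.description or '(no description available)'}"
    raise KeyError(f"unknown option: {name}")


# Illustrative rows only. The wording is a placeholder written for this sketch,
# not text taken from Drill; the second row shows an option with no description yet.
ROWS = [
    OptionRow(
        "store.parquet.compression",
        "Compression codec used when writing Parquet files (e.g. snappy or gzip).",
    ),
    OptionRow("store.parquet.vector_fill_threshold"),
]

if __name__ == "__main__":
    for option in ROWS:
        print(describe(ROWS, option.name))
```

Making the description optional with a visible fallback is one design choice: documentation for existing options could then be filled in gradually without breaking anything that reads the table.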


John

Re: Clarification on Drill Options

Posted by John Omernik <jo...@omernik.com>.
Looks like some work has been done here; any chance we can move this along?

https://issues.apache.org/jira/browse/DRILL-4699


Thanks!

On Tue, May 31, 2016 at 12:51 PM, John Omernik <jo...@omernik.com> wrote:

> I added a JIRA related to this:
>
> https://issues.apache.org/jira/browse/DRILL-4699
>

Re: Clarification on Drill Options

Posted by John Omernik <jo...@omernik.com>.
I added a JIRA related to this:

https://issues.apache.org/jira/browse/DRILL-4699
