Posted to user@drill.apache.org by John Omernik <jo...@omernik.com> on 2017/05/02 17:54:28 UTC

Re: Clarification on Drill Options

Looks like some work has been done here. Any chance we can move this along?

https://issues.apache.org/jira/browse/DRILL-4699


Thanks!

On Tue, May 31, 2016 at 12:51 PM, John Omernik <jo...@omernik.com> wrote:

> I added a JIRA related to this:
>
> https://issues.apache.org/jira/browse/DRILL-4699
>
> On Sun, May 29, 2016 at 6:55 AM, John Omernik <jo...@omernik.com> wrote:
>
>> Hey all, when looking at the Drill options, and specifically as I was
>> trying to understand the Parquet options, I realized that the naming of the
>> options was raising questions for me as I looked at them. What do I mean?
>> Consider:
>>
>> +--------------------------------------------+
>> |                    name                    |
>> +--------------------------------------------+
>> | store.parquet.block-size                   |
>> | store.parquet.compression                  |
>> | store.parquet.dictionary.page-size         |
>> | store.parquet.enable_dictionary_encoding   |
>> | store.parquet.page-size                    |
>> | store.parquet.use_new_reader               |
>> | store.parquet.vector_fill_check_threshold  |
>> | store.parquet.vector_fill_threshold        |
>> +--------------------------------------------+
>>
>>
>>
>> So I will remove "store.parquet" as I refer to them here:
>>
>>
>> use_new_reader - This is fairly obviously an "on read" option and
>> (maybe?) does not affect the Parquet writer, yet "enable_dictionary_encoding"
>> is likely ONLY an "on write" option... correct? I mean, if the Parquet file
>> was written somewhere else, and written with dictionary encoding, Drill
>> will still read it OK, regardless of this setting. The same goes for
>> compression: if the Parquet file was created with gzip and this setting is
>> snappy, Drill will still read it. Block size likewise. Thus, those seem to
>> be "writer" settings rather than reader settings.
>>
>>
>> So what about the vector settings? Write or read (or both)? For JSON
>> there is the setting store.json.writer.uglify, whose name makes it
>> obviously writer-focused, but for other settings, knowing what the
>> setting applies to (on write, on read, neither, or both) could be very
>> useful for troubleshooting and for knowing which settings to play with.
>>
>>
>> Now, renaming these settings is not something I would recommend: even in my
>> test clusters I have scripts that alter them for specific ETLs, and I
>> would hate to have things break. But how hard would it be to add a string
>> column to sys.options, something like "applies_to", with write, read, both,
>> neither, n/a as options? I think this could be valuable for users and
>> administrators of Drill.
>>
>>
>> One other note: in addition to applies_to, would it be horrifically
>> difficult to add a "description" field for options? Self-documenting
>> settings sure would be handy... :)
>>
>>
>> John
>>
>>
>>
>
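
For illustration only, here is a rough sketch of what the "applies_to" and "description" idea from the quoted message could look like as data. Nothing here is Drill code or a Drill API: the OptionInfo class and options_for helper are hypothetical names, and the read/write classifications are just the guesses made in the thread, not confirmed Drill behavior. Options the thread does not settle are left as None.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values suggested in the thread for the hypothetical "applies_to" column.
APPLIES_TO_VALUES = {"write", "read", "both", "neither", "n/a"}


@dataclass(frozen=True)
class OptionInfo:
    """Hypothetical self-documenting record for one option (not a Drill type)."""
    name: str
    applies_to: Optional[str]  # None means the thread leaves it unanswered
    description: str = ""      # the proposed free-text "description" field

    def __post_init__(self) -> None:
        # Reject values outside the set proposed in the thread.
        if self.applies_to is not None and self.applies_to not in APPLIES_TO_VALUES:
            raise ValueError(f"unexpected applies_to value: {self.applies_to!r}")


# Classifications are the guesses from the thread, NOT confirmed Drill behavior.
OPTIONS = [
    OptionInfo("store.parquet.block-size", "write"),
    OptionInfo("store.parquet.compression", "write"),
    OptionInfo("store.parquet.enable_dictionary_encoding", "write"),
    OptionInfo("store.parquet.use_new_reader", "read"),
    OptionInfo("store.parquet.vector_fill_check_threshold", None),
    OptionInfo("store.parquet.vector_fill_threshold", None),
    OptionInfo("store.json.writer.uglify", "write"),
]


def options_for(applies_to: Optional[str]) -> List[str]:
    """Return the names of options tagged with the given applies_to value."""
    return [opt.name for opt in OPTIONS if opt.applies_to == applies_to]


if __name__ == "__main__":
    print("read:", options_for("read"))
    print("write:", options_for("write"))
    print("unanswered:", options_for(None))
```

With something like this exposed through sys.options, an administrator troubleshooting a slow read could filter down to the read-side options instead of guessing from the names.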