Posted to issues@drill.apache.org by "Arina Ielchiieva (Jira)" <ji...@apache.org> on 2019/10/23 15:40:00 UTC

[jira] [Created] (DRILL-7419) Enhance Drill splitting logic for compressed files

Arina Ielchiieva created DRILL-7419:
---------------------------------------

             Summary: Enhance Drill splitting logic for compressed files
                 Key: DRILL-7419
                 URL: https://issues.apache.org/jira/browse/DRILL-7419
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.16.0
            Reporter: Arina Ielchiieva


By default, Drill treats all compressed files as non-splittable. Drill uses BlockMapBuilder to split a file into blocks where possible. According to its code, it tries to split the file only if blockSplittable is set to true and the file IS NOT compressed. So even if the format is block splittable, a file that arrives compressed won't be split.
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/BlockMapBuilder.java#L115

But some compression codecs are splittable, for example bzip2 (https://i.stack.imgur.com/jpprr.jpg). The codec type should be taken into account when deciding whether a file can be split.
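
To make the proposed change concrete, here is a minimal, self-contained sketch that contrasts the current decision with a codec-aware one. It is not Drill code: the class name, the method names, and both extension sets are hypothetical, and detecting compression by file extension is only a stand-in for whatever codec lookup the real implementation uses. In Hadoop, splittable codecs implement the SplittableCompressionCodec interface (BZip2Codec does), so an actual fix would more likely check for that interface than keep a hard-coded list.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Drill.
final class CodecAwareSplitCheck {

  // Assumption: extensions that mark a file as compressed at all.
  private static final Set<String> COMPRESSED_EXTENSIONS =
      Set.of("gz", "bz2", "snappy", "lz4", "zst");

  // Assumption: extensions of codecs whose output can be split.
  // Only bzip2 is named in this issue, so only bzip2 is listed.
  private static final Set<String> SPLITTABLE_CODEC_EXTENSIONS = Set.of("bz2");

  // Returns the lower-cased text after the last dot, or "" if there is no dot.
  static String extension(String fileName) {
    int dot = fileName.lastIndexOf('.');
    return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
  }

  static boolean isCompressed(String fileName) {
    return COMPRESSED_EXTENSIONS.contains(extension(fileName));
  }

  // Current behavior as described in the issue:
  // split only if the format is block splittable AND the file is not compressed.
  static boolean canSplitCurrent(String fileName, boolean blockSplittable) {
    return blockSplittable && !isCompressed(fileName);
  }

  // Proposed behavior: a compressed file may still be split
  // when its codec is itself splittable.
  static boolean canSplitProposed(String fileName, boolean blockSplittable) {
    if (!blockSplittable) {
      return false;
    }
    if (!isCompressed(fileName)) {
      return true;
    }
    return SPLITTABLE_CODEC_EXTENSIONS.contains(extension(fileName));
  }

  public static void main(String[] args) {
    for (String name : new String[] {"data.csv", "data.csv.gz", "data.csv.bz2"}) {
      System.out.println(name
          + " current=" + canSplitCurrent(name, true)
          + " proposed=" + canSplitProposed(name, true));
    }
  }
}
```

Under these assumptions, the two checks differ only for block-splittable formats compressed with a splittable codec: data.csv.bz2 is rejected by the current check and accepted by the proposed one, while data.csv.gz is rejected by both.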



--
This message was sent by Atlassian Jira
(v8.3.4#803005)