Posted to issues@drill.apache.org by "Charles Givre (Jira)" <ji...@apache.org> on 2019/10/28 15:09:00 UTC

[jira] [Updated] (DRILL-7423) Create More Efficient Way to Read Excel Cells

     [ https://issues.apache.org/jira/browse/DRILL-7423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Givre updated DRILL-7423:
---------------------------------
    Description: 
The Excel format plugin reads cells but there are ways to make the reading process more efficient.  Since the schema of an Excel file is not known in advance, Drill must read the first row of data in order to extract the schema.  

The process is actually a bit more complex.  To read the schema, Drill must first read the header rows and convert every header cell to a String.  This yields the column names when headers are present.

Drill cannot create writers until it reads the first row of data, which is where it determines the data types.  This is inefficient because Drill has to do a hash lookup for each column every time it writes the columns.  Since the columns are in a fixed order, it may be possible to store the writers in a 
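
A minimal Java sketch of the idea follows.  The description is cut off before it names the structure, so the positional list used here is only an assumption.  CellWriter, CollectingWriter and WriterLookupSketch are hypothetical names for illustration and are not Drill's writer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: none of these types are Drill classes.
final class WriterLookupSketch {

    // Hypothetical stand-in for a per-column writer.
    interface CellWriter {
        void write(Object value);
    }

    // Collects written values so the sketch can be run without Drill.
    static final class CollectingWriter implements CellWriter {
        final List<Object> values = new ArrayList<>();

        @Override
        public void write(Object value) {
            values.add(value);
        }
    }

    // The pattern described as inefficient: one hash lookup per cell,
    // keyed by column name.
    static void writeRowByName(Map<String, CellWriter> writersByName,
                               List<String> columnNames,
                               List<Object> row) {
        for (int i = 0; i < row.size(); i++) {
            writersByName.get(columnNames.get(i)).write(row.get(i));
        }
    }

    // Assumed alternative: resolve each writer once, in column order,
    // after the first data row has fixed the schema.
    static List<CellWriter> toPositional(Map<String, CellWriter> writersByName,
                                         List<String> columnNames) {
        List<CellWriter> ordered = new ArrayList<>(columnNames.size());
        for (String name : columnNames) {
            ordered.add(writersByName.get(name));
        }
        return ordered;
    }

    // Later rows address the writer by column index, with no hashing.
    static void writeRowByPosition(List<CellWriter> writersByPosition,
                                   List<Object> row) {
        for (int i = 0; i < row.size(); i++) {
            writersByPosition.get(i).write(row.get(i));
        }
    }

    public static void main(String[] args) {
        List<String> columnNames = Arrays.asList("id", "name");
        CollectingWriter idWriter = new CollectingWriter();
        CollectingWriter nameWriter = new CollectingWriter();

        Map<String, CellWriter> writersByName = new HashMap<>();
        writersByName.put("id", idWriter);
        writersByName.put("name", nameWriter);

        // First row goes through the name-keyed map.
        writeRowByName(writersByName, columnNames,
                Arrays.<Object>asList(1.0, "alpha"));

        // Subsequent rows go through the positional list.
        List<CellWriter> writersByPosition =
                toPositional(writersByName, columnNames);
        writeRowByPosition(writersByPosition,
                Arrays.<Object>asList(2.0, "beta"));

        System.out.println(idWriter.values);
        System.out.println(nameWriter.values);
    }
}
```

Both paths write to the same writers.  The only difference is that the positional path pays for the name lookup once per column instead of once per cell, which is where a saving could come from if the column order really is fixed.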

  was:The Excel format plugin reads cells but there are ways to make the reading process more efficient.  


> Create More Efficient Way to Read Excel Cells
> ---------------------------------------------
>
>                 Key: DRILL-7423
>                 URL: https://issues.apache.org/jira/browse/DRILL-7423
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.18.0
>            Reporter: Charles Givre
>            Priority: Major
>
> The Excel format plugin reads cells but there are ways to make the reading process more efficient.  Since the schema of an Excel file is not known in advance, Drill must read the first row of data in order to extract the schema.  
> The process is actually a bit more complex.  To read the schema, Drill must first read the header rows and convert every header cell to a String.  This yields the column names when headers are present.
> Drill cannot create writers until it reads the first row of data, which is where it determines the data types.  This is inefficient because Drill has to do a hash lookup for each column every time it writes the columns.  Since the columns are in a fixed order, it may be possible to store the writers in a 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)