Posted to issues@drill.apache.org by "PJ Fanning (Jira)" <ji...@apache.org> on 2021/12/07 11:07:00 UTC

[jira] [Created] (DRILL-8071) format-excel should use POI DataFormatter

PJ Fanning created DRILL-8071:
---------------------------------

             Summary: format-excel should use POI DataFormatter
                 Key: DRILL-8071
                 URL: https://issues.apache.org/jira/browse/DRILL-8071
             Project: Apache Drill
          Issue Type: Improvement
          Components: Execution - Data Types
            Reporter: PJ Fanning


The existing ExcelBatchReader reads the raw values stored in the cells, which ignores any formatting set on those cells. For example, numbers and dates are both stored as doubles. With the POI DataFormatter, the cell style is applied so that the data appears as it does when you view it in Excel itself.

[https://poi.apache.org/apidocs/dev/org/apache/poi/ss/usermodel/DataFormatter.html#formatCellValue-org.apache.poi.ss.usermodel.Cell-]

 

A number like 123456789.987654 may be stored as a double whose decimal expansion is closer to 123456789.9876539999999, because that may be the nearest value a double can represent. If the cell format specifies 6 digits after the decimal point, the formatter rounds the number back to the value that Excel displays.
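
The rounding effect described above can be sketched with the JDK alone. This is a minimal illustration, not POI or Drill code: java.text.DecimalFormat stands in for the formatter, the pattern "0.000000" is an assumed stand-in for a cell format with 6 decimal places, and the class and method names (CellDisplayDemo, exactExpansion, display) are hypothetical.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class CellDisplayDemo {

    /** Returns the exact decimal expansion of the binary double (the "raw" stored value). */
    static String exactExpansion(double raw) {
        // new BigDecimal(double) converts the binary value exactly, without rounding.
        return new BigDecimal(raw).toPlainString();
    }

    /**
     * Formats the raw double with a number pattern.
     * Assumption: the pattern plays the role of the cell's number format.
     */
    static String display(double raw, String pattern) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the machine locale.
        DecimalFormat fmt = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return fmt.format(raw);
    }

    public static void main(String[] args) {
        double raw = 123456789.987654;
        // Long expansion that is close to, but not exactly, the typed value.
        System.out.println("raw stored value: " + exactExpansion(raw));
        // Rounded to 6 decimal places, matching what the user typed.
        System.out.println("formatted value:  " + display(raw, "0.000000"));
    }
}
```

With POI itself, the pattern would not be hard-coded as it is here; DataFormatter.formatCellValue(Cell) takes it from the cell's style, which is the behaviour this issue asks ExcelBatchReader to use.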

Code that processes Excel files is a real pain to get right because the Microsoft storage format is really bad.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)