You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2022/01/15 18:27:47 UTC

[GitHub] [drill] pjfanning opened a new pull request #2427: DRILL-8095: upgrade to poi 5.2.0 and ignore cell styles

pjfanning opened a new pull request #2427:
URL: https://github.com/apache/drill/pull/2427


   # [DRILL-8095](https://issues.apache.org/jira/browse/DRILL-8095): upgrade to poi 5.2.0 and ignore cell styles
   
   ## Description
   
   upgrade to poi 5.2.0 and ignore cell styles (to avoid unnecessary parse time)
   
   ## Documentation
   No changes
   
   ## Testing
   existing unit tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] cgivre merged pull request #2427: DRILL-8095: upgrade to poi 5.2.0

Posted by GitBox <gi...@apache.org>.
cgivre merged pull request #2427:
URL: https://github.com/apache/drill/pull/2427


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] pjfanning commented on pull request #2427: DRILL-8095: upgrade to poi 5.2.0

Posted by GitBox <gi...@apache.org>.
pjfanning commented on pull request #2427:
URL: https://github.com/apache/drill/pull/2427#issuecomment-1013768187


   @cgivre see latest comment on https://issues.apache.org/jira/browse/DRILL-8095 - not loading the styles breaks existing Drill tests - so I abandoned that bit of the change


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] cgivre commented on pull request #2427: DRILL-8095: upgrade to poi 5.2.0

Posted by GitBox <gi...@apache.org>.
cgivre commented on pull request #2427:
URL: https://github.com/apache/drill/pull/2427#issuecomment-1013770006


   > tbh styles should used if you are inferring the schema - I just thought the existing code wasn't using them
   > 
   > One enhancement that I think Drill needs is to allow the format-excel code to process a number of rows (maybe defaulting to something like 5 or 10) when inferring the schema - this would help if the first data row has null values in some columns - or you might have a column that some times has numeric values but other times text and in that case the schema would need to make that cell text based.
   
   I'd definitely agree with this approach.  I am actually working on a storage plugin for Google Sheets [1] which does exactly that.  There is a `typifier` class which reads the data from a column and attempts to infer the data type.  Perhaps once that gets merged, we can reuse the `typifier` for the Excel reader and use this approach. 
   
   
   [1]: https://github.com/cgivre/drill/tree/storage-googlesheets/contrib/storage-googlesheets


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] cgivre commented on pull request #2427: DRILL-8095: upgrade to poi 5.2.0

Posted by GitBox <gi...@apache.org>.
cgivre commented on pull request #2427:
URL: https://github.com/apache/drill/pull/2427#issuecomment-1013768959


   > @cgivre see latest comment on https://issues.apache.org/jira/browse/DRILL-8095 - not loading the styles breaks existing Drill tests - so I abandoned that bit of the change
   
   Roger that.... Sorry I didn't see that.  We don't have to include it in this PR, but if you can recommend a better way of inferring the data type that would be more performant, I'm all ears.  I did it using the Cell style because that seemed to make the most sense.   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] pjfanning commented on pull request #2427: DRILL-8095: upgrade to poi 5.2.0

Posted by GitBox <gi...@apache.org>.
pjfanning commented on pull request #2427:
URL: https://github.com/apache/drill/pull/2427#issuecomment-1013769560


   tbh styles should used if you are inferring the schema - I just thought the existing code wasn't using them
   
   One enhancement that I think Drill needs is to allow the format-excel code to process a number of rows (maybe defaulting to something like 5 or 10) when inferring the schema - this would help if the first data row has null values in some columns - or you might have a column that some times has numeric values but other times text and in that case the schema would need to make that cell text based.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org