You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2021/12/09 00:23:12 UTC

[GitHub] [drill] pjfanning opened a new pull request #2400: DRILL-8071: use excel cell.getLocalDateTimeCellValue

pjfanning opened a new pull request #2400:
URL: https://github.com/apache/drill/pull/2400


   # [DRILL-8071](https://issues.apache.org/jira/browse/DRILL-8071): use excel cell.getLocalDateTimeCellValue
   
   ## Description
   
   Use built-in POI/excel-streaming-reader method to get date time - this is safer than writing custom code to convert excel numbers to date times. In particular, this method takes into account if the workbook stores date times with 1900 or 1904 as the base date.
   
   ## Documentation
   No changes
   
   ## Testing
   Unit tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] pjfanning commented on pull request #2400: DRILL-8071: use excel cell.getLocalDateTimeCellValue

Posted by GitBox <gi...@apache.org>.
pjfanning commented on pull request #2400:
URL: https://github.com/apache/drill/pull/2400#issuecomment-990324203


   @cgivre I removed the new logging
   
   I agree that it would be nice to have some configuration to decide what to with errors. I think this PR is probably better to stick to just the date time conversion issue.
   
   If Drill does not currently have support for supporting conversion errors - could I suggest that one possible approach is to have excel like cell comments? - so that Drill could put a default value in the Drill cell equivalent but attach a comment with the error. It would still be useful to support fail fast as an option too (ie fail the upload if any cells can't be loaded as the type the schema says they are).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] cgivre commented on pull request #2400: DRILL-8071: use excel cell.getLocalDateTimeCellValue

Posted by GitBox <gi...@apache.org>.
cgivre commented on pull request #2400:
URL: https://github.com/apache/drill/pull/2400#issuecomment-990305838


   > 
   
   @pjfanning Thanks for this PR.  You've hit on an issue which is subject to a lot of different opinions.  The issue at stake is how do you read data without a schema and what do you do when the data is inconsistent?
   
   I'm approaching this from the data scientist/user perspective and I am just one data point, but here goes.  I really like the way Pandas handles the whole situation.  Specifically, Pandas does attempt to infer data types, but also gives the user a lot of control over the behavior when things go wrong.   I think Pandas lets the user decide if it throws an exception, ignores it, or coerces a value.  
   
   I don't know if that's more work than you want to do, but I do feel that logging isn't quite enough because that assumes that a user has logging enabled and can view the logs.  If there was a config option on error handling, it could be changed at query time with the `table()` function.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] pjfanning edited a comment on pull request #2400: DRILL-8071: use excel cell.getLocalDateTimeCellValue

Posted by GitBox <gi...@apache.org>.
pjfanning edited a comment on pull request #2400:
URL: https://github.com/apache/drill/pull/2400#issuecomment-990324203


   @cgivre I removed the new logging
   
   I agree that it would be nice to have some configuration to decide what to with errors. I think this PR is probably better to stick to just the date time conversion issue.
   
   If Drill does not currently have support for supporting conversion errors - could I suggest that one possible approach is to have excel-like cell comments? - so that Drill could put a default value in the Drill cell equivalent but attach a comment with the error. It would still be useful to support fail fast as an option too (ie fail the upload if any cells can't be loaded as the type the schema says they are).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] pjfanning commented on pull request #2400: DRILL-8071: use excel cell.getLocalDateTimeCellValue

Posted by GitBox <gi...@apache.org>.
pjfanning commented on pull request #2400:
URL: https://github.com/apache/drill/pull/2400#issuecomment-989858078


   @cgivre this PR covers one of the issues in DRILL-8701 - is it ok that I use the try-catch block and just log errors that happen when cell.getLocalDateTimeCellValue fails?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] cgivre commented on pull request #2400: DRILL-8071: use excel cell.getLocalDateTimeCellValue

Posted by GitBox <gi...@apache.org>.
cgivre commented on pull request #2400:
URL: https://github.com/apache/drill/pull/2400#issuecomment-990319067


   @pjfanning 
   What if we added a boolean config option `nullOnInvalidDate` or something like that, and we could let the user decide what they want?  Personally, I like the idea of being able to read through the data w/o error even when it is invalid.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] cgivre merged pull request #2400: DRILL-8071: use excel cell.getLocalDateTimeCellValue

Posted by GitBox <gi...@apache.org>.
cgivre merged pull request #2400:
URL: https://github.com/apache/drill/pull/2400


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org