You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/03/28 15:24:06 UTC

[GitHub] [arrow-rs] alamb opened a new issue, #3969: [REGRESSION] Parsing dates of the form `2022-1-1` that used to parse (now needs `2022-01-01`)

alamb opened a new issue, #3969:
URL: https://github.com/apache/arrow-rs/issues/3969

   **Describe the bug**
   In arrow `34.0.0`, values such as `'2021-1-1T05:11:10.432'` were parsed as a timestamp but now they error
   
   **To Reproduce**
   
   Try to parse this value as a timestamp: `'2021-1-1T05:11:10.432'`
   
   
   **Expected behavior**
   
   The value `'2021-1-1T05:11:10.432'` parses the same as `'2021-01-01T05:11:10.432'`
   
   Workaround is to change it to (add leading zeros) `'2021-01-01T05:11:10.432'`
   
   **Additional context**
   We discovered this as part of the upgrade in DataFusion https://github.com/apache/arrow-datafusion/pull/5685
   
   Per @tustvold it appears that `2021-1-1T05:11:10.432` is supported by chrono even though it is not strictly valid and arrow was not documented to support this format: https://github.com/apache/arrow-datafusion/pull/5685#discussion_r1150732948
   
   It may be that we simply choose not to fix this regression as the previous implementation was working "by accident" but I wanted to file this ticket to track the change in behavior


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold closed issue #3969: [REGRESSION] Parsing dates of the form `2022-1-1` that used to parse (now needs `2022-01-01`)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #3969: [REGRESSION] Parsing dates of the form `2022-1-1` that used to parse (now needs `2022-01-01`)
URL: https://github.com/apache/arrow-rs/issues/3969


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #3969: [REGRESSION] Parsing dates of the form `2022-1-1` that used to parse (now needs `2022-01-01`)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #3969:
URL: https://github.com/apache/arrow-rs/issues/3969#issuecomment-1487143900

   > It may be that we simply choose not to fix this regression as the previous implementation was working "by accident"
   
   Indeed, chrono's implementation of strptime is incredibly permissive, to the point where some of the undocumented behaviour is actually considered a bug - https://github.com/chronotope/chrono/issues/332. In particular [the docs](https://docs.rs/chrono/0.4.24/chrono/format/strftime/index.html) state that `%m` and similar expect 0 padded inputs, however, the implementation is more permissive.
   
   We strive to support RFC3339 and reasonable variants thereupon, we therefore added additional cases based on chrono's strptime to supplement chrono's very rigid RFC3339 support, which in turn led to chrono's permissive strptime implementation leaking through arrow's parsing abstraction.
   
   Given RFC3339 requires that days and months should be [two digits](https://www.rfc-editor.org/rfc/rfc3339#page-8) and years must be [4 digits](https://www.rfc-editor.org/rfc/rfc3339#page-4), and we have never claimed support for such timestamps, I do not personally consider this a regression and do not plan to change it.
   
   That being said if someone wants to contribute additional functionality for this and is able to do so in a way that doesn't regress parsing performance, I would be fine with it. The major motivation for not supporting it, is that knowing ahead of time the character layout is what allows the parser to be more efficient than otherwise.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #3969: [REGRESSION] Parsing dates of the form `2022-1-1` that used to parse (now needs `2022-01-01`)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #3969:
URL: https://github.com/apache/arrow-rs/issues/3969#issuecomment-1572188931

   I'm closing this as 2 months have passed with no complaints


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org