Posted to issues@arrow.apache.org by "WillAyd (via GitHub)" <gi...@apache.org> on 2024/04/17 20:41:15 UTC

[I] How to work with dates outside of CPython bounds in pyarrow? [arrow]

WillAyd opened a new issue, #41266:
URL: https://github.com/apache/arrow/issues/41266

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   The CPython date implementation has a limit on date objects where the year must be >= 1 and <= 9999. The date32 pyarrow object will happily accept dates that are outside of these bounds:
   
   ```python
   import pyarrow as pa
   old_date = pa.scalar(-1_000_000, type=pa.date32())
   ```
   
   But there doesn't seem to be much you can do with this object. Just trying to print it raises:
   
   ```
   OverflowError: date value out of range
   ```
   
   and there do not seem to be any elements to introspect aside from the value (number of days since Epoch). Is there some other way to interact with this object and at least inspect the year / month / day components on the object itself? Or is the expectation that users go through compute for those?
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Python] How to work with dates outside of CPython bounds in pyarrow? [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #41266:
URL: https://github.com/apache/arrow/issues/41266#issuecomment-2063171837

   > But it doesn't seem like you can do anything with this object really. Just trying to print it will yield:
   
   I think it is actually _only_ the conversion to a `datetime.date` object that doesn't work and, as a side effect, the printing of a scalar, which relies on that conversion (as does conversion to pandas). Most other functionality should work normally (pretty printing of the array, all generic array operations, casting, datetime-related kernels, etc.):
   
   ```
   >>> import pyarrow as pa
   >>> old_date = pa.scalar(-1_000_000, type=pa.date32())
   >>> arr = pa.array([old_date])
   >>> arr
   <pyarrow.lib.Date32Array object at 0x7f06641d11e0>
   [
     -0768-02-04
   ]
   >>> arr.cast(pa.timestamp("s"))
   <pyarrow.lib.TimestampArray object at 0x7f0664a2f880>
   [
     -0768-02-04 00:00:00
   ]
   
   >>> import pyarrow.compute as pc
   >>> pc.year(arr)
   <pyarrow.lib.Int64Array object at 0x7f065c2d3580>
   [
     -768
   ]
   ```
   
   But yes, given that we can pretty-print such values, we shouldn't rely on the to-python-object conversion for the repr of the scalar.
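
For cases where only the raw day count is available (the "number of days since Epoch" value mentioned in the question), the year / month / day components can also be derived without going through `datetime.date`. The following is a minimal sketch in pure Python, not part of pyarrow: `civil_from_days` and `format_date32` are hypothetical helper names, and the arithmetic is the days-to-civil conversion commonly attributed to Howard Hinnant, assuming a proleptic Gregorian calendar with astronomical year numbering (a year 0 exists, so years can be zero or negative).

```python
def civil_from_days(days):
    """Convert days since 1970-01-01 to (year, month, day).

    Assumes the proleptic Gregorian calendar with astronomical year
    numbering. Hypothetical helper for illustration, not a pyarrow API.
    """
    z = days + 719468                 # shift the epoch to 0000-03-01
    era = z // 146097                 # 400-year cycle; floor division handles negatives
    doe = z - era * 146097            # day within the cycle, 0..146096
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)   # day of year, counted from 1 March
    mp = (5 * doy + 2) // 153         # month index, counted from March
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day


def format_date32(days):
    """Render a day count as a signed, zero-padded ISO-like string."""
    year, month, day = civil_from_days(days)
    sign = "-" if year < 0 else ""
    return f"{sign}{abs(year):04d}-{month:02d}-{day:02d}"


print(civil_from_days(-1_000_000))   # (-768, 2, 4)
print(format_date32(-1_000_000))     # -0768-02-04
```

For the value used in this thread the result agrees with the array repr and the `pc.year` output quoted above. Within pyarrow itself, the compute kernels remain the more direct route.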

