Posted to issues@arrow.apache.org by "Gaurav Sheni (Jira)" <ji...@apache.org> on 2022/05/11 21:14:00 UTC

[jira] [Created] (ARROW-16540) Support storing different timezone in an array

Gaurav Sheni created ARROW-16540:
------------------------------------

             Summary: Support storing different timezone in an array 
                 Key: ARROW-16540
                 URL: https://issues.apache.org/jira/browse/ARROW-16540
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Format, Python
            Reporter: Gaurav Sheni


As a user, I wish I could use pyarrow to store a column of datetimes with different timezones. In certain datasets, it is ideal to have a column with mixed timezones (e.g., taxi pickups). Even if the data is limited to a single location (say, a business in NYC) over the span of a single year, the timestamps will alternate between EDT and EST, with offsets of -4:00 and -5:00.
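
To illustrate the NYC case, here is a minimal sketch using only the Python standard library (the `zoneinfo` module, Python 3.9+, which assumes the system has the IANA timezone database available). The dates and variable names are made up for illustration; the point is only that a single location yields two different UTC offsets within one year.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# One location, two dates in the same year (illustrative values)
nyc = ZoneInfo("America/New_York")
winter = datetime(2022, 1, 15, 12, 0, tzinfo=nyc)  # falls in EST
summer = datetime(2022, 7, 15, 12, 0, tzinfo=nyc)  # falls in EDT

# Offsets expressed in whole hours relative to UTC
winter_hours = winter.utcoffset() / timedelta(hours=1)
summer_hours = summer.utcoffset() / timedelta(hours=1)
print(winter.tzname(), winter_hours)
print(summer.tzname(), summer_hours)
```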

 

Currently, it is not possible to keep a column with different timezones.

 
{code:java}
from datetime import datetime

import pyarrow as pa
import pytz

# Two timezone-aware datetimes in different zones
arr = pa.array([
    datetime(2010, 1, 1, tzinfo=pytz.timezone('US/Central')),
    datetime(2015, 1, 1, tzinfo=pytz.timezone('US/Eastern')),
])
arr.type
arr[0]
arr[1]
{code}
 

 
{code:java}
TimestampType(timestamp[us, tz=US/Central])

<pyarrow.TimestampScalar: datetime.datetime(2014, 12, 31, 18, 0, tzinfo=<DstTzInfo 'US/Central' CST-1 day, 18:00:00 STD>)>

<pyarrow.TimestampScalar: datetime.datetime(2009, 12, 31, 18, 0, tzinfo=<DstTzInfo 'US/Central' CST-1 day, 18:00:00 STD>)>{code}
 

> Notice that both rows now have the Central timezone.
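
Until something like this is supported, one possible workaround is to keep two parallel columns: the instant normalized to UTC, and the original UTC offset. The sketch below shows that idea with the standard library only. The helper names `to_columns` and `from_columns` are hypothetical, the sample values are made up, and the two lists stand in for what would be a timestamp column and an integer column; this is not an existing pyarrow API.

```python
from datetime import datetime, timedelta, timezone

def to_columns(values):
    """Split aware datetimes into UTC instants and UTC offsets in minutes."""
    instants = [v.astimezone(timezone.utc) for v in values]
    offsets = [int(v.utcoffset().total_seconds() // 60) for v in values]
    return instants, offsets

def from_columns(instants, offsets):
    """Rebuild datetimes carrying their original fixed offset."""
    return [
        i.astimezone(timezone(timedelta(minutes=m)))
        for i, m in zip(instants, offsets)
    ]

# Fixed offsets matching EDT (-4:00) and EST (-5:00)
edt = timezone(timedelta(hours=-4))
est = timezone(timedelta(hours=-5))
pickups = [
    datetime(2022, 7, 1, 9, 30, tzinfo=edt),
    datetime(2022, 12, 1, 9, 30, tzinfo=est),
]

instants, offsets = to_columns(pickups)
restored = from_columns(instants, offsets)
print(offsets)
print(restored)
```

This keeps the per-row offset but not the zone name, so a string column of zone names would be needed if later DST-aware arithmetic matters.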

 


