Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/14 12:18:27 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7169: ARROW-5359: [Python] Support loading non-nanosecond out-of-range timestamps

jorisvandenbossche commented on a change in pull request #7169:
URL: https://github.com/apache/arrow/pull/7169#discussion_r425087732



##########
File path: python/pyarrow/tests/test_pandas.py
##########
@@ -3941,3 +3945,28 @@ def test_metadata_compat_missing_field_name():
     result = table.to_pandas()
     # on python 3.5 the column order can differ -> adding check_like=True
     tm.assert_frame_equal(result, expected, check_like=True)
+
+
+@pytest.mark.skipif(pq is None, reason="Parquet not available")

Review comment:
       ```suggestion
   @pytest.mark.parquet
   ```
   
   (the mark also ensures that the test is skipped automatically when Parquet is not available)

##########
File path: python/pyarrow/array.pxi
##########
@@ -549,6 +550,11 @@ cdef class _PandasConvertible:
             Cast integers with nulls to objects
         date_as_object : bool, default True
             Cast dates to objects. If False, convert to datetime64[ns] dtype.
+        timestamp_as_object : bool, default False
+            Cast non-nanosecond timestamps (np.datetime64) to objects. This is

Review comment:
       Shouldn't we also force this conversion for nanosecond timestamps? If I specify `timestamp_as_object=True`, I would expect to actually get datetime objects, even if the unit is nanoseconds.
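
To make that expectation concrete, here is a minimal standard-library sketch. `to_python_datetimes` and `NS_PER_TICK` are hypothetical names used only for illustration; this is not how pyarrow implements the conversion. It only shows the behaviour I would expect from the flag: datetime objects come back whatever the unit is.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Nanoseconds per tick for each timestamp unit (illustrative table).
NS_PER_TICK = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}


def to_python_datetimes(ticks, unit):
    """Hypothetical helper: turn integer ticks since the epoch into datetimes.

    Python datetimes have microsecond resolution, so any sub-microsecond
    part of a nanosecond timestamp is dropped (floored).
    """
    ns_per_tick = NS_PER_TICK[unit]
    return [
        EPOCH + timedelta(microseconds=(tick * ns_per_tick) // 1000)
        for tick in ticks
    ]


# The same instant expressed in seconds and in nanoseconds.
from_seconds = to_python_datetimes([1], "s")
from_nanoseconds = to_python_datetimes([10**9], "ns")
print(from_seconds == from_nanoseconds)
```

With this behaviour the result type depends only on the flag, not on the unit of the column.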

##########
File path: python/pyarrow/pandas_compat.py
##########
@@ -699,6 +699,17 @@ def _reconstruct_block(item, columns=None, extension_columns=None):
 
     block_arr = item.get('block', None)
     placement = item['placement']
+
+    if (
+            (block_arr is not None) and
+            (block_arr.dtype.type == np.datetime64) and
+            (block_arr.dtype.name != "datetime64[ns]")
+    ):
+        # Non-nanosecond timestamps can express much larger values than
+        # nanosecond timestamps, and pandas checks that the values fit into
+        # nanosecond range, so this needs to be an object as dtype.
+        block_arr = block_arr.astype(np.dtype("O"))

Review comment:
       Can you explain why this conversion to object dtype is needed?
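
For reference, the range limit mentioned in the code comment in this hunk can be checked with a small standard-library sketch. It assumes that `datetime64[ns]` stores signed 64-bit nanoseconds since the Unix epoch, with the minimum int64 value reserved for NaT. The helper names `ns_to_datetime` and `fits_in_ns` are made up for this example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Assumption: int64 nanoseconds since the epoch, minimum value reserved for NaT.
NS_MAX = 2**63 - 1
NS_MIN = -(2**63) + 1


def ns_to_datetime(ns):
    # Python datetimes only have microsecond resolution, so floor to microseconds.
    return EPOCH + timedelta(microseconds=ns // 1000)


def fits_in_ns(value):
    # Whole microseconds since the epoch, scaled up to nanoseconds.
    ns = (value - EPOCH) // timedelta(microseconds=1) * 1000
    return NS_MIN <= ns <= NS_MAX


# Bounds of the nanosecond range, truncated to microseconds.
print(ns_to_datetime(NS_MIN), ns_to_datetime(NS_MAX))

# A recent date fits; a far-future date that a second-resolution
# timestamp can hold does not.
print(fits_in_ns(datetime(2020, 5, 14)))
print(fits_in_ns(datetime(3000, 1, 1)))
```

The bounds fall in the years 1677 and 2262, so a coarser-unit timestamp outside that window cannot be stored as `datetime64[ns]`. That appears to be the case the hunk handles by falling back to object dtype.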



