Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/23 23:52:54 UTC

[GitHub] [arrow] arw2019 opened a new pull request #7822: ARROW-9096: [Python] Pandas roundtrip with object-dtype column labels with integer(floating) values: data type "integer"("floating") not understood

arw2019 opened a new pull request #7822:
URL: https://github.com/apache/arrow/pull/7822


   - [x] closes ARROW-9096
   - [x] tests added & passed
   
   This PR fixes the roundtrip conversion for pandas DataFrames whose column index has `dtype=object` but holds integer, floating-point, or datetime labels, such as
   ```
   from datetime import datetime
   import pandas as pd

   df = pd.DataFrame([1], columns=pd.Index([1], dtype=object))  # underlying int
   df = pd.DataFrame([1], columns=pd.Index([1.1], dtype=object))  # underlying float
   df = pd.DataFrame([1], columns=pd.Index([datetime(2018, 1, 1)], dtype=object))  # underlying datetime
   ```
   ARROW-3651 (https://issues.apache.org/jira/browse/ARROW-3651) largely solved the datetime variant of this problem: the conversion ran without error, but the column dtype after the roundtrip did not match the original. With this fix, a roundtrip of the problematic DataFrames from ARROW-3651 returns the original frame exactly.
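
For context, here is a minimal sketch of where the `data type "integer" not understood` message in the title appears to come from. It assumes that the logical type names pandas infers for object-dtype labels (via the public `pd.api.types.infer_dtype`) are the same names that end up in the stored metadata; that is an assumption, and the actual logic in `python/pyarrow/pandas_compat.py` is not reproduced here. `numpy_understands` is a hypothetical helper written only for this illustration.

```python
import datetime

import numpy as np
import pandas as pd


def numpy_understands(name):
    """Hypothetical helper: True if NumPy accepts `name` as a dtype string."""
    try:
        np.dtype(name)
    except (TypeError, ValueError):
        return False
    return True


# Object-dtype column labels, mirroring the three cases in the description.
labels = {
    "int": pd.Index([1], dtype=object),
    "float": pd.Index([1.1], dtype=object),
    "datetime": pd.Index([datetime.datetime(2018, 1, 1)], dtype=object),
}

for kind, index in labels.items():
    # The physical dtype stays `object`; the inferred name describes the values.
    inferred = pd.api.types.infer_dtype(index)
    print(kind, index.dtype, inferred, numpy_understands(inferred))
```

The point of the sketch is that the physical dtype (`object`) and the inferred logical name are two separate pieces of information, so a faithful roundtrip has to restore the former rather than cast to the latter.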


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on a change in pull request #7822: ARROW-9096: [Python] Pandas roundtrip with dtype="object" underlying numeric column index

Posted by GitBox <gi...@apache.org>.
arw2019 commented on a change in pull request #7822:
URL: https://github.com/apache/arrow/pull/7822#discussion_r460084581



##########
File path: python/pyarrow/pandas_compat.py
##########
@@ -1080,8 +1083,10 @@ def _reconstruct_columns_from_metadata(columns, column_indexes):
     ]
 
     # Convert each level to the dtype provided in the metadata
+    # ARROW-9096: need numpy_type to match cast against original DataFrame

Review comment:
       Makes sense. It's easy to see what's going on without this line, so I dropped it as suggested.







[GitHub] [arrow] arw2019 commented on pull request #7822: ARROW-9096: [Python] Pandas roundtrip with dtype="object" underlying numeric column index

Posted by GitBox <gi...@apache.org>.
arw2019 commented on pull request #7822:
URL: https://github.com/apache/arrow/pull/7822#issuecomment-669656466


   thanks @wesm @emkornfield for reviewing!!





[GitHub] [arrow] emkornfield commented on pull request #7822: ARROW-9096: [Python] Pandas roundtrip with dtype="object" underlying numeric column index

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #7822:
URL: https://github.com/apache/arrow/pull/7822#issuecomment-663368625


   CC @jorisvandenbossche 





[GitHub] [arrow] wesm closed pull request #7822: ARROW-9096: [Python] Pandas roundtrip with dtype="object" underlying numeric column index

Posted by GitBox <gi...@apache.org>.
wesm closed pull request #7822:
URL: https://github.com/apache/arrow/pull/7822


   





[GitHub] [arrow] emkornfield commented on a change in pull request #7822: ARROW-9096: [Python] Pandas roundtrip with dtype="object" underlying numeric column index

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7822:
URL: https://github.com/apache/arrow/pull/7822#discussion_r459872595



##########
File path: python/pyarrow/pandas_compat.py
##########
@@ -1009,12 +1009,15 @@ def _is_generated_index_name(name):
     return re.match(pattern, name) is not None
 
 
+# ARROW-9096: added integer and floating

Review comment:
       nit: we generally only have comments like this for TODOs. git blame/git log can track when things were added.







[GitHub] [arrow] github-actions[bot] commented on pull request #7822: ARROW-9096: [Python] Pandas roundtrip with dtype="object" underlying numeric column index

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7822:
URL: https://github.com/apache/arrow/pull/7822#issuecomment-663292114


   https://issues.apache.org/jira/browse/ARROW-9096





[GitHub] [arrow] emkornfield commented on a change in pull request #7822: ARROW-9096: [Python] Pandas roundtrip with dtype="object" underlying numeric column index

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7822:
URL: https://github.com/apache/arrow/pull/7822#discussion_r459873169



##########
File path: python/pyarrow/pandas_compat.py
##########
@@ -1080,8 +1083,10 @@ def _reconstruct_columns_from_metadata(columns, column_indexes):
     ]
 
     # Convert each level to the dtype provided in the metadata
+    # ARROW-9096: need numpy_type to match cast against original DataFrame

Review comment:
       nit: drop the JIRA reference. Maybe also drop the whole comment, since this is used directly below?



