Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/09 16:49:15 UTC

[GitHub] [arrow] rezaeir opened a new issue #7385: bad handling of Int64 column of dataframe when reading in R with read_feather

rezaeir opened a new issue #7385:
URL: https://github.com/apache/arrow/issues/7385


   I'm new to the arrow package. My code is as follows:
   ```
   # Python
   import pandas as pd
   df = pd.DataFrame(dict(a=[1,2], b=[3,4]))
   
   df.to_feather('test.feather')
   ```
   Then, in R:
   ```
   # R
   library('arrow')
   df = read_feather('test.feather')
   data.matrix(df)
   ```
   Instead of coercing the columns to double, this gives me the following unusual output:
   ```
   a               b
   4.940656e-324   1.482197e-323
   9.881313e-324   1.976263e-323
   ```
   Which part of my code is wrong, and what should I do to fix this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nealrichardson closed issue #7385: bad handling of Int64 column of dataframe when reading in R with read_feather

Posted by GitBox <gi...@apache.org>.

nealrichardson closed issue #7385:
URL: https://github.com/apache/arrow/issues/7385


   



[GitHub] [arrow] nealrichardson commented on issue #7385: bad handling of Int64 column of dataframe when reading in R with read_feather

Posted by GitBox <gi...@apache.org>.

nealrichardson commented on issue #7385:
URL: https://github.com/apache/arrow/issues/7385#issuecomment-641472127


   This appears to be a feature of how `data.matrix()` interacts with `bit64::integer64` class objects. Here's a reprex without involving `arrow`:
   
   ```
   > df <- data.frame(a=bit64::as.integer64(1:2), b=bit64::as.integer64(3:4))
   > df
     a b
   1 1 3
   2 2 4
   > data.matrix(df)
                    a             b
   [1,] 4.940656e-324 1.482197e-323
   [2,] 9.881313e-324 1.976263e-323
   ```
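
The values in that matrix look like what you get when the 8 bytes of a 64-bit integer are read back as an IEEE 754 double. This is consistent with `bit64::integer64` keeping its payload in double storage, though that is an inference from the output rather than something confirmed in this thread. A minimal Python sketch of the reinterpretation, using only the standard library (the helper name `int64_bits_as_double` is made up for illustration):

```python
import struct


def int64_bits_as_double(value: int) -> float:
    """Reinterpret the 8 bytes of a signed 64-bit integer as an IEEE 754 double."""
    # "<q" packs a little-endian signed 64-bit integer,
    # "<d" unpacks the same 8 bytes as a little-endian double.
    return struct.unpack("<d", struct.pack("<q", value))[0]


# The integers 1..4 map to 1x..4x the smallest positive subnormal double,
# which is about 4.94e-324, matching the magnitudes in the matrix above.
for n in (1, 2, 3, 4):
    print(n, int64_bits_as_double(n))
```

So the numbers are not corrupted data: the integers 1 to 4 are still there, only interpreted under the wrong type.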
   
   You could fix this in your example either by providing a schema in Python with int32 types, or by calling `as.integer` on the columns of your data.frame before calling `data.matrix()`.
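
As a rough sketch of the Python-side option, one way to get int32 columns is to cast the frame with pandas `astype` before writing. Note this is a substitute for passing an explicit Arrow schema, and it assumes every value fits in 32 bits. The helper name `write_int32_feather` is made up for illustration, and `to_feather` needs `pyarrow` installed:

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2], b=[3, 4]))

# Cast every column to 32-bit integers so R can read them as native integers.
# Assumption: all values fit in int32; astype does not check this for you.
df32 = df.astype("int32")
print(df32.dtypes)


def write_int32_feather(frame: pd.DataFrame, path: str) -> None:
    """Hypothetical helper: cast to int32, then write a Feather file."""
    frame.astype("int32").to_feather(path)
```

Calling `write_int32_feather(df, 'test.feather')` would then replace the `df.to_feather('test.feather')` line from the original example.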
   
   One could argue that we should downcast int64 to int32 if there are no out of bounds values since that's what R can natively handle. I made https://issues.apache.org/jira/browse/ARROW-9083 to consider that.
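
To make the "no out of bounds values" condition concrete, here is a small Python sketch of such a check. It is only an illustration of the idea, not what ARROW-9083 implements, and `fits_in_r_integer` is a made-up name. It assumes R's integer range excludes -2^31, because R uses that bit pattern for `NA_integer_`:

```python
from typing import Iterable, Optional

# R reserves the smallest 32-bit value for NA_integer_, so the usable
# range is one narrower than a plain int32 at the low end.
R_INT_MIN = -(2**31) + 1
R_INT_MAX = 2**31 - 1


def fits_in_r_integer(values: Iterable[Optional[int]]) -> bool:
    """Return True if every non-missing value fits in R's native integer type."""
    return all(v is None or R_INT_MIN <= v <= R_INT_MAX for v in values)


print(fits_in_r_integer([1, 2, 3, 4]))  # small values: safe to downcast
print(fits_in_r_integer([2**31]))       # too large for a 32-bit integer
```

A reader could downcast only when a check like this passes and keep the 64-bit representation otherwise.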

