Posted to github@arrow.apache.org by "jonkeane (via GitHub)" <gi...@apache.org> on 2023/03/28 18:32:51 UTC

[GitHub] [arrow] jonkeane commented on pull request #34744: GH-15248: [R] R to Arrow to R roundtrip adds a ptype column attribute to list columns

jonkeane commented on PR #34744:
URL: https://github.com/apache/arrow/pull/34744#issuecomment-1487418207

   Agreed with @paleolimbot: our implementation of Arrow lists <-> list columns was always a bit funny (and never fully fleshed out). I remember one feature we do use it for is columns that contain data.frames and, IIRC, there's some special casing of that in our own code.
   
   > I think the root cause is that our "arrow_list" vctrs class is half-baked (not necessarily that adding the .ptype or returning a vctrs_list_of is a problem).
   > 
   > (Alternatively, you could fix the vctrs implementation for arrow_list and friends but unless somebody pipes up in defense of it I think we should wipe all references to it and commit to vctrs_list_of)
   
   It would be great to briefly break down what these two options would mean (pros/cons, that kind of thing). It doesn't have to be exhaustive, but I don't have a good list in my head right now of why one would be better than the other.
   
   Another thing I might be misremembering (so take it with a grain of salt + verify I'm not making this up): there is a slight mismatch between what kind of heterogeneity is allowed in an R list versus an Arrow list. IIRC, (for variable-sized lists) Arrow still requires types to be consistent across elements, but R does not. That should be totally ok when roundtripping arrow -> R -> arrow, though we might need to catch the case where an R list isn't compatible.
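
   To make the "catch if an R list isn't compatible" idea concrete, here is a rough, language-agnostic sketch, written in plain Python purely for illustration. `common_element_type` is a hypothetical helper, not anything in arrow or vctrs, and it assumes a list column is modelled as a list of lists with `None` standing in for missing values:

```python
def common_element_type(column):
    """Return the one type shared by all non-null values, or None if there are none.

    Hypothetical helper: raises TypeError when the nested values mix types,
    which is the situation an Arrow list could not represent.
    """
    seen = set()
    for element in column:
        if element is None:  # a missing list element is compatible with any type
            continue
        for value in element:
            if value is not None:
                seen.add(type(value))
    if len(seen) > 1:
        names = sorted(t.__name__ for t in seen)
        raise TypeError(f"mixed element types: {names}")
    return next(iter(seen), None)
```

   The point is only that the check happens before conversion: a homogeneous column passes through, while a mixed one is rejected up front instead of failing partway through.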


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org