Posted to issues@arrow.apache.org by "mikelui (via GitHub)" <gi...@apache.org> on 2023/03/26 05:28:41 UTC

[GitHub] [arrow] mikelui opened a new issue, #34729: Better support for Maps in Pandas

mikelui opened a new issue, #34729:
URL: https://github.com/apache/arrow/issues/34729

   ### Describe the enhancement requested
   
   Today, the (Py)Arrow -> Pandas conversion treats:
   
   1. structs as Python dicts (pydicts)
   2. maps as Python lists of tuples (i.e. `[(key1, value1), (key2, value2), ...]`)
   
   While treating maps as lists of tuples has several advantages (it preserves ordering, allows duplicate keys, and is fast to create and iterate), users often simply want a ... map! (i.e. a pydict).
   
   Having to convert every element via `dict(map_elem)` is cumbersome, slow, and downright nasty when working with arbitrarily nested maps in Pandas.
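
   For reference, the manual workaround for nested data looks roughly like the sketch below. It is an illustration only: `maps_to_dicts` is a hypothetical helper, and the tuple-based `spec` is an assumed stand-in for a real Arrow schema. It shows why the nested case hurts: the caller has to walk the type to know which lists of tuples are actually maps.

```python
from typing import Any, Optional, Tuple

# Hypothetical type description standing in for an Arrow schema:
#   None                          -> scalar leaf, returned unchanged
#   ("list", item_spec)           -> Python list of items
#   ("map", key_spec, value_spec) -> list of (key, value) tuples
TypeSpec = Optional[Tuple[Any, ...]]


def maps_to_dicts(value: Any, spec: TypeSpec) -> Any:
    """Recursively turn map values (lists of (key, value) tuples) into dicts."""
    if value is None or spec is None:
        # Nulls and scalar leaves pass through unchanged.
        return value
    kind = spec[0]
    if kind == "list":
        return [maps_to_dicts(item, spec[1]) for item in value]
    if kind == "map":
        key_spec, value_spec = spec[1], spec[2]
        # Like dict(map_elem), the last value for a repeated key wins.
        return {
            maps_to_dicts(k, key_spec): maps_to_dicts(v, value_spec)
            for k, v in value
        }
    raise ValueError(f"unknown type spec: {spec!r}")
```

   For example, `maps_to_dicts([[("a", [("x", 1)])]], ("list", ("map", None, ("map", None, None))))` returns `[{"a": {"x": 1}}]`.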
   
   Today, PyArrow already supports converting pydicts to Arrow maps when a schema is provided, so this is a known use case.
   
   I propose a simple switch in `PandasOptions` for `table.to_pandas(...)` that makes it generate pydicts for maps. This would also give the existing pydict -> Arrow map conversion a symmetrical counterpart.
   
   ----
   
   As mentioned above, the cons are:
   1. Users lose ordering.
   2. Duplicate keys will be removed, resulting in potential data loss. This should be made clear to the user.
   3. Potential ambiguity when examining data that has both maps and structs, since both would become pydicts.
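
   To make the duplicate-key con concrete, here is a minimal sketch of how a per-value conversion could expose the trade-off. `pairs_to_dict` and its `strict` flag are hypothetical, not an existing PyArrow option: the lossy mode behaves like `dict(map_elem)` (the last value for a repeated key wins), while the strict mode raises instead of dropping data silently.

```python
from typing import Any, Dict, Hashable, Iterable, Tuple


def pairs_to_dict(
    pairs: Iterable[Tuple[Hashable, Any]], strict: bool = False
) -> Dict[Hashable, Any]:
    """Convert one map value (an iterable of (key, value) tuples) into a dict.

    strict=False: duplicate keys are silently collapsed (last value wins).
    strict=True:  a duplicate key raises ValueError instead of losing data.
    """
    result: Dict[Hashable, Any] = {}
    for key, value in pairs:
        if strict and key in result:
            raise ValueError(
                f"duplicate map key {key!r}; converting to a dict would drop data"
            )
        result[key] = value
    return result
```

   With `pairs = [("a", 1), ("b", 2), ("a", 3)]`, `pairs_to_dict(pairs)` returns `{"a": 3, "b": 2}`, whereas `pairs_to_dict(pairs, strict=True)` raises `ValueError`.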
   
   I think the ergonomic benefits outweigh these cons.
   
   ----
   
   Separately, I think there is a bug that prevents pydict -> Arrow map conversion when the map type is nested (e.g. a list of maps). That should be fixed as well, to round out map support.
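
   Until the nested case works, one possible workaround, sketched here under the assumption that only nested pydicts are affected, is to rewrite dicts as lists of `(key, value)` tuples (the form PyArrow already accepts for map types) before building the array. `dicts_to_pairs` and the tuple-based `spec` are hypothetical.

```python
from typing import Any, Optional, Tuple

# Hypothetical type description standing in for an Arrow type:
#   None -> scalar leaf; ("list", item_spec); ("map", key_spec, value_spec)
TypeSpec = Optional[Tuple[Any, ...]]


def dicts_to_pairs(value: Any, spec: TypeSpec) -> Any:
    """Recursively rewrite dicts at map positions as lists of (key, value) tuples."""
    if value is None or spec is None:
        # Nulls and scalar leaves pass through unchanged.
        return value
    kind = spec[0]
    if kind == "list":
        return [dicts_to_pairs(item, spec[1]) for item in value]
    if kind == "map":
        key_spec, value_spec = spec[1], spec[2]
        # Accept either a dict or an existing sequence of pairs.
        items = value.items() if isinstance(value, dict) else value
        return [
            (dicts_to_pairs(k, key_spec), dicts_to_pairs(v, value_spec))
            for k, v in items
        ]
    raise ValueError(f"unknown type spec: {spec!r}")
```

   Each normalized row, e.g. `dicts_to_pairs([{"a": 1}, {}], ("list", ("map", None, None)))`, which gives `[[("a", 1)], []]`, could then be collected and passed to `pa.array(...)` with `type=pa.list_(pa.map_(pa.string(), pa.int64()))`, assuming string keys and integer values.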
   
   ### Component(s)
   
   C++, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 closed issue #34729: Better support for Maps in Pandas

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 closed issue #34729: Better support for Maps in Pandas
URL: https://github.com/apache/arrow/issues/34729
