Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/21 15:49:57 UTC

[GitHub] [arrow] markweissman opened a new issue, #13408: Pyarrow join handles nulls differently than pandas merge.

markweissman opened a new issue, #13408:
URL: https://github.com/apache/arrow/issues/13408

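   pyarrow.Table.join handles nulls differently than pandas: an inner join drops rows whose key is null, while pandas.DataFrame.merge matches null keys to each other. Please consider either changing this behaviour or documenting it.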
   
   (Pdb) df=pa.Table.from_pandas(pd.DataFrame(dict(x=[None, "foo"]))); df.join(df, "x", join_type="inner").to_pandas()
        x
   0  foo
   (Pdb) df=pd.DataFrame(dict(x=[None, "foo"])); df.merge(df, on="x", how="inner")
         x
   0  None
   1   foo
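   The difference in the transcript above comes down to whether a null key is allowed to find a partner during the join. Below is a minimal pure-Python sketch of an inner hash join, not Arrow's or pandas' actual implementation. The function `inner_join_keys` and its `nulls_equal` flag are hypothetical, and the sketch assumes `None` is the only null marker (pandas also treats `NaN` as missing, which is ignored here).

```python
from collections import defaultdict


def inner_join_keys(left, right, nulls_equal):
    """Return matched (left_index, right_index) pairs for an inner equi-join.

    Hypothetical helper for illustration only.
    nulls_equal=False: null keys never match (the pyarrow result above).
    nulls_equal=True: null keys match each other (the pandas result above).
    """
    # Build phase: hash the right-hand keys, skipping nulls unless they may match.
    buckets = defaultdict(list)
    for j, key in enumerate(right):
        if key is None and not nulls_equal:
            continue
        buckets[key].append(j)

    # Probe phase: look up each left-hand key in the hash table.
    pairs = []
    for i, key in enumerate(left):
        if key is None and not nulls_equal:
            continue
        for j in buckets.get(key, []):
            pairs.append((i, j))
    return pairs


keys = [None, "foo"]
print([keys[i] for i, _ in inner_join_keys(keys, keys, nulls_equal=False)])
print([keys[i] for i, _ in inner_join_keys(keys, keys, nulls_equal=True)])
```

   With `nulls_equal=False` the sketch keeps only `foo`, like the pyarrow output; with `nulls_equal=True` it keeps both `None` and `foo`, like the pandas output.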
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on issue #13408: Pyarrow join handles nulls differently than pandas merge.

Posted by GitBox <gi...@apache.org>.
wjones127 commented on issue #13408:
URL: https://github.com/apache/arrow/issues/13408#issuecomment-1163470295

   cc @amol- 


[GitHub] [arrow] markweissman commented on issue #13408: Pyarrow join handles nulls differently than pandas merge.

Posted by GitBox <gi...@apache.org>.
markweissman commented on issue #13408:
URL: https://github.com/apache/arrow/issues/13408#issuecomment-1163529320

   @amol- Thanks much!


[GitHub] [arrow] amol- commented on issue #13408: Pyarrow join handles nulls differently than pandas merge.

Posted by GitBox <gi...@apache.org>.
amol- commented on issue #13408:
URL: https://github.com/apache/arrow/issues/13408#issuecomment-1163515796

   Well, `NULL` values usually represent missing or unknown values, so there is no way to match them against anything else, not even another `NULL`. That is why excluding null keys is the common behaviour.
   
   But we could make this an option of the join operation, so users can switch between the two behaviours. I created https://issues.apache.org/jira/browse/ARROW-16882 to track that feature.
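   The reasoning above follows SQL's three-valued logic: `NULL = NULL` evaluates to UNKNOWN rather than TRUE, and a join keeps a row pair only when its condition is TRUE. SQL also provides `IS NOT DISTINCT FROM`, which treats two NULLs as equal, i.e. the pandas behaviour. Below is a minimal sketch of the two comparison modes, using `None` for NULL. The helper names are hypothetical, and this does not describe whatever API ARROW-16882 may eventually add.

```python
from typing import Optional


def sql_equals(a, b) -> Optional[bool]:
    """Hypothetical helper: SQL '=' under three-valued logic.

    Comparing anything with NULL (None) yields UNKNOWN, represented as None.
    """
    if a is None or b is None:
        return None
    return a == b


def is_not_distinct_from(a, b) -> bool:
    """Hypothetical helper: SQL 'IS NOT DISTINCT FROM'; two NULLs compare equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


# A join keeps a row pair only when its predicate is exactly True,
# so an UNKNOWN result from sql_equals drops the pair.
print(sql_equals(None, None) is True)      # False: pair discarded
print(is_not_distinct_from(None, None))    # True: pair kept
```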


[GitHub] [arrow] amol- closed issue #13408: Pyarrow join handles nulls differently than pandas merge.

Posted by GitBox <gi...@apache.org>.
amol- closed issue #13408: Pyarrow join handles nulls differently than pandas merge.
URL: https://github.com/apache/arrow/issues/13408
