Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/12/30 18:19:29 UTC

[GitHub] [beam] TheNeuralBit commented on a change in pull request #13634: [BEAM-11532] Fix edge case in merge where left_on and right_on contain equivalent column names

TheNeuralBit commented on a change in pull request #13634:
URL: https://github.com/apache/beam/pull/13634#discussion_r550284105



##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -1218,15 +1219,32 @@ def merge(
     merged = frame_base.DeferredFrame.wrap(
         expressions.ComputedExpression(
             'merge',
-            lambda left, right: left.merge(
-                right, left_index=True, right_index=True, **kwargs),
+            lambda left, right: left.merge(right,
+                                           left_index=True,
+                                           right_index=True,
+                                           suffixes=suffixes,
+                                           **kwargs),
             [indexed_left._expr, indexed_right._expr],
             preserves_partition_by=partitionings.Singleton(),
             requires_partition_by=partitionings.Index()))
 
     if left_index or right_index:
       return merged
     else:
+      common_cols = set(left_on).intersection(right_on)
+      if len(common_cols):
+        # When merging on the same column name from both dfs, merged will have
+        # two duplicate columns, one with lsuffix and one with rsuffix.
+        # Normally pandas de-dupes these into a single column with no suffix.
+        # This replicates that logic by dropping the _right_ dupe, and removing
+        # the suffix from the _left_ dupe.
+        lsuffix, rsuffix = suffixes
+        merged = merged.drop(

Review comment:
       In that case, I think it would have been renamed to `{col}{rsuffix}{rsuffix}`. This is a good edge case to think about, though; I'll look at adding some more test cases.
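
The de-duplication described in the code comment of the hunk above can be reproduced with plain pandas. The following is a minimal sketch, not the Beam implementation: the frames, the `key` column, and the `("_x", "_y")` suffixes are made-up illustration values, and the `set_index(..., drop=False)` step is an assumption about how the indexed inputs are prepared before the index-based merge.

```python
import pandas as pd

# Hypothetical inputs that share the join column name "key".
left = pd.DataFrame({"key": [1, 2, 3], "a": [10, 20, 30]})
right = pd.DataFrame({"key": [2, 3, 4], "b": [200, 300, 400]})
suffixes = ("_x", "_y")

# Reference behaviour: pandas keeps a single, unsuffixed "key" column
# when left_on and right_on name the same column.
expected = left.merge(right, left_on="key", right_on="key", suffixes=suffixes)

# Index-based merge with the join column kept as a regular column on both
# sides (assumed setup). "key" now overlaps, so it comes back twice, once
# with each suffix.
indexed_left = left.set_index("key", drop=False)
indexed_right = right.set_index("key", drop=False)
merged = indexed_left.merge(
    indexed_right, left_index=True, right_index=True, suffixes=suffixes)

# Replicate the pandas de-dupe: drop the right-hand duplicate and strip the
# suffix from the left-hand one.
lsuffix, rsuffix = suffixes
deduped = (
    merged.drop(columns=["key" + rsuffix])
    .rename(columns={"key" + lsuffix: "key"})
    .reset_index(drop=True))
```

In this sketch `merged` carries both `key_x` and `key_y`, while `deduped` ends up with the same columns as `expected`. It does not cover the case where an input already has a column whose name ends in one of the suffixes, which is the kind of edge case the comment above proposes to test.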



