You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "waynexia (via GitHub)" <gi...@apache.org> on 2023/05/04 07:38:45 UTC

[GitHub] [arrow-datafusion] waynexia commented on a diff in pull request #6135: Substrait: Fix incorrect join key fields (indices) when same table is being used more than once

waynexia commented on code in PR #6135:
URL: https://github.com/apache/arrow-datafusion/pull/6135#discussion_r1184592657


##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -15,7 +15,7 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use std::{collections::HashMap, sync::Arc};
+use std::{collections::HashMap, usize};

Review Comment:
   style: This type should be in the scope by default
   
   ```suggestion
   use std::collections::HashMap;
   ```



##########
datafusion/substrait/tests/roundtrip_logical_plan.rs:
##########
@@ -278,6 +278,18 @@ mod tests {
         roundtrip("SELECT a,b,c,d,e FROM datafusion.public.data;").await
     }
 
+    #[tokio::test]
+    async fn roundtrip_inner_join_table_reuse() -> Result<()> {
+        assert_expected_plan(
+            "SELECT d1.b, d2.c FROM data d1 JOIN data d2 ON d1.a = d2.a",
+            "Projection: data.b, data.c\
+            \n  Inner Join: data.a = data.a\
+            \n    TableScan: data projection=[a, b]\
+            \n    TableScan: data projection=[a, c]",

Review Comment:
   I tried this case against the main branch and it can pass. I guess this is not the original query that causes the error?
   
   BTW, if two tables have the same name, does it means they are the same table? (for this case, `d1` and `d2` are the same table `data`). If so I think we need not distinguish "different" columns because they will eventually refer to the same column. Please let me know if I missed anything



##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -363,6 +344,40 @@ pub fn to_substrait_rel(
     }
 }
 
+fn to_substrait_join_expr(
+    join_conditions: &Vec<(Expr, Expr)>,
+    eq_op: Operator,
+    left_schema: &DFSchemaRef,
+    right_schema: &DFSchemaRef,
+    extension_info: &mut (
+        Vec<extensions::SimpleExtensionDeclaration>,
+        HashMap<String, u32>,
+    ),
+) -> Result<Expression> {
+    // Only support AND confunction for each binary expression in join conditions

Review Comment:
   maybe a typo?
   ```suggestion
       // Only support AND conjunction for each binary expression in join conditions
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org