You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "nkarpov (via GitHub)" <gi...@apache.org> on 2023/04/13 01:57:44 UTC

[GitHub] [arrow-datafusion-python] nkarpov opened a new pull request, #324: adding support for join_on to python bindings

nkarpov opened a new pull request, #324:
URL: https://github.com/apache/arrow-datafusion-python/pull/324

   # Which issue does this PR close?
   Should close https://github.com/apache/arrow-datafusion-python/issues/323
   
   # What changes are included in this PR?
   # Are there any user-facing changes?
   Yes there are user facing changes, namely we're extending the PyDataFrame to also expose the underlying `join_on`
   
   # More
   
   This PR is mostly to provoke the discussion. If there's some existing reason we don't expose it, or perhaps extend `join` to support it, please let me know. Just looking for some feedback. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion-python] nkarpov commented on a diff in pull request #324: adding support for join_on to python bindings

Posted by "nkarpov (via GitHub)" <gi...@apache.org>.
nkarpov commented on code in PR #324:
URL: https://github.com/apache/arrow-datafusion-python/pull/324#discussion_r1165545986


##########
src/dataframe.rs:
##########
@@ -212,6 +212,39 @@ impl PyDataFrame {
         Ok(Self::new(df))
     }
 
+    /// Thin wrapper for datafusion-rust `join_on`
+    /// Slightly modified from original `join` above.
+    fn join_on(
+            &self,
+            right: PyDataFrame,
+            on_exprs: Vec<PyExpr>,
+            how: &str,
+        ) -> PyResult<Self> {
+        let join_type = match how {
+            "inner" => JoinType::Inner,
+            "left" => JoinType::Left,
+            "right" => JoinType::Right,
+            "full" => JoinType::Full,
+            "semi" => JoinType::LeftSemi,
+            "anti" => JoinType::LeftAnti,
+            how => {
+                return Err(DataFusionError::Common(format!(
+                    "The join type {how} does not exist or is not implemented"
+                ))
+                .into());
+            }

Review Comment:
   I think so, but the trait doesn't appear to implemented https://github.com/apache/arrow-datafusion/blob/main/datafusion/expr/src/logical_plan/plan.rs#L1115-L1156, so I copied the approach from `join` in this same file for now.
   
   Should I add it to the main repo? Should it be a blocker for this change here?
   
   I'm new to this community so please bear with my questions 😬!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion-python] Jimexist commented on a diff in pull request #324: adding support for join_on to python bindings

Posted by "Jimexist (via GitHub)" <gi...@apache.org>.
Jimexist commented on code in PR #324:
URL: https://github.com/apache/arrow-datafusion-python/pull/324#discussion_r1165115002


##########
src/dataframe.rs:
##########
@@ -212,6 +212,39 @@ impl PyDataFrame {
         Ok(Self::new(df))
     }
 
+    /// Thin wrapper for datafusion-rust `join_on`
+    /// Slightly modified from original `join` above.
+    fn join_on(
+            &self,
+            right: PyDataFrame,
+            on_exprs: Vec<PyExpr>,
+            how: &str,
+        ) -> PyResult<Self> {
+        let join_type = match how {
+            "inner" => JoinType::Inner,
+            "left" => JoinType::Left,
+            "right" => JoinType::Right,
+            "full" => JoinType::Full,
+            "semi" => JoinType::LeftSemi,
+            "anti" => JoinType::LeftAnti,
+            how => {
+                return Err(DataFusionError::Common(format!(
+                    "The join type {how} does not exist or is not implemented"
+                ))
+                .into());
+            }

Review Comment:
   shouldn't this be covered by the https://doc.rust-lang.org/std/str/trait.FromStr.html impl for the enum?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion-python] nkarpov commented on pull request #324: adding support for join_on to python bindings

Posted by "nkarpov (via GitHub)" <gi...@apache.org>.
nkarpov commented on PR #324:
URL: https://github.com/apache/arrow-datafusion-python/pull/324#issuecomment-1518441123

   @Jimexist I made the change in the rust repo as you suggested and it's merged in. I've updated this implementation to use it too. This throws an error though because we're still referencing the 22.0.0 build. I'm not familiar with how these repos are cut, so please let me know how we'll proceed once the change in rust side is actually cut in its release. Thanks! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org