You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/23 16:10:44 UTC

[GitHub] [arrow-datafusion] gaojun2048 opened a new pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

gaojun2048 opened a new pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070


   closes https://github.com/apache/arrow-datafusion/issues/2069
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070#discussion_r834588274



##########
File path: ballista-examples/examples/test_sql.rs
##########
@@ -0,0 +1,46 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use ballista::prelude::{BallistaConfig, BallistaContext, Result};
+use datafusion::prelude::CsvReadOptions;
+
+/// This example show the udf plugin is work
+///
+#[tokio::main]
+async fn main() -> Result<()> {
+    let config = BallistaConfig::builder()
+        .set("ballista.shuffle.partitions", "1")
+        .build()?;
+
+    let ctx = BallistaContext::standalone(&config, 10).await.unwrap();
+
+    let testdata = datafusion::test_util::arrow_test_data();
+
+    // register csv file with the execution context
+    ctx.register_csv(
+        "aggregate_test_100",
+        &format!("{}/csv/aggregate_test_100.csv", testdata),
+        CsvReadOptions::new(),
+    )
+    .await?;
+
+    // test udf
+    let df = ctx.sql("select count(1) from aggregate_test_100").await?;

Review comment:
       I think https://github.com/apache/arrow-datafusion/issues/321 tracks some of this -- I will make a note




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070#issuecomment-1077896409


   Thanks @gaojun2048 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb edited a comment on pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070#issuecomment-1076707745


   None of the CI checks ran on this branch @ https://github.com/apache/arrow-datafusion/pull/2070/commits/ccf67091c8dd5e2045802241c411ce56fa021b5e (probably because github was having issuesearlier today ) so I merged `apache/master` into this branch in https://github.com/apache/arrow-datafusion/pull/2070/commits/5cda9d1798259d05ca0edece5f5b5718aea6ceef in an attempt to start the checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on a change in pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070#discussion_r833852081



##########
File path: ballista-examples/examples/test_sql.rs
##########
@@ -0,0 +1,46 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use ballista::prelude::{BallistaConfig, BallistaContext, Result};
+use datafusion::prelude::CsvReadOptions;
+
+/// This example show the udf plugin is work
+///
+#[tokio::main]
+async fn main() -> Result<()> {
+    let config = BallistaConfig::builder()
+        .set("ballista.shuffle.partitions", "1")
+        .build()?;
+
+    let ctx = BallistaContext::standalone(&config, 10).await.unwrap();
+
+    let testdata = datafusion::test_util::arrow_test_data();
+
+    // register csv file with the execution context
+    ctx.register_csv(
+        "aggregate_test_100",
+        &format!("{}/csv/aggregate_test_100.csv", testdata),
+        CsvReadOptions::new(),
+    )
+    .await?;
+
+    // test udf
+    let df = ctx.sql("select count(1) from aggregate_test_100").await?;

Review comment:
       Maybe we can add this test for the UT or it IT framework.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] mingmwang commented on pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

Posted by GitBox <gi...@apache.org>.
mingmwang commented on pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070#issuecomment-1077364611


   My bad. The properties in the TaskContext is not populated in this PR.  
   
   The fix LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on a change in pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070#discussion_r835832156



##########
File path: ballista-examples/examples/test_sql.rs
##########
@@ -0,0 +1,46 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use ballista::prelude::{BallistaConfig, BallistaContext, Result};
+use datafusion::prelude::CsvReadOptions;
+
+/// This example show the udf plugin is work
+///
+#[tokio::main]
+async fn main() -> Result<()> {
+    let config = BallistaConfig::builder()
+        .set("ballista.shuffle.partitions", "1")
+        .build()?;
+
+    let ctx = BallistaContext::standalone(&config, 10).await.unwrap();
+
+    let testdata = datafusion::test_util::arrow_test_data();
+
+    // register csv file with the execution context
+    ctx.register_csv(
+        "aggregate_test_100",
+        &format!("{}/csv/aggregate_test_100.csv", testdata),
+        CsvReadOptions::new(),
+    )
+    .await?;
+
+    // test udf
+    let df = ctx.sql("select count(1) from aggregate_test_100").await?;

Review comment:
       > about add a ballista-integration-
   
   agree with the suggestion.
   we can discuss this in this issue https://github.com/apache/arrow-datafusion/issues/321.
   
   @gaojun2048  you can provide some suggestions or proposals.
   You can point out some lack of ballista about test and record them.
   We can make a plan to address them in Q2.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] gaojun2048 commented on a change in pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

Posted by GitBox <gi...@apache.org>.
gaojun2048 commented on a change in pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070#discussion_r834104400



##########
File path: ballista-examples/examples/test_sql.rs
##########
@@ -0,0 +1,46 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use ballista::prelude::{BallistaConfig, BallistaContext, Result};
+use datafusion::prelude::CsvReadOptions;
+
+/// This example show the udf plugin is work
+///
+#[tokio::main]
+async fn main() -> Result<()> {
+    let config = BallistaConfig::builder()
+        .set("ballista.shuffle.partitions", "1")
+        .build()?;
+
+    let ctx = BallistaContext::standalone(&config, 10).await.unwrap();
+
+    let testdata = datafusion::test_util::arrow_test_data();
+
+    // register csv file with the execution context
+    ctx.register_csv(
+        "aggregate_test_100",
+        &format!("{}/csv/aggregate_test_100.csv", testdata),
+        CsvReadOptions::new(),
+    )
+    .await?;
+
+    // test udf
+    let df = ctx.sql("select count(1) from aggregate_test_100").await?;

Review comment:
       @liukun4515  How about add a ballista-integration-test cargo?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on a change in pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070#discussion_r833851832



##########
File path: ballista-examples/examples/test_sql.rs
##########
@@ -0,0 +1,46 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use ballista::prelude::{BallistaConfig, BallistaContext, Result};
+use datafusion::prelude::CsvReadOptions;
+
+/// This example show the udf plugin is work
+///
+#[tokio::main]
+async fn main() -> Result<()> {
+    let config = BallistaConfig::builder()
+        .set("ballista.shuffle.partitions", "1")
+        .build()?;
+
+    let ctx = BallistaContext::standalone(&config, 10).await.unwrap();
+
+    let testdata = datafusion::test_util::arrow_test_data();
+
+    // register csv file with the execution context
+    ctx.register_csv(
+        "aggregate_test_100",
+        &format!("{}/csv/aggregate_test_100.csv", testdata),
+        CsvReadOptions::new(),
+    )
+    .await?;
+
+    // test udf
+    let df = ctx.sql("select count(1) from aggregate_test_100").await?;

Review comment:
       @alamb 
   Do we have the UT or IT for the ballista side?
   I think this `test_sql.rs` file just adds an example for the ballista.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070#discussion_r833601132



##########
File path: ballista-examples/examples/test_sql.rs
##########
@@ -0,0 +1,46 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use ballista::prelude::{BallistaConfig, BallistaContext, Result};
+use datafusion::prelude::CsvReadOptions;
+
+/// This example show the udf plugin is work
+///
+#[tokio::main]
+async fn main() -> Result<()> {
+    let config = BallistaConfig::builder()
+        .set("ballista.shuffle.partitions", "1")
+        .build()?;
+
+    let ctx = BallistaContext::standalone(&config, 10).await.unwrap();
+
+    let testdata = datafusion::test_util::arrow_test_data();
+
+    // register csv file with the execution context
+    ctx.register_csv(
+        "aggregate_test_100",
+        &format!("{}/csv/aggregate_test_100.csv", testdata),
+        CsvReadOptions::new(),
+    )
+    .await?;
+
+    // test udf
+    let df = ctx.sql("select count(1) from aggregate_test_100").await?;

Review comment:
       💯  for test

##########
File path: datafusion/src/execution/context.rs
##########
@@ -1340,27 +1340,31 @@ impl TaskContext {
         match task_props {
             TaskProperties::KVPairs(props) => {
                 let session_config = SessionConfig::new();
-                session_config
-                    .with_batch_size(props.get(BATCH_SIZE).unwrap().parse().unwrap())
-                    .with_target_partitions(
-                        props.get(TARGET_PARTITIONS).unwrap().parse().unwrap(),
-                    )
-                    .with_repartition_joins(
-                        props.get(REPARTITION_JOINS).unwrap().parse().unwrap(),
-                    )
-                    .with_repartition_aggregations(
-                        props
-                            .get(REPARTITION_AGGREGATIONS)
-                            .unwrap()
-                            .parse()
-                            .unwrap(),
-                    )
-                    .with_repartition_windows(
-                        props.get(REPARTITION_WINDOWS).unwrap().parse().unwrap(),
-                    )
-                    .with_parquet_pruning(
-                        props.get(PARQUET_PRUNING).unwrap().parse().unwrap(),
-                    )
+                if props.is_empty() {
+                    session_config
+                } else {
+                    session_config
+                        .with_batch_size(props.get(BATCH_SIZE).unwrap().parse().unwrap())

Review comment:
       FYI @mingmwang 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] gaojun2048 commented on a change in pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

Posted by GitBox <gi...@apache.org>.
gaojun2048 commented on a change in pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070#discussion_r833855523



##########
File path: ballista-examples/examples/test_sql.rs
##########
@@ -0,0 +1,46 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use ballista::prelude::{BallistaConfig, BallistaContext, Result};
+use datafusion::prelude::CsvReadOptions;
+
+/// This example show the udf plugin is work
+///
+#[tokio::main]
+async fn main() -> Result<()> {
+    let config = BallistaConfig::builder()
+        .set("ballista.shuffle.partitions", "1")
+        .build()?;
+
+    let ctx = BallistaContext::standalone(&config, 10).await.unwrap();
+
+    let testdata = datafusion::test_util::arrow_test_data();
+
+    // register csv file with the execution context
+    ctx.register_csv(
+        "aggregate_test_100",
+        &format!("{}/csv/aggregate_test_100.csv", testdata),
+        CsvReadOptions::new(),
+    )
+    .await?;
+
+    // test udf
+    let df = ctx.sql("select count(1) from aggregate_test_100").await?;

Review comment:
       @liukun4515  I think we need a ballista-integration-test cargo. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #2070: [Bug][Datafusion] fix TaskContext session_config bug

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #2070:
URL: https://github.com/apache/arrow-datafusion/pull/2070#issuecomment-1076707745


   None of the CI checks ran on this branch @ https://github.com/apache/arrow-datafusion/pull/2070/commits/ccf67091c8dd5e2045802241c411ce56fa021b5e (probably because github was having issuesearlier today ) so I merged `apache/master` into this branch in an attempt to start the checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org