Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/21 19:57:45 UTC

[GitHub] [arrow] andygrove opened a new pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

andygrove opened a new pull request #8734:
URL: https://github.com/apache/arrow/pull/8734


   Add partial support for TPC-H query 12


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rdettai commented on pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

Posted by GitBox <gi...@apache.org>.
rdettai commented on pull request #8734:
URL: https://github.com/apache/arrow/pull/8734#issuecomment-732126271


   We can discuss downgrading back to tokio 0.2 in the PR that caused this, https://github.com/apache/arrow/pull/8697, or open an issue.





[GitHub] [arrow] andygrove commented on pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8734:
URL: https://github.com/apache/arrow/pull/8734#issuecomment-731655951


   @jorgecarleitao the error was caused by a tokio version mismatch between the crates.





[GitHub] [arrow] github-actions[bot] commented on pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8734:
URL: https://github.com/apache/arrow/pull/8734#issuecomment-731629745


   https://issues.apache.org/jira/browse/ARROW-10680





[GitHub] [arrow] jorgecarleitao commented on pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8734:
URL: https://github.com/apache/arrow/pull/8734#issuecomment-731699890


   I will take a look at the filter pushdown.





[GitHub] [arrow] andygrove edited a comment on pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

Posted by GitBox <gi...@apache.org>.
andygrove edited a comment on pull request #8734:
URL: https://github.com/apache/arrow/pull/8734#issuecomment-731772595


   Windows build failure seems unrelated:
   
   ```
   LINK : fatal error LNK1318: Unexpected PDB error; FILE_SYSTEM (3) 'D:\a\arrow\arrow\rust\target\debug\deps\arrow-0d5ca67d15d3b3d6.pdb'
   ```





[GitHub] [arrow] rdettai commented on pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

Posted by GitBox <gi...@apache.org>.
rdettai commented on pull request #8734:
URL: https://github.com/apache/arrow/pull/8734#issuecomment-732089304


   @andygrove @jorgecarleitao I noticed you updated tokio to 0.3 in arrow-flight. You need to activate the `rt-multi-thread` feature for it to work! 😃
   
   
   Apart from that, don't you have problems running tonic (which uses tokio 0.2) with tokio 0.3?
   
   `cd rust/arrow-flight; cargo run --example server`
   -> 'there is no reactor running, must be called from the context of Tokio runtime'





[GitHub] [arrow] andygrove commented on pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8734:
URL: https://github.com/apache/arrow/pull/8734#issuecomment-731656882


   The query works if I disable the `FilterPushDown` rule:
   
   ```
   Running benchmarks with the following options: BenchmarkOpt { query: 12, debug: true, iterations: 1, concurrency: 24, batch_size: 4096, path: "/mnt/tpch/tbl-sf1/", file_format: "tbl", mem_table: false }
   Logical plan:
   Aggregate: groupBy=[[#l_shipmode]], aggr=[[SUM(Int32(1)) AS high_line_count, SUM(Int32(0)) AS low_line_count]]
     Join: l_orderkey = o_orderkey
       Filter: #l_receiptdate Lt Utf8("1995-01-01")
         Filter: #l_receiptdate GtEq Utf8("1994-01-01")
           Filter: #l_shipdate Lt #l_commitdate
             Filter: #l_commitdate Lt #l_receiptdate
               Filter: #l_shipmode Eq Utf8("MAIL") Or #l_shipmode Eq Utf8("SHIP")
                 TableScan: lineitem projection=None
       TableScan: orders projection=None
   Optimized logical plan:
   Aggregate: groupBy=[[#l_shipmode]], aggr=[[SUM(Int32(1)) AS high_line_count, SUM(Int32(0)) AS low_line_count]]
     Join: l_orderkey = o_orderkey
       Filter: #l_receiptdate Lt Utf8("1995-01-01")
         Filter: #l_receiptdate GtEq Utf8("1994-01-01")
           Filter: #l_shipdate Lt #l_commitdate
             Filter: #l_commitdate Lt #l_receiptdate
               Filter: #l_shipmode Eq Utf8("MAIL") Or #l_shipmode Eq Utf8("SHIP")
                 TableScan: lineitem projection=Some([0, 10, 11, 12, 14])
       TableScan: orders projection=Some([0])
   +------------+-----------------+----------------+
   | l_shipmode | high_line_count | low_line_count |
   +------------+-----------------+----------------+
   | MAIL       | 15526           | 0              |
   | SHIP       | 15462           | 0              |
   +------------+-----------------+----------------+
   Query 12 iteration 0 took 13431 ms
   ```
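
The five `Filter` nodes in the plan above compare date columns against `Utf8` literals. A minimal std-only sketch of the same predicate over in-memory rows is shown below, assuming the dates are zero-padded `YYYY-MM-DD` strings (the only case in which string ordering matches date ordering). The `LineItem` struct and `passes_q12_filters` function are hypothetical names for illustration and are not part of DataFusion or this PR.

```rust
// Hypothetical row type holding only the lineitem columns the filters touch.
struct LineItem {
    l_shipmode: &'static str,
    l_shipdate: &'static str,
    l_commitdate: &'static str,
    l_receiptdate: &'static str,
}

// Mirrors the five Filter nodes in the logical plan. Dates are compared as
// strings, which preserves date order only for zero-padded YYYY-MM-DD values.
fn passes_q12_filters(row: &LineItem) -> bool {
    (row.l_shipmode == "MAIL" || row.l_shipmode == "SHIP")
        && row.l_commitdate < row.l_receiptdate
        && row.l_shipdate < row.l_commitdate
        && row.l_receiptdate >= "1994-01-01"
        && row.l_receiptdate < "1995-01-01"
}

fn main() {
    // Made-up sample rows, not TPC-H data.
    let rows = [
        LineItem {
            l_shipmode: "MAIL",
            l_shipdate: "1994-03-01",
            l_commitdate: "1994-03-10",
            l_receiptdate: "1994-03-15",
        },
        LineItem {
            l_shipmode: "SHIP",
            l_shipdate: "1994-12-20",
            l_commitdate: "1994-12-25",
            l_receiptdate: "1995-01-02", // received after the upper bound
        },
        LineItem {
            l_shipmode: "AIR", // ship mode not in ('MAIL', 'SHIP')
            l_shipdate: "1994-03-01",
            l_commitdate: "1994-03-10",
            l_receiptdate: "1994-03-15",
        },
    ];
    let kept = rows.iter().filter(|r| passes_q12_filters(r)).count();
    println!("{} of {} rows pass", kept, rows.len());
}
```

The sketch only illustrates the filter semantics; it says nothing about how the `FilterPushDown` rule rewrites the plan.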





[GitHub] [arrow] andygrove commented on pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8734:
URL: https://github.com/apache/arrow/pull/8734#issuecomment-731772595


   build failure seems unrelated:
   
   ```
   LINK : fatal error LNK1318: Unexpected PDB error; FILE_SYSTEM (3) 'D:\a\arrow\arrow\rust\target\debug\deps\arrow-0d5ca67d15d3b3d6.pdb'
   ```





[GitHub] [arrow] alamb commented on a change in pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8734:
URL: https://github.com/apache/arrow/pull/8734#discussion_r529036547



##########
File path: rust/benchmarks/src/bin/tpch.rs
##########
@@ -193,8 +143,105 @@ async fn benchmark(opt: BenchmarkOpt) -> Result<()> {
     Ok(())
 }
 
-async fn execute_sql(ctx: &mut ExecutionContext, sql: &str, debug: bool) -> Result<()> {
-    let plan = ctx.create_logical_plan(sql)?;
+fn create_logical_plan(ctx: &mut ExecutionContext, query: usize) -> Result<LogicalPlan> {
+    match query {
+        1 => ctx.create_logical_plan(
+            "select
+                    l_returnflag,
+                    l_linestatus,
+                    sum(l_quantity),
+                    sum(l_extendedprice),
+                    sum(l_extendedprice * (1 - l_discount)),
+                    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)),
+                    avg(l_quantity),
+                    avg(l_extendedprice),
+                    avg(l_discount),
+                    count(*)
+                from
+                    lineitem
+                where
+                    l_shipdate <= '1998-12-01'
+                group by
+                    l_returnflag,
+                    l_linestatus
+                order by
+                    l_returnflag,
+                    l_linestatus",
+        ),
+
+        12 => {
+            // We do not have sufficient SQL support for this query yet
+
+            // "SELECT
+            //     l_shipmode,
+            //     sum(case
+            //         when o_orderpriority = '1-URGENT'
+            //             OR o_orderpriority = '2-HIGH'
+            //             then 1
+            //         else 0
+            //     end) as high_line_count,
+            //     sum(case
+            //         when o_orderpriority <> '1-URGENT'
+            //             AND o_orderpriority <> '2-HIGH'
+            //             then 1
+            //         else 0
+            //     end) AS low_line_count
+            // FROM
+            //     orders,
+            //     lineitem
+            // WHERE
+            //     o_orderkey = l_orderkey
+            //     AND l_shipmode in ('MAIL', 'SHIP')
+            //     AND l_commitdate < l_receiptdate
+            //     AND l_shipdate < l_commitdate
+            //     AND l_receiptdate >= date '1994-01-01'
+            //     AND l_receiptdate < date '1994-01-01' + interval '1' year
+            // GROUP BY
+            //     l_shipmode
+            // ORDER BY
+            //     l_shipmode"
+
+            Ok(ctx
+                .table("lineitem")?
+                .filter(
+                    col("l_shipmode")
+                        .eq(lit("MAIL"))
+                        .or(col("l_shipmode").eq(lit("SHIP"))),
+                )?
+                .filter(col("l_commitdate").lt(col("l_receiptdate")))?
+                .filter(col("l_shipdate").lt(col("l_commitdate")))?
+                .filter(col("l_receiptdate").gt_eq(lit("1994-01-01")))?
+                // we do not support date functions yet, so faking the "+ interval '1' year" part

Review comment:
       There is `to_timestamp`: https://github.com/apache/arrow/blob/master/rust/datafusion/src/physical_plan/datetime_expressions.rs#L81 but we still need to add support for intervals.
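
For illustration, the `+ interval '1' year` step that the benchmark currently fakes can be sketched with the standard library alone. The plan output elsewhere in this thread shows the upper bound as the literal `1995-01-01`, so the sketch below derives that value from `1994-01-01`. `add_years` is a hypothetical helper, not a DataFusion function, and clamping Feb 29 to Feb 28 is an assumed convention rather than anything the thread specifies.

```rust
// Hypothetical helper: shift a YYYY-MM-DD literal by a whole number of years.
// Returns None for malformed input. Only coarse range checks are done on the
// month and day; this is not a full calendar validation.
fn add_years(date: &str, years: i32) -> Option<String> {
    let mut parts = date.split('-');
    let year: i32 = parts.next()?.parse().ok()?;
    let month: u32 = parts.next()?.parse().ok()?;
    let mut day: u32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    let target = year + years;
    let is_leap = (target % 4 == 0 && target % 100 != 0) || target % 400 == 0;
    // Assumed convention: Feb 29 maps to Feb 28 in a non-leap target year.
    if month == 2 && day == 29 && !is_leap {
        day = 28;
    }
    Some(format!("{:04}-{:02}-{:02}", target, month, day))
}

fn main() {
    let upper = add_years("1994-01-01", 1).expect("valid date literal");
    println!("l_receiptdate < {}", upper);
}
```

Precomputing the bound as a string keeps the comparison in the same `Utf8` form that the other date filters in the benchmark use.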







[GitHub] [arrow] andygrove closed pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

Posted by GitBox <gi...@apache.org>.
andygrove closed pull request #8734:
URL: https://github.com/apache/arrow/pull/8734


   





[GitHub] [arrow] andygrove commented on pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #8734:
URL: https://github.com/apache/arrow/pull/8734#issuecomment-731631082


   The query runs for a while and then fails with:
   
   ```
   Running benchmarks with the following options: BenchmarkOpt { query: 12, debug: false, iterations: 1, concurrency: 2, batch_size: 4096, path: "/mnt/tpch/tbl-sf1/", file_format: "tbl", mem_table: false }
   thread 'main' panicked at 'must be called from the context of Tokio runtime configured with either `basic_scheduler` or `threaded_scheduler`', datafusion/src/physical_plan/hash_aggregate.rs:368:9
   ```
   
   @jorgecarleitao @alamb I suggest that we merge this anyway since it does fix two join bugs and the benchmark is valid.





[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8734: ARROW-10680: [Rust] [DataFusion] Add partial support for TPC-H query 12

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8734:
URL: https://github.com/apache/arrow/pull/8734#discussion_r528245504



##########
File path: rust/benchmarks/src/bin/tpch.rs
##########
@@ -89,6 +89,8 @@ enum TpchOpt {
     Convert(ConvertOpt),
 }
 
+const TABLES: &[&'static str] = &["lineitem", "orders"];

Review comment:
       ```suggestion
   const TABLES: &[&str] = &["lineitem", "orders"];
   ```
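
As background for the suggestion above: in a `const` item, elided reference lifetimes default to `'static`, so the shorter spelling has the same type. A small sketch follows; `TABLES_EXPLICIT` and `takes_static` are hypothetical names used only to show the equivalence.

```rust
// Both constants have type &'static [&'static str]; in `const` and `static`
// items an elided reference lifetime is inferred as 'static.
const TABLES_EXPLICIT: &[&'static str] = &["lineitem", "orders"];
const TABLES: &[&str] = &["lineitem", "orders"];

// Hypothetical helper that only accepts 'static string slices, to show that
// the elided form still satisfies a 'static bound.
fn takes_static(names: &'static [&'static str]) -> usize {
    names.len()
}

fn main() {
    assert_eq!(TABLES, TABLES_EXPLICIT);
    println!("{} tables", takes_static(TABLES));
}
```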



