You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/17 16:48:17 UTC

[GitHub] [arrow-datafusion] xudong963 opened a new pull request #1601: fix: casting Int64 to Float64 unsuccessfully caused tpch8 to fail

xudong963 opened a new pull request #1601:
URL: https://github.com/apache/arrow-datafusion/pull/1601


   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #1576 
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #1601: fix: casting Int64 to Float64 unsuccessfully caused tpch8 to fail

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1601:
URL: https://github.com/apache/arrow-datafusion/pull/1601#issuecomment-1015611128


   Filed https://github.com/apache/arrow-datafusion/issues/1609 to track


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #1601: fix: casting Int64 to Float64 unsuccessfully caused tpch8 to fail

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1601:
URL: https://github.com/apache/arrow-datafusion/pull/1601


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #1601: fix: casting Int64 to Float64 unsuccessfully caused tpch8 to fail

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #1601:
URL: https://github.com/apache/arrow-datafusion/pull/1601#issuecomment-1015046138


   One problem with this modification is that `float64` to `int` causes a loss of accuracy.  I think `return_type` decided by `then` data type is unreasonable. 
   
   I checked Postgres' behavior, it can process reasonably.
   ```sql
   postgres=# SELECT * FROM test;
    a
   ---
    1
    2
    3
   (3 rows)
   
   
   postgres=# SELECT a,
          CASE WHEN a=1 THEN 1.2
               ELSE 0
          END
       FROM test;
    a | case
   ---+------
    1 |  1.2
    2 |    0
    3 |    0
   (3 rows)
   
   
   postgres=# SELECT a,
          CASE WHEN a=1 THEN 12
               ELSE 5.4
          END
       FROM test;
    a | case
   ---+------
    1 |   12
    2 |  5.4
    3 |  5.4
   (3 rows)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 edited a comment on pull request #1601: fix: casting Int64 to Float64 unsuccessfully caused tpch8 to fail

Posted by GitBox <gi...@apache.org>.
xudong963 edited a comment on pull request #1601:
URL: https://github.com/apache/arrow-datafusion/pull/1601#issuecomment-1014729947


   ### TPCH 8
   ```
   ➜  benchmarks git:(fix_cast) ✗ cargo run --release --bin tpch -- benchmark datafusion --iterations 3 --path ./data --format tbl --query 8 --batch-size 4096
    
   Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 8, debug: false, iterations: 3, partitions: 2, batch_size: 4096, path: "./data", file_format: "tbl", mem_table: false }
   Query 8 iteration 0 took 6401.8 ms
   Query 8 iteration 1 took 5731.4 ms
   Query 8 iteration 2 took 5830.1 ms
   Query 8 avg time: 5987.78 ms
   ```
   
   ### TPCH 14
   ```
   ➜  benchmarks git:(fix_cast) cargo run --release --bin tpch -- benchmark datafusion --iterations 3 --path ./data --format tbl --query 14 --batch-size 4096
    
   Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 14, debug: false, iterations: 3, partitions: 2, batch_size: 4096, path: "./data", file_format: "tbl", mem_table: false }
   Query 14 iteration 0 took 4292.5 ms
   Query 14 iteration 1 took 4464.5 ms
   Query 14 iteration 2 took 2763.9 ms
   Query 14 avg time: 3840.29 ms
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #1601: fix: casting Int64 to Float64 unsuccessfully caused tpch8 to fail

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #1601:
URL: https://github.com/apache/arrow-datafusion/pull/1601#issuecomment-1014729947


   ```
   ➜  benchmarks git:(fix_cast) ✗ cargo run --release --bin tpch -- benchmark datafusion --iterations 3 --path ./data --format tbl --query 8 --batch-size 4096
    
   Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 8, debug: false, iterations: 3, partitions: 2, batch_size: 4096, path: "./data", file_format: "tbl", mem_table: false }
   Query 8 iteration 0 took 6401.8 ms
   Query 8 iteration 1 took 5731.4 ms
   Query 8 iteration 2 took 5830.1 ms
   Query 8 avg time: 5987.78 ms
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on pull request #1601: fix: casting Int64 to Float64 unsuccessfully caused tpch8 to fail

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #1601:
URL: https://github.com/apache/arrow-datafusion/pull/1601#issuecomment-1015117652


   >  I think return_type decided by then data type is unreasonable.
   
   I agree, better to take both branches into account.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #1601: fix: casting Int64 to Float64 unsuccessfully caused tpch8 to fail

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #1601:
URL: https://github.com/apache/arrow-datafusion/pull/1601#issuecomment-1015561419


   > > I think return_type decided by then data type is unreasonable.
   > 
   > I agree, better to take both branches into account.
   
   Next PR I'll do the thing and keep consistent with Postgres. cc @alamb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1601: fix: casting Int64 to Float64 unsuccessfully caused tpch8 to fail

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1601:
URL: https://github.com/apache/arrow-datafusion/pull/1601#discussion_r786905414



##########
File path: datafusion/src/physical_plan/expressions/case.rs
##########
@@ -324,7 +325,10 @@ impl CaseExpr {
 
         // start with the else condition, or nulls
         let mut current_value: Option<ArrayRef> = if let Some(e) = &self.else_expr {
-            Some(e.evaluate(batch)?.into_array(batch.num_rows()))
+            // keep `else_expr`'s data type and return type consistent

Review comment:
       Eventually it would be great to separate out the `coercion` logic into the `
   
   ```rust
   
   /// Create a CASE expression
   pub fn case(
       expr: Option<Arc<dyn PhysicalExpr>>,
       when_thens: &[WhenThen],
       else_expr: Option<Arc<dyn PhysicalExpr>>,
   ) -> Result<Arc<dyn PhysicalExpr>> {
       Ok(Arc::new(CaseExpr::try_new(expr, when_thens, else_expr)?))
   }
   
   ```
   
   function the way it is done with `BinaryExpr` and `binary_cast`
   
   https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/expressions/binary.rs#L1106-L1114
   
   But this is better than master so 👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org