You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "izveigor (via GitHub)" <gi...@apache.org> on 2023/06/12 12:36:39 UTC

[GitHub] [arrow-datafusion] izveigor opened a new issue, #6649: Parser not working properly

izveigor opened a new issue, #6649:
URL: https://github.com/apache/arrow-datafusion/issues/6649

   ### Describe the bug
   
   There are several cases, when parser does not work correctly. I can draw this conclusion from the comparison PostgreSQL and Arrow Datafusion queries.
   Main causes of errors:
   1) The precedence of bitwise shifts does not equal to other bitwise operators.
   2) Unary operation `-` with the operations `*`, `/` and `%`  negates the result of the next statements.
   3) The same problem like `(2)` we have with `Intervals`, when we want to add or subtract negative interval with other, we negate the whole result except of one interval.
   
   ### To Reproduce
   
   | SQL expression | Arrow Datafusion | PostgreSQL |
   | ----------------- | ------------------- | -------------|
   | `select 15 \| 12331233 >> 1234123 & 12;` | 4 | 48175 |
   | `select - 1234 % 12 % 3 * 1545 % 23;` | -4 | -17 |
   | `select - 1234 % 12 % 3 * 1545 % 23;` | -4 | -22 |
   | `select - 13 * 1324 * -234 / 24 * 23;` | 3562884 | 3859791|
   | `select -12 * 35 % -16 * 32;` | -420 | -128|
   | `select - interval '16 hours' + interval '2 hours';` | 18:00:00 | -14:00:00 |
   
   ```
   DataFusion CLI v25.0.0
   ❯ select 15 | 12331233 >> 1234123 & 12;
   +-----------------------------------------------------------+
   | Int64(15) | Int64(12331233) >> Int64(1234123) & Int64(12) |
   +-----------------------------------------------------------+
   | 48175                                                     |
   +-----------------------------------------------------------+
   1 row in set. Query took 0.001 seconds.
   ❯ select - 1234 % 12 % 3 * 1545 % 23;
   +------------------------------------------------------------------+
   | (- Int64(1234) % Int64(12) % Int64(3) * Int64(1545) % Int64(23)) |
   +------------------------------------------------------------------+
   | -4                                                               |
   +------------------------------------------------------------------+
   1 row in set. Query took 0.001 seconds.
   ❯ select - 1234 % 12 % 3 * 1545 % 23;
   +------------------------------------------------------------------+
   | (- Int64(1234) % Int64(12) % Int64(3) * Int64(1545) % Int64(23)) |
   +------------------------------------------------------------------+
   | -4                                                               |
   +------------------------------------------------------------------+
   1 row in set. Query took 0.001 seconds.
   ❯ select - 13 * 1324 * -234 / 24 * 23;
   +----------------------------------------------------------------------+
   | (- Int64(13) * Int64(1324) * (- Int64(234) / Int64(24) * Int64(23))) |
   +----------------------------------------------------------------------+
   | 3562884                                                              |
   +----------------------------------------------------------------------+
   1 row in set. Query took 0.001 seconds.
   ❯ select -12 * 35 % -16 * 32;
   +-------------------------------------------------------+
   | (- Int64(12) * Int64(35) % (- Int64(16) * Int64(32))) |
   +-------------------------------------------------------+
   | -420                                                  |
   +-------------------------------------------------------+
   1 row in set. Query took 0.001 seconds.
   ❯ select - interval '16 hours' + interval '2 hours';
   +------------------------------------------------------------------------------------+
   | (- IntervalMonthDayNano("57600000000000") + IntervalMonthDayNano("7200000000000")) |
   +------------------------------------------------------------------------------------+
   | 0 years 0 mons 0 days -18 hours 0 mins 0.000000000 secs                            |
   +------------------------------------------------------------------------------------+
   1 row in set. Query took 0.000 seconds.
   ❯ 
   ```
   **BUT:**
   ```
   ❯ select -(interval '2 hours' + interval '3 hours') + (interval '5 hours' - interval '2 hours');
   +---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | (- IntervalMonthDayNano("7200000000000") + IntervalMonthDayNano("10800000000000")) + IntervalMonthDayNano("18000000000000") - IntervalMonthDayNano("7200000000000") |
   +---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | 0 years 0 mons 0 days -2 hours 0 mins 0.000000000 secs                                                                                                              |
   +---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   1 row in set. Query took 0.001 seconds.
   ```
   
   ```
   postgres=# select -(interval '2 hours' + interval '3 hours') + (interval '5 hours' - interval '2 hours');
    ?column?  
   -----------
    -02:00:00
   (1 row)
   ```
   
   ```
   ❯ select -2 + -2 + 5 + -5;
   +----------------------------------------------+
   | Int64(-2) + Int64(-2) + Int64(5) + Int64(-5) |
   +----------------------------------------------+
   | -4                                           |
   +----------------------------------------------+
   1 row in set. Query took 0.001 seconds.
   ```
   
   ```
   postgres=# select -2 + -2 + 5 + -5;
    ?column? 
   ----------
          -4
   (1 row)
   ```
   
   ```
   ❯ select -2 + -2 - 5 + -5;
   +----------------------------------------------+
   | Int64(-2) + Int64(-2) - Int64(5) + Int64(-5) |
   +----------------------------------------------+
   | -14                                          |
   +----------------------------------------------+
   1 row in set. Query took 0.000 seconds.
   ```
   
   ```
   postgres=# select -2 + -2 - 5 + -5;
    ?column? 
   ----------
         -14
   (1 row)
   ```
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor closed issue #6649: Parser not working properly

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor closed issue #6649: Parser not working properly
URL: https://github.com/apache/arrow-datafusion/issues/6649


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6649: Parser not working properly

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6649:
URL: https://github.com/apache/arrow-datafusion/issues/6649#issuecomment-1602823163

   I think the fix should be available sometime tomorrow https://github.com/sqlparser-rs/sqlparser-rs/issues/887


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] thomas-k-cameron commented on issue #6649: Parser not working properly

Posted by "thomas-k-cameron (via GitHub)" <gi...@apache.org>.
thomas-k-cameron commented on issue #6649:
URL: https://github.com/apache/arrow-datafusion/issues/6649#issuecomment-1587610290

   I think I identified the issue.
   On `physical-expr`, you can find this.
   ```rust
           Expr::BinaryExpr(BinaryExpr { left, op, right }) => {
               // Create physical expressions for left and right operands
               let lhs = create_physical_expr(
                   left,
                   input_dfschema,
                   input_schema,
                   execution_props,
               )?;
               let rhs = create_physical_expr(
                   right,
                   input_dfschema,
                   input_schema,
                   execution_props,
               )?;
   ```
   
   It evaluates right/left independently and doesn't mix it up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] thomas-k-cameron commented on issue #6649: Parser not working properly

Posted by "thomas-k-cameron (via GitHub)" <gi...@apache.org>.
thomas-k-cameron commented on issue #6649:
URL: https://github.com/apache/arrow-datafusion/issues/6649#issuecomment-1589674848

   @alamb @andygrove 
   
   I'm assuming that I can fix this issue by adding a logic to re-order things in this function `fn sql_expr_to_logical_expr` at `datafusion/sql/src/expr/mod.rs`.
   
   But I'm not quite sure if there is a better way to do this. Would you kindly give me some advice?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on issue #6649: Parser not working properly

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on issue #6649:
URL: https://github.com/apache/arrow-datafusion/issues/6649#issuecomment-1594583119

   As the creator of this ticket, due to the lack of activity, I will take on some aspects of solving the problem. First, It's true that bitwise binary operations have the wrong precedence, **BUT** only in terms of difference from some databases (include PostgreSQL).
   
   For example:
   ```
   select 15 | 12331233 & 1234123 | 12 & 15 | 34;
   PostgreSQL, SQLite: 47
   Python, MySQL, Datafusion: 1048815
   ```
   
   So, if it's so important to follow the rules of precedence PostgreSQL, then it's better to fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] thomas-k-cameron commented on issue #6649: Parser not working properly

Posted by "thomas-k-cameron (via GitHub)" <gi...@apache.org>.
thomas-k-cameron commented on issue #6649:
URL: https://github.com/apache/arrow-datafusion/issues/6649#issuecomment-1587727901

   Seems like sql parse doesn't re-organize when it gets `div`/`mul`.
   I think this is the cause of the issue.
   
   `datafusion/sql/src/expr/mod.rs`
   
   ```rust
   impl<'a, S: ContextProvider> SqlToRel<'a, S> {
       pub(crate) fn sql_expr_to_logical_expr(
           &self,
           sql: SQLExpr,
           schema: &DFSchema,
           planner_context: &mut PlannerContext,
       ) -> Result<Expr> {
           enum StackEntry {
               SQLExpr(Box<SQLExpr>),
               Operator(Operator),
           }
   
           // Virtual stack machine to convert SQLExpr to Expr
           // This allows visiting the expr tree in a depth-first manner which
           // produces expressions in postfix notations, i.e. `a + b` => `a b +`.
           // See https://github.com/apache/arrow-datafusion/issues/1444
           let mut stack = vec![StackEntry::SQLExpr(Box::new(sql))];
           let mut eval_stack = vec![];
   
           while let Some(entry) = stack.pop() {
               match entry {
                   StackEntry::SQLExpr(sql_expr) => {
                       match *sql_expr {
                           SQLExpr::BinaryOp { left, op, right } => {
                               // Note the order that we push the entries to the stack
                               // is important. We want to visit the left node first.
                               // todo!
                               let op = self.parse_sql_binary_op(op)?;
                               stack.push(StackEntry::Operator(op));
                               stack.push(StackEntry::SQLExpr(right));
                               stack.push(StackEntry::SQLExpr(left));
                           }
                           _ => {
                               let expr = self.sql_expr_to_logical_expr_internal(
                                   *sql_expr,
                                   schema,
                                   planner_context,
                               )?;
                               eval_stack.push(expr);
                           }
                       }
                   }
                   StackEntry::Operator(op) => {
                       let right = eval_stack.pop().unwrap();
                       let left = eval_stack.pop().unwrap();
                       let expr = Expr::BinaryExpr(BinaryExpr::new(
                           Box::new(left),
                           op,
                           Box::new(right),
                       ));
                       eval_stack.push(expr);
                   }
               }
           }
   
           assert_eq!(1, eval_stack.len());
           let expr = eval_stack.pop().unwrap();
           Ok(expr)
       }
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] thomas-k-cameron commented on issue #6649: Parser not working properly

Posted by "thomas-k-cameron (via GitHub)" <gi...@apache.org>.
thomas-k-cameron commented on issue #6649:
URL: https://github.com/apache/arrow-datafusion/issues/6649#issuecomment-1587540701

   It seems like it is interpreting as this,
   ```sql
   -12 * 35) % (-16 * 32);
   ```
   instead of this.
   ```sql
   (-12 * 35 % -16) * 32;
   ```
   
   I was able to reproduce with v26.0.0
   
   ```sql
   ❯ select (-12 * 35) % (-16 * 32);
   +-------------------------------------------------------+
   | (- Int64(12) * Int64(35)) % (- Int64(16) * Int64(32)) |
   +-------------------------------------------------------+
   | -420                                                  |
   +-------------------------------------------------------+
   1 row in set. Query took 0.000 seconds.
   ❯ select (-12 * 35 % -16) * 32;
   +----------------------------------------------------+
   | (- Int64(12) * Int64(35) % Int64(-16)) * Int64(32) |
   +----------------------------------------------------+
   | -128                                               |
   +----------------------------------------------------+
   1 row in set. Query took 0.001 seconds.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] thomas-k-cameron commented on issue #6649: Parser not working properly

Posted by "thomas-k-cameron (via GitHub)" <gi...@apache.org>.
thomas-k-cameron commented on issue #6649:
URL: https://github.com/apache/arrow-datafusion/issues/6649#issuecomment-1587636926

   This issue comes from the fact that right/left of the `BinaryExpr` is not merged properly.
   
   I can come up with few ways to fix this.
   
   1. create dedicated `Expr` variant for handling 
   
   This one introduce some level of complexity, and it should be time consuming. There is a chance to introduce breaking changes
   
   2. Check `BinaryExpr` when op is one of Add, Minus, Mul, Div
   
   Should be easier than the above, but not quite sure how complicated it is going to be


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6649: Parser not working properly

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6649:
URL: https://github.com/apache/arrow-datafusion/issues/6649#issuecomment-1589972379

   Thanks for the report @thomas-k-cameron  
   
   I think the issue you describe is likely due to different operator parsing precedence in sqlparser compared to posgresql
   
   I suspect the fix that will be the most effective and least brittle is to update the precedence rules in the sqlparser library, perhaps in here: 
   
   https://github.com/sqlparser-rs/sqlparser-rs/blob/2b37e4ae6ecbf2f22593b593ad1e4638ca76a8b2/src/parser.rs#L1975-L2056
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6649: Parser not working properly

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6649:
URL: https://github.com/apache/arrow-datafusion/issues/6649#issuecomment-1604575441

   Fixed in https://github.com/apache/arrow-datafusion/pull/6753 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] izveigor commented on issue #6649: Parser not working properly

Posted by "izveigor (via GitHub)" <gi...@apache.org>.
izveigor commented on issue #6649:
URL: https://github.com/apache/arrow-datafusion/issues/6649#issuecomment-1602742473

   @alamb you are right about the precedence.
   The main problem is in the line: https://github.com/sqlparser-rs/sqlparser-rs/blob/main/src/parser.rs#L797.
   I created PR which fixes this problem: https://github.com/sqlparser-rs/sqlparser-rs/pull/902.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org