You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/10 18:29:15 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue, #2867: Error parsing valid SQL with aliases

andygrove opened a new issue, #2867:
URL: https://github.com/apache/arrow-datafusion/issues/2867

   **Describe the bug**
   I am working on a SQL query fuzzer (see [sqlfuzz](https://crates.io/crates/sqlfuzz) crate) and I think it just found a bug in DataFusion. 
   
   The following SQL works fine in Spark but fails in DataFusion with `ParserError("Expected ), found: __t404")`.
   
   ```sql
   SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
   FROM ((
       (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
           FROM (test1)) __t398
       INNER JOIN
       (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
           FROM (test1)) __t403
       ON __t398.__c395 = __t403.__c401) __t404)
   WHERE __t404.__c395 != __t404.__c396
   ```
   
   Spark:
   
   ```
   scala> spark.sql("""SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
        | FROM ((
        |     (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
        |         FROM (test1)) __t398
        |     INNER JOIN
        |     (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
        |         FROM (test1)) __t403
        |     ON __t398.__c395 = __t403.__c401) __t404)
        | WHERE __t404.__c395 != __t404.__c396""").show
   +------+------+------+------+
   |__c394|__c395|__c396|__c397|
   +------+------+------+------+
   +------+------+------+------+
   ```
   
   **To Reproduce**
   Use sqlfuzz
   
   **Expected behavior**
   Query should work
   
   **Additional context**
   None
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #2867: Error parsing valid SQL with aliases

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #2867:
URL: https://github.com/apache/arrow-datafusion/issues/2867#issuecomment-1204502430

   @andygrove i've made the pull request to sqlparser-rs to make nestedjoin support alias
   
   I will update this https://github.com/apache/arrow-datafusion/blob/master/datafusion/sql/src/planner.rs#L731 after new sqlparser released
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #2867: Error parsing valid SQL with aliases

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #2867:
URL: https://github.com/apache/arrow-datafusion/issues/2867#issuecomment-1179784547

   i tried it on sqlparser-rs, it doesn't work
   
   ``` rust
   use sqlparser::dialect::GenericDialect;
   use sqlparser::parser::Parser;
   
   fn main() {
   
       let dialect = GenericDialect {}; // or AnsiDialect
   
       let sql = "SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
                   FROM ((
                       (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
                           FROM (test1)) __t398
                       INNER JOIN
                       (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
                           FROM (test1)) __t403
                       ON __t398.__c395 = __t403.__c401) __t404)
                   WHERE __t404.__c395 != __t404.__c396; ";
   
   
   
       let ast = Parser::parse_sql(&dialect, sql).unwrap();
   
       println!("AST: {:?}", ast);
   } 
   ```
   
   ``` bash
   ➜  sp git:(master) ✗ cargo run
      Compiling sp v0.1.0 (/Users/willy/willy/df-workspace/sp)
       Finished dev [unoptimized + debuginfo] target(s) in 0.30s
        Running `target/debug/sp`
   thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParserError("Expected ), found: __t404")', src/main.rs:26:48
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ```
   
    
   
   looks like the outer alias doesn't work. the following sql works
   ``` SQL
   ❯ SELECT __c394, __c395, __c396, __c397 
   FROM ((
       (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
           FROM (test1)) __t398
       INNER JOIN
       (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
           FROM (test1)) __t403
       ON __t398.__c395 = __t403.__c401) )
   WHERE __c395 != __c396;
   0 rows in set. Query took 0.004 seconds.
   ```
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mkmik commented on issue #2867: Error parsing valid SQL with aliases

Posted by GitBox <gi...@apache.org>.
mkmik commented on issue #2867:
URL: https://github.com/apache/arrow-datafusion/issues/2867#issuecomment-1305401887

   I stumbled upon a bug that may be a simpler reproducer of this
   
   ```console
   ❯ select a from (select a from (select 1 as a) as foo group by 1 order by a) as c;
   SchemaError(FieldNotFound { field: Column { relation: Some("foo"), name: "a" }, valid_fields: Some([Column { relation: Some("c"), name: "a" }]) })
   ```
   
   ----
   
   Interestingly if I remove as c in the outer expression it works:
   
   ```console
   ❯ select a from (select a from (select 1 as a) as foo group by 1 order by a);
   +---+
   | a |
   +---+
   | 1 |
   +---+
   1 row in set. Query took 0.005 seconds.
   ```
   
   It's also interesting that using positional column references in the order by works:
   
   ```console
   ❯ select a from (select a from (select 1 as a) as foo group by 1 order by 1) as c;
   +---+
   | a |
   +---+
   | 1 |
   +---+
   1 row in set. Query took 0.005 seconds.
   ```
   
   Furthermore, the bug reproduces only if `group by a` and `order by a` are *both* present:
   
   ```console
   ❯ select a from (select a from (select 1 as a) as foo group by a) as c;
   +---+
   | a |
   +---+
   | 1 |
   +---+
   1 row in set. Query took 0.005 seconds.
   ❯ select a from (select a from (select 1 as a) as foo order by a) as c;
   +---+
   | a |
   +---+
   | 1 |
   +---+
   1 row in set. Query took 0.004 seconds.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] waitingkuo commented on issue #2867: Error parsing valid SQL with aliases

Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #2867:
URL: https://github.com/apache/arrow-datafusion/issues/2867#issuecomment-1207912280

   @andygrove the parser error fixed in #3072 
   
   now it raised another error
   ```bash
   DataFusion CLI v10.0.0
   0 rows in set. Query took 0.038 seconds.
   SchemaError(FieldNotFound { qualifier: Some("__t404"), name: "__c395", valid_fields: Some(["__t398.__c394", "__t398.__c395", "__t398.__c396", "__t398.__c397", "__t403.__c399", "__t403.__c400", "__t403.__c401", "__t403.__c402"]) })
   ```
   
   I feel like this might have the similar issue as #3073 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org