You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/10 18:29:15 UTC
[GitHub] [arrow-datafusion] andygrove opened a new issue, #2867: Error parsing valid SQL with aliases
andygrove opened a new issue, #2867:
URL: https://github.com/apache/arrow-datafusion/issues/2867
**Describe the bug**
I am working on a SQL query fuzzer (see [sqlfuzz](https://crates.io/crates/sqlfuzz) crate) and I think it just found a bug in DataFusion.
The following SQL works fine in Spark but fails in DataFusion with `ParserError("Expected ), found: __t404")`.
```sql
SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
FROM ((
(SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
FROM (test1)) __t398
INNER JOIN
(SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
FROM (test1)) __t403
ON __t398.__c395 = __t403.__c401) __t404)
WHERE __t404.__c395 != __t404.__c396
```
Spark:
```
scala> spark.sql("""SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
| FROM ((
| (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
| FROM (test1)) __t398
| INNER JOIN
| (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
| FROM (test1)) __t403
| ON __t398.__c395 = __t403.__c401) __t404)
| WHERE __t404.__c395 != __t404.__c396""").show
+------+------+------+------+
|__c394|__c395|__c396|__c397|
+------+------+------+------+
+------+------+------+------+
```
**To Reproduce**
Use sqlfuzz
**Expected behavior**
Query should work
**Additional context**
None
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] waitingkuo commented on issue #2867: Error parsing valid SQL with aliases
Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #2867:
URL: https://github.com/apache/arrow-datafusion/issues/2867#issuecomment-1204502430
@andygrove i've made the pull request to sqlparser-rs to make nestedjoin support alias
I will update this https://github.com/apache/arrow-datafusion/blob/master/datafusion/sql/src/planner.rs#L731 after new sqlparser released
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] waitingkuo commented on issue #2867: Error parsing valid SQL with aliases
Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #2867:
URL: https://github.com/apache/arrow-datafusion/issues/2867#issuecomment-1179784547
i tried it on sqlparser-rs, it doesn't work
``` rust
use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;
fn main() {
let dialect = GenericDialect {}; // or AnsiDialect
let sql = "SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
FROM ((
(SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
FROM (test1)) __t398
INNER JOIN
(SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
FROM (test1)) __t403
ON __t398.__c395 = __t403.__c401) __t404)
WHERE __t404.__c395 != __t404.__c396; ";
let ast = Parser::parse_sql(&dialect, sql).unwrap();
println!("AST: {:?}", ast);
}
```
``` bash
➜ sp git:(master) ✗ cargo run
Compiling sp v0.1.0 (/Users/willy/willy/df-workspace/sp)
Finished dev [unoptimized + debuginfo] target(s) in 0.30s
Running `target/debug/sp`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParserError("Expected ), found: __t404")', src/main.rs:26:48
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
looks like the outer alias doesn't work. the following sql works
``` SQL
❯ SELECT __c394, __c395, __c396, __c397
FROM ((
(SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
FROM (test1)) __t398
INNER JOIN
(SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
FROM (test1)) __t403
ON __t398.__c395 = __t403.__c401) )
WHERE __c395 != __c396;
0 rows in set. Query took 0.004 seconds.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] mkmik commented on issue #2867: Error parsing valid SQL with aliases
Posted by GitBox <gi...@apache.org>.
mkmik commented on issue #2867:
URL: https://github.com/apache/arrow-datafusion/issues/2867#issuecomment-1305401887
I stumbled upon a bug that may be a simpler reproducer of this
```console
❯ select a from (select a from (select 1 as a) as foo group by 1 order by a) as c;
SchemaError(FieldNotFound { field: Column { relation: Some("foo"), name: "a" }, valid_fields: Some([Column { relation: Some("c"), name: "a" }]) })
```
----
Interestingly if I remove as c in the outer expression it works:
```console
❯ select a from (select a from (select 1 as a) as foo group by 1 order by a);
+---+
| a |
+---+
| 1 |
+---+
1 row in set. Query took 0.005 seconds.
```
It's also interesting that using positional column references in the order by works:
```console
❯ select a from (select a from (select 1 as a) as foo group by 1 order by 1) as c;
+---+
| a |
+---+
| 1 |
+---+
1 row in set. Query took 0.005 seconds.
```
Furthermore, the bug reproduces only if `group by a` and `order by a` are *both* present:
```console
❯ select a from (select a from (select 1 as a) as foo group by a) as c;
+---+
| a |
+---+
| 1 |
+---+
1 row in set. Query took 0.005 seconds.
❯ select a from (select a from (select 1 as a) as foo order by a) as c;
+---+
| a |
+---+
| 1 |
+---+
1 row in set. Query took 0.004 seconds.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] waitingkuo commented on issue #2867: Error parsing valid SQL with aliases
Posted by GitBox <gi...@apache.org>.
waitingkuo commented on issue #2867:
URL: https://github.com/apache/arrow-datafusion/issues/2867#issuecomment-1207912280
@andygrove the parser error fixed in #3072
now it raised another error
```bash
DataFusion CLI v10.0.0
0 rows in set. Query took 0.038 seconds.
SchemaError(FieldNotFound { qualifier: Some("__t404"), name: "__c395", valid_fields: Some(["__t398.__c394", "__t398.__c395", "__t398.__c396", "__t398.__c397", "__t403.__c399", "__t403.__c400", "__t403.__c401", "__t403.__c402"]) })
```
I feel like this might have the similar issue as #3073
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org