Posted to github@arrow.apache.org by "hu6360567 (via GitHub)" <gi...@apache.org> on 2023/05/29 05:06:51 UTC
[GitHub] [arrow-datafusion-python] hu6360567 opened a new issue, #399: fail to sql query if column contains capitalized letter
hu6360567 opened a new issue, #399:
URL: https://github.com/apache/arrow-datafusion-python/issues/399
**Describe the bug**
Executing a SQL query fails when the selected column's name contains a capital letter.
**To Reproduce**
```python
import pyarrow as pa
import datafusion

if __name__ == "__main__":
    tbl = pa.Table.from_arrays([[1, 2, 3]], names=["id"])
    for column_name in ["id", "ID", "iD"]:
        # Use a fresh context per column name so each registration is independent.
        ctx = datafusion.SessionContext()
        ctx.register_record_batches(name="tbl", partitions=[tbl.rename_columns([column_name]).to_batches()])
        sql = f"SELECT {column_name} from tbl"
        try:
            print(sql)
            ctx.sql(sql)
        except Exception as e:
            print(e)
```
```
SELECT id from tbl
SELECT ID from tbl
DataFusion error: SchemaError(FieldNotFound { field: Column { relation: None, name: "id" }, valid_fields: [Column { relation: Some(Bare { table: "tbl" }), name: "ID" }] })
SELECT iD from tbl
DataFusion error: SchemaError(FieldNotFound { field: Column { relation: None, name: "id" }, valid_fields: [Column { relation: Some(Bare { table: "tbl" }), name: "iD" }] })
```
**Expected behavior**
All SQL queries should execute successfully.
**Additional context**
Python 3.8.16
pyarrow 12.0.0
datafusion 23.0.0
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion-python] hu6360567 commented on issue #399: fail to sql query if column contains capitalized letter
Posted by "hu6360567 (via GitHub)" <gi...@apache.org>.
hu6360567 commented on issue #399:
URL: https://github.com/apache/arrow-datafusion-python/issues/399#issuecomment-1566755257
It seems the column name needs to be quoted: `SELECT "ID" FROM tbl` works as expected.
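
For anyone building the query string from a variable, as the reproduction script does, here is a minimal sketch of the quoting workaround. The helpers `quote_ident` and `select_column_sql` are hypothetical names introduced for illustration and are not part of datafusion; the sketch assumes the standard SQL convention that an embedded double quote is escaped by doubling it.

```python
def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes so its case is preserved.

    Hypothetical helper: embedded double quotes are doubled, following
    the standard SQL convention for delimited identifiers.
    """
    return '"' + name.replace('"', '""') + '"'


def select_column_sql(column_name: str, table_name: str = "tbl") -> str:
    """Build the SELECT statement from the reproduction with quoted names."""
    return f"SELECT {quote_ident(column_name)} FROM {quote_ident(table_name)}"


if __name__ == "__main__":
    for column_name in ["id", "ID", "iD"]:
        # Each statement now names the column with its exact capitalization.
        print(select_column_sql(column_name))
```

In the reproduction script, the resulting string would be passed to `ctx.sql(...)` in place of the unquoted f-string.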
[GitHub] [arrow-datafusion-python] mesejo commented on issue #399: fail to sql query if column contains capitalized letter
Posted by "mesejo (via GitHub)" <gi...@apache.org>.
mesejo commented on issue #399:
URL: https://github.com/apache/arrow-datafusion-python/issues/399#issuecomment-1665288691
If anyone else runs into a similar issue, the section [Identifiers and Capitalization](https://arrow.apache.org/datafusion/user-guide/example-usage.html#identifiers-and-capitalization) of DataFusion's documentation explains this behavior. Reproducing it here for completeness:
> Please be aware that all identifiers are effectively made lower-case in SQL, so if your csv file has capital letters (ex: Name) you must put your column name in double quotes or the examples won’t work.
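
The rule quoted above can be illustrated with a small stand-alone sketch. This is only a model of the documented behavior, not DataFusion's actual implementation; `normalize_identifier` and `resolves` are hypothetical names introduced for illustration.

```python
def normalize_identifier(token: str) -> str:
    """Model of the documented rule (not DataFusion's real code).

    Unquoted identifiers are lower-cased; double-quoted identifiers keep
    their exact spelling, with doubled quotes collapsed to a single one.
    """
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token[1:-1].replace('""', '"')
    return token.lower()


def resolves(token: str, schema_columns: list) -> bool:
    """Check whether an identifier, as written in SQL, matches a column."""
    return normalize_identifier(token) in schema_columns


if __name__ == "__main__":
    # The table from the report has a single column named "ID".
    schema = ["ID"]
    for token in ["ID", '"ID"']:
        print(token, "->", normalize_identifier(token), resolves(token, schema))
```

Under this model, the unquoted `ID` is looked up as `id`, which matches the `FieldNotFound` error in the report (field `id` requested, valid field `ID`), while the quoted `"ID"` matches the column exactly.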