Posted to github@arrow.apache.org by "hu6360567 (via GitHub)" <gi...@apache.org> on 2023/05/29 05:06:51 UTC

[GitHub] [arrow-datafusion-python] hu6360567 opened a new issue, #399: fail to sql query if column contains capitalized letter

hu6360567 opened a new issue, #399:
URL: https://github.com/apache/arrow-datafusion-python/issues/399

   **Describe the bug**
   A SQL query fails when it selects a column whose name contains a capital letter.
   
   **To Reproduce**
   ```python
   import pyarrow as pa
   import datafusion
   
   if __name__ == "__main__":
       tbl = pa.Table.from_arrays([[1, 2, 3]], names=["id"])
   
       for column_name in ["id", "ID", "iD"]:
           ctx = datafusion.SessionContext()
           ctx.register_record_batches(name="tbl", partitions=[tbl.rename_columns([column_name]).to_batches()])
   
           sql = f"SELECT {column_name} from tbl"
   
   
           try:
               print(sql)
               ctx.sql(sql)
           except Exception as e:
               print(e)
   ```
   ```
   SELECT id from tbl
   SELECT ID from tbl
   DataFusion error: SchemaError(FieldNotFound { field: Column { relation: None, name: "id" }, valid_fields: [Column { relation: Some(Bare { table: "tbl" }), name: "ID" }] })
   SELECT iD from tbl
   DataFusion error: SchemaError(FieldNotFound { field: Column { relation: None, name: "id" }, valid_fields: [Column { relation: Some(Bare { table: "tbl" }), name: "iD" }] })
   ```
   
   **Expected behavior**
   All three SQL statements should execute successfully.
   
   **Additional context**
   Python 3.8.16
   pyarrow 12.0.0
   datafusion 23.0.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion-python] hu6360567 commented on issue #399: fail to sql query if column contains capitalized letter

Posted by "hu6360567 (via GitHub)" <gi...@apache.org>.
hu6360567 commented on issue #399:
URL: https://github.com/apache/arrow-datafusion-python/issues/399#issuecomment-1566755257

   It seems the column name needs to be double-quoted: `SELECT "ID" FROM tbl` works as expected.
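
The workaround above can be applied when the SQL string is built programmatically. The following is a minimal sketch, not part of the datafusion API: `quote_identifier` and `build_select` are hypothetical helper names, and doubling embedded double quotes follows the usual SQL convention, which is assumed here rather than verified against DataFusion.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in double quotes so its case is preserved.

    Hypothetical helper; embedded double quotes are doubled, following
    the usual SQL convention (an assumption, not checked against DataFusion).
    """
    return '"' + name.replace('"', '""') + '"'


def build_select(column_name: str, table_name: str = "tbl") -> str:
    # Only the column is quoted; the table name "tbl" from the report is
    # already lower-case, so it is left unquoted here.
    return f"SELECT {quote_identifier(column_name)} FROM {table_name}"


# Same column names as in the reproduction script.
for column_name in ["id", "ID", "iD"]:
    print(build_select(column_name))
```

The resulting strings could then be passed to `ctx.sql(...)` in place of the unquoted f-string from the reproduction script.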



[GitHub] [arrow-datafusion-python] mesejo commented on issue #399: fail to sql query if column contains capitalized letter

Posted by "mesejo (via GitHub)" <gi...@apache.org>.
mesejo commented on issue #399:
URL: https://github.com/apache/arrow-datafusion-python/issues/399#issuecomment-1665288691

   If anyone else runs into this, the behavior is explained in the [Identifiers and Capitalization](https://arrow.apache.org/datafusion/user-guide/example-usage.html#identifiers-and-capitalization) section of DataFusion's documentation. Quoting it here for completeness:
   
   > Please be aware that all identifiers are effectively made lower-case in SQL, so if your csv file has capital letters (ex: Name) you must put your column name in double quotes or the examples won’t work.
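
The rule quoted above, together with the error messages in the report (a lookup for `"id"` against a schema containing `"ID"`), can be illustrated with a toy model. This is only a sketch of the described behavior, not DataFusion's actual implementation; `normalize_identifier` and `resolves` are hypothetical names.

```python
def normalize_identifier(token: str) -> str:
    """Toy model: unquoted identifiers are lower-cased, quoted ones kept as-is."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]  # strip the quotes, preserve the case
    return token.lower()


def resolves(token: str, schema_columns: list) -> bool:
    # The lookup against the schema is modelled as an exact,
    # case-sensitive match on the normalized name.
    return normalize_identifier(token) in schema_columns


schema = ["ID"]  # the column as registered in the reproduction script
print(resolves("ID", schema))    # unquoted: looks up "id", prints False
print(resolves('"ID"', schema))  # quoted: looks up "ID", prints True
```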
   

