You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2021/02/08 18:13:00 UTC

[GitHub] [superset] kekwan opened a new issue #13009: Incorrect SELECT * statement previewing tables with timestamp partitions

kekwan opened a new issue #13009:
URL: https://github.com/apache/superset/issues/13009


   When previewing a table in SQL Lab, a SELECT * statement is generated based on table data and table metadata if it contains partitions. When generating the SELECT * statement on a table with partitions, the table data, 'TIMESTAMP' in my case is not casted in the WHERE block. 
   
   ### Expected results
   
   Previewing tables with data partitions should work. The WHERE block of the generated SELECT * statement should talk into account casting of data types, particularly TIMESTAMP.
   
   ### Actual results
   
   `select_star` method calls `where_latest_partition` to generate WHERE block when partition exists. `where_latest_partition` is failing to CAST data types needed for WHERE block causing syntax error when it runs the select * statement.
   
   SQL Statement generated by `get_table_metadata` calling `select_star`. Columns `datetimepartition` and `etl_ts` should be casted to TIMESTAMP, not VARCHAR.
   
   ```
   SELECT "event_type" AS "event_type",
          "trace_id" AS "trace_id",
          "tenant" AS "tenant",
          "core_tenant" AS "core_tenant",
          "event_timestamp" AS "event_timestamp",
          "table_id" AS "table_id",
          "dtr_workspace_id" AS "dtr_workspace_id",
          "connectors" AS "connectors",
          "table_type" AS "table_type",
          "row_count" AS "row_count",
          "field_count" AS "field_count",
          "partition_count" AS "partition_count",
          "data_size_bytes" AS "data_size_bytes",
          "datetimepartition" AS "datetimepartition",
          "etl_ts" AS "etl_ts",
          "insert_try_number" AS "insert_try_number"
   FROM "events"."data_tables"
   WHERE "datetimepartition" = '2021-02-08 16:00:00.000'
     AND "etl_ts" = '2021-02-08 16:30:00.000'
     AND "insert_try_number" = 1
   LIMIT 100
   ```
   Syntax error from Presto due to the incorrect SQL statement
   `Presto error: Cannot apply operator: timestamp(3) = varchar(23)`
   
   #### Screenshots
   ![image](https://user-images.githubusercontent.com/19199254/107262094-fc413280-69f4-11eb-8816-af2de056338e.png)
   
   
   #### How to reproduce the bug
   
   1. Go to SQL Lab
   2. Preview Trino/Presto table with partitions and TIMESTAMP as a partition column.
   
   ### Environment
   
   
   - superset version: `0.37.2`
   - python version: `python 3.6`
   
   ### Checklist
   
   Make sure to follow these steps before submitting your issue - thank you!
   
   - [x] I have checked the superset logs for python stacktraces and included it here as text if there are any.
   - [x] I have reproduced the issue with at least the latest released version of superset.
   - [x] I have checked the issue tracker for the same issue and I haven't found one similar.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] kekwan commented on issue #13009: Incorrect SELECT * statement previewing tables with timestamp partitions

Posted by GitBox <gi...@apache.org>.
kekwan commented on issue #13009:
URL: https://github.com/apache/superset/issues/13009#issuecomment-777736133


   @tooptoop4 Yup. I filed a PR https://github.com/apache/superset/pull/13082


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] tooptoop4 commented on issue #13009: Incorrect SELECT * statement previewing tables with timestamp partitions

Posted by GitBox <gi...@apache.org>.
tooptoop4 commented on issue #13009:
URL: https://github.com/apache/superset/issues/13009#issuecomment-777248803


   @kekwan can u share the code for the fix?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] tritonrc commented on issue #13009: Incorrect SELECT * statement previewing tables with timestamp partitions

Posted by GitBox <gi...@apache.org>.
tritonrc commented on issue #13009:
URL: https://github.com/apache/superset/issues/13009#issuecomment-776857855


   Hi, I ran into this same issue yesterday and after digging around the code I landed on a fix.  In my case, the fields were coming back marked as `Time` type.  Looking at https://github.com/apache/superset/blob/master/superset/db_engine_specs/presto.py#L576
   
   I insert two lines to cover the `Time` type at line 579:
   
   ```python
   if tt == utils.TemporalType.TIME:
               return f"""from_iso8601_timestamp('{dttm.isoformat(timespec="microseconds")}')"""  # pylint: disable=line-too-long
   ``` 
   
   As part of digging into this, I ran across the recently landed PR https://github.com/apache/superset/pull/12861 which might help here as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] kekwan commented on issue #13009: Incorrect SELECT * statement previewing tables with timestamp partitions

Posted by GitBox <gi...@apache.org>.
kekwan commented on issue #13009:
URL: https://github.com/apache/superset/issues/13009#issuecomment-776886300


   Hey @tritonrc. I also fixed the issue but had a slightly different fix for my scenario.
   
   Issue 1 was our Presto `TIMESTAMP` columns were getting marked as `OTHER` (not `TIME` either) due to Trino/Presto's new timestamp precision type. We followed this https://github.com/apache/superset/issues/11138#issuecomment-702475356, to omit timestamp precision and it correctly identified our `TIMESTAMP` columns.
   
   Even this didn't fix our issue with previewing tables however. I believe Presto/Trino doesn't do automatic casting in `WHERE` statements so when the SELECT * is generated for preview tables, it needs to explicitly cast the partition value to 
   `TIMESTAMP`. This is a valid query in Presto (notice the explicit casting of the partition value)
   
   ```
   SELECT "event_type" AS "event_type",
          "trace_id" AS "trace_id",
          "tenant" AS "tenant",
          "core_tenant" AS "core_tenant",
          "event_timestamp" AS "event_timestamp",
          "table_id" AS "table_id",
          "dtr_workspace_id" AS "dtr_workspace_id",
          "connectors" AS "connectors",
          "table_type" AS "table_type",
          "row_count" AS "row_count",
          "field_count" AS "field_count",
          "partition_count" AS "partition_count",
          "data_size_bytes" AS "data_size_bytes",
          "datetimepartition" AS "datetimepartition",
          "etl_ts" AS "etl_ts",
          "insert_try_number" AS "insert_try_number"
   FROM "events"."data_tables"
   WHERE "datetimepartition" = TIMESTAMP '2021-02-08 16:00:00.000'
     AND "etl_ts" = TIMESTAMP '2021-02-08 16:30:00.000'
     AND "insert_try_number" = 1
   LIMIT 100
   ```
   
   In order to do this, I implemented a TypeDecorator class which overrode existing functionality of the SQLA timestamp type and instantiated that class here when the column type is timestamp https://github.com/apache/superset/blob/master/superset/db_engine_specs/presto.py#L917


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org