Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2020/06/15 13:39:07 UTC

[GitHub] [incubator-superset] thibault-ketterer opened a new issue #10054: Athena CSV import fail (no TEXT type)

thibault-ketterer opened a new issue #10054:
URL: https://github.com/apache/incubator-superset/issues/10054


   I import the following CSV into Athena
   
   ```
   viewing time,Date
   559211,2020-06-07
   454096,2020-06-06
   766314,2020-06-03
   638622,2020-06-01
   506586,2020-05-31
   468516,2019-10-01
   466481,2019-09-30
   519893,2019-09-29
   364684,2019-09-28
   403074,2019-09-27
   ```
   
   ### Expected results
   
   CSV to be uploaded to Athena
   
   ### Actual results
   
   I get a red error, and Athena complains about the column types (TEXT or Date).
   
   #### Screenshots
   
   ![image](https://user-images.githubusercontent.com/4283686/84662347-00778780-af1c-11ea-8ce6-50e577d806fd.png)
   
   <img width="1188" alt="screen" src="https://user-images.githubusercontent.com/4283686/84662535-39aff780-af1c-11ea-92a9-dbf11ba4fcaf.png">
   
   with Date in the column "Date"
   <img width="1192" alt="screen_date" src="https://user-images.githubusercontent.com/4283686/84662757-8562a100-af1c-11ea-8c55-23f732e60bc3.png">
   
   
   #### How to reproduce the bug
   
   0. Have a working Athena configuration (test connection is OK) and be allowed to upload CSV files to schema "default"
   
   1. Go to "Superset home"
   2. Click on "Sources/Upload CSV"
   3. Enter table name "test"
   4. Choose your Athena connection
   5. Choose the CSV test file
   6. Enter schema "default"
   
   
   ### Environment
   
   (please complete the following information):
   
   - superset version: 0.35.2 (also tested with the latest master and the latest release, 0.36, as of 2020-06-15)
   - python version: Python 3.6.9
   - node.js version: v10.20.1
   - npm version:  6.14.4
   
   Without giving any hints about the columns:
   `Unable to upload CSV file "test_athena.csv" to table "test" in database "awsathena". Error message: (pyathena.error.OperationalError) FAILED: ParseException line 3:14 cannot recognize input near 'TEXT' ')' 'STORED' in column type [SQL: CREATE EXTERNAL TABLE `default`.test ( `559211` BIGINT, `2020-06-07` TEXT ) STORED AS PARQUET LOCATION 's3://xxxxxxxx-athena/superset/default/test/' ] (Background on this error at: http://sqlalche.me/e/e3q8)`
   
   And if I try to indicate that the Date column is a date:
   `Unable to upload CSV file "test_athena.csv" to table "test" in database "awsathena". Error message: (pyathena.error.OperationalError) FAILED: SemanticException [Error 10099]: DATETIME type isn't supported yet. Please use DATE or TIMESTAMP instead [SQL: CREATE EXTERNAL TABLE `default`.test ( txt BIGINT, `Date` DATETIME ) STORED AS PARQUET LOCATION 's3://xxxxxx-athena/superset/default/test/' ] (Background on this error at: http://sqlalche.me/e/e3q8)`
   
   ### Checklist
   
   Make sure these boxes are checked before submitting your issue - thank you!
   
   - [ x ] I have checked the superset logs for python stacktraces and included it here as text if there are any.
   - [ x ] I have reproduced the issue with at least the latest released version of superset. (also tested with last master AND last release 0.36 as of 2020-06-15)
   - [ x ] I have checked the issue tracker for the same issue and I haven't found one similar.
   
   ### Additional context
   
   I think there are two problems here:
   - Athena has no TEXT type: https://docs.aws.amazon.com/athena/latest/ug/data-types.html
   - Date conversion for Athena does not seem to work properly
   
   I've dug a little into the code and found interesting methods such as `get_sqla_column_type` (https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/base.py#L850),
   as used for mssql:
   https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/mssql.py#L80
   
   Or there is the technique used for hive (a bit overkill?), which overrides the whole CSV import in `create_table_from_csv`:
   https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/hive.py#L124
   
   Or maybe just override `df_to_sql` and change the types on the fly:
   https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/base.py#L445
   
   I would be glad if someone could give me a hint on where to fix the issue.
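
To make the two problems concrete, here is a minimal sketch of what a corrected statement could look like. It is not Superset code: the `ATHENA_TYPE_FIXES` mapping, the `athena_create_table` helper, and the `example-bucket` location are all hypothetical names chosen for illustration. The only facts taken from the report are that Athena rejects TEXT and that the error message suggests DATE or TIMESTAMP in place of DATETIME.

```python
# Hypothetical mapping from the generic type names seen in the failing
# statements to type names that Athena's DDL accepts.
ATHENA_TYPE_FIXES = {
    "TEXT": "STRING",         # Athena DDL has no TEXT type
    "DATETIME": "TIMESTAMP",  # the error message suggests DATE or TIMESTAMP
}


def athena_create_table(schema, table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement with Athena-friendly types.

    `columns` is a list of (column_name, generic_type) pairs.
    """
    column_defs = ", ".join(
        f"`{name}` {ATHENA_TYPE_FIXES.get(type_.upper(), type_.upper())}"
        for name, type_ in columns
    )
    return (
        f"CREATE EXTERNAL TABLE `{schema}`.`{table}` ({column_defs}) "
        f"STORED AS PARQUET LOCATION '{location}'"
    )


# Same columns as in the second failing statement; the bucket is made up.
statement = athena_create_table(
    "default",
    "test",
    [("txt", "BIGINT"), ("Date", "DATETIME")],
    "s3://example-bucket/superset/default/test/",
)
print(statement)
```

Types that are not in the mapping (BIGINT here) pass through unchanged, so the sketch only touches the two types that the error messages complain about.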


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [incubator-superset] mistercrunch commented on issue #10054: Athena CSV import fail (no TEXT type)

Posted by GitBox <gi...@apache.org>.
mistercrunch commented on issue #10054:
URL: https://github.com/apache/incubator-superset/issues/10054#issuecomment-647716086


   It's hard to write generic code here, especially for things like Presto/Athena that support numerous serdes. We might want some dynamic configuration hooks here too, meaning you define what happens in your environment if needed.




[GitHub] [incubator-superset] bkyryliuk commented on issue #10054: Athena CSV import fail (no TEXT type)

Posted by GitBox <gi...@apache.org>.
bkyryliuk commented on issue #10054:
URL: https://github.com/apache/incubator-superset/issues/10054#issuecomment-645537901


   Superset allows CSV upload functionality to be implemented per database if needed. The default implementation works for Presto but, for example, does not work for Hive, which has a separate implementation:
   https://github.com/apache/incubator-superset/blob/8744dadca8f98a84c7cbdbd098e53e435627063d/superset/db_engine_specs/hive.py#L110
   
   Contributions are more than welcome




[GitHub] [incubator-superset] villebro commented on issue #10054: Athena CSV import fail (no TEXT type)

Posted by GitBox <gi...@apache.org>.
villebro commented on issue #10054:
URL: https://github.com/apache/incubator-superset/issues/10054#issuecomment-648154806


   It appears there may be some problem with how Pandas' `read_csv` converts columns to Numpy types, and how those are later changed into a create table statement.




[GitHub] [incubator-superset] stale[bot] closed issue #10054: Athena CSV import fail (no TEXT type)

Posted by GitBox <gi...@apache.org>.
stale[bot] closed issue #10054:
URL: https://github.com/apache/incubator-superset/issues/10054


   




[GitHub] [incubator-superset] thibault-ketterer commented on issue #10054: Athena CSV import fail (no TEXT type)

Posted by GitBox <gi...@apache.org>.
thibault-ketterer commented on issue #10054:
URL: https://github.com/apache/incubator-superset/issues/10054#issuecomment-646311616


   Well, yes. As I said previously, I read the code, and I was asking what the best way to fix the issue would be.
   
   I think reimplementing the whole thing would be overkill.
   
   I don't quite understand what the `get_sqla_column_type` function does, or whether implementing only this function for Athena would be enough to override the default types and replace TEXT with STRING,
   as seems to be the case in the mssql implementation.
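
The idea of overriding only a type-mapping hook, instead of the whole CSV import, can be sketched as follows. This is an illustration of the pattern only: `BaseSpec`, `AthenaSpec`, `column_type_overrides`, `override_column_type`, and `column_type` are hypothetical names, and the real signature and return type of `get_sqla_column_type` are not reproduced here.

```python
import re
from typing import Optional


class BaseSpec:
    """Hypothetical stand-in for a generic engine spec (not Superset's class)."""

    # (compiled pattern, replacement type) pairs; empty by default,
    # so every type name passes through unchanged.
    column_type_overrides: tuple = ()

    @classmethod
    def override_column_type(cls, type_name: str) -> Optional[str]:
        """Return a replacement type name, or None to keep the default."""
        for pattern, replacement in cls.column_type_overrides:
            if pattern.match(type_name):
                return replacement
        return None

    @classmethod
    def column_type(cls, type_name: str) -> str:
        """Apply the override hook and fall back to the original name."""
        return cls.override_column_type(type_name) or type_name


class AthenaSpec(BaseSpec):
    """Hypothetical Athena spec that only overrides the type mapping."""

    column_type_overrides = (
        (re.compile(r"^TEXT$", re.IGNORECASE), "STRING"),
        (re.compile(r"^DATETIME$", re.IGNORECASE), "TIMESTAMP"),
    )


print(AthenaSpec.column_type("TEXT"))    # overridden for Athena
print(AthenaSpec.column_type("BIGINT"))  # no override, passes through
print(BaseSpec.column_type("TEXT"))      # base spec keeps the default
```

The appeal of this shape is that the subclass carries only data (the override table), while the upload logic stays in the base class. Whether the existing hook is actually consulted during CSV upload is the open question in this thread.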
   




[GitHub] [incubator-superset] issue-label-bot[bot] commented on issue #10054: Athena CSV import fail (no TEXT type)

Posted by GitBox <gi...@apache.org>.
issue-label-bot[bot] commented on issue #10054:
URL: https://github.com/apache/incubator-superset/issues/10054#issuecomment-644141057


   Issue-Label Bot is automatically applying the label `#bug` to this issue, with a confidence of 0.86. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! 
   
    Links: [app homepage](https://github.com/marketplace/issue-label-bot), [dashboard](https://mlbot.net/data/apache/incubator-superset) and [code](https://github.com/hamelsmu/MLapp) for this bot.




[GitHub] [incubator-superset] stale[bot] commented on issue #10054: Athena CSV import fail (no TEXT type)

Posted by GitBox <gi...@apache.org>.
stale[bot] commented on issue #10054:
URL: https://github.com/apache/incubator-superset/issues/10054#issuecomment-678677914


   This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue `.pinned` to prevent stale bot from closing the issue.
   

