Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2020/08/01 19:47:05 UTC

[GitHub] [incubator-superset] bkyryliuk opened a new pull request #10498: feat: welcome presto to the suite of tested databases

bkyryliuk opened a new pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498


   Adds Presto to the CI.
   
   ### SUMMARY
   * introduces a test suite that runs against Presto with the memory connector
   * splits the main database from the examples database in the tests
   
   Based on: https://github.com/apache/incubator-superset/pull/10487
   Test-only change.
   
   ### TEST PLAN
   * CI


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [incubator-superset] bkyryliuk merged pull request #10498: feat: welcome presto to the suite of tested databases

Posted by GitBox <gi...@apache.org>.
bkyryliuk merged pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498


   




[GitHub] [incubator-superset] willbarrett commented on a change in pull request #10498: feat: welcome presto to the suite of tested databases

Posted by GitBox <gi...@apache.org>.
willbarrett commented on a change in pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498#discussion_r465320909



##########
File path: requirements-dev.txt
##########
@@ -30,7 +30,9 @@ pre-commit==1.17.0
 psycopg2-binary==2.8.5
 pycodestyle==2.5.0
 pydruid==0.6.1
-pyhive==0.6.2
+# Enable in-place multirow inserts
+# TODO(bkyryliuk): release new version of pyhive
+git+https://github.com/dropbox/PyHive@master

Review comment:
       I think this will be a blocker: the community has not been accepting of installing dependencies from GitHub.






[GitHub] [incubator-superset] willbarrett commented on a change in pull request #10498: feat: welcome presto to the suite of tested databases

Posted by GitBox <gi...@apache.org>.
willbarrett commented on a change in pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498#discussion_r465321272



##########
File path: superset/examples/birth_names.py
##########
@@ -54,19 +54,26 @@ def gen_filter(
 
 def load_data(tbl_name: str, database: Database, sample: bool = False) -> None:
     pdf = pd.read_json(get_example_data("birth_names.json.gz"))
-    pdf.ds = pd.to_datetime(pdf.ds, unit="ms")
+    if database.backend != "presto":

Review comment:
       Nit: switching the order of the conditions might make this more readable.






[GitHub] [incubator-superset] willbarrett commented on a change in pull request #10498: feat: welcome presto to the suite of tested databases

Posted by GitBox <gi...@apache.org>.
willbarrett commented on a change in pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498#discussion_r465714838



##########
File path: superset/utils/core.py
##########
@@ -1022,6 +1022,13 @@ def get_example_database() -> "Database":
     return get_or_create_db("examples", db_uri)
 
 
+def get_main_database() -> "Database":

Review comment:
       Ah, forgive the confusion on my part. This is fine as-is then.






[GitHub] [incubator-superset] bkyryliuk commented on a change in pull request #10498: feat: welcome presto to the suite of tested databases

Posted by GitBox <gi...@apache.org>.
bkyryliuk commented on a change in pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498#discussion_r466578324



##########
File path: superset/examples/birth_names.py
##########
@@ -54,19 +54,26 @@ def gen_filter(
 
 def load_data(tbl_name: str, database: Database, sample: bool = False) -> None:
     pdf = pd.read_json(get_example_data("birth_names.json.gz"))
-    pdf.ds = pd.to_datetime(pdf.ds, unit="ms")
+    if database.backend == "presto":
+        pdf.ds = pd.to_datetime(pdf.ds, unit="ms")
+        pdf.ds = pdf.ds.dt.strftime("%Y-%m-%d %H:%M:%S")
+    else:
+        pdf.ds = pd.to_datetime(pdf.ds, unit="ms")

Review comment:
       My plan here is a bit different. We have talked about getting rid of load_examples for the tests, so I plan to move the initialization code into a pytest fixture. I will do this cleanup at that point and restructure the code to be more database-generic.
   
   The scope of these changes is test-only, and I prefer not to modify production code.






[GitHub] [incubator-superset] villebro commented on a change in pull request #10498: feat: welcome presto to the suite of tested databases

Posted by GitBox <gi...@apache.org>.
villebro commented on a change in pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498#discussion_r466568237



##########
File path: tests/sqla_models_tests.py
##########
@@ -125,7 +129,11 @@ def test_extra_cache_keys(self, flask_g):
         )
         extra_cache_keys = table.get_extra_cache_keys(query_obj)
         self.assertTrue(table.has_extra_cache_key_calls(query_obj))
-        self.assertListEqual(extra_cache_keys, ["abc"])
+        # TODO: make it work with presto
+        if get_example_database().backend == "presto":
+            assert extra_cache_keys == []
+        else:
+            assert extra_cache_keys == ["abc"]

Review comment:
       Same here.

##########
File path: superset/examples/birth_names.py
##########
@@ -54,19 +54,26 @@ def gen_filter(
 
 def load_data(tbl_name: str, database: Database, sample: bool = False) -> None:
     pdf = pd.read_json(get_example_data("birth_names.json.gz"))
-    pdf.ds = pd.to_datetime(pdf.ds, unit="ms")
+    if database.backend == "presto":
+        pdf.ds = pd.to_datetime(pdf.ds, unit="ms")
+        pdf.ds = pdf.ds.dt.strftime("%Y-%m-%d %H:%M:%S")
+    else:
+        pdf.ds = pd.to_datetime(pdf.ds, unit="ms")
     pdf = pdf.head(100) if sample else pdf
+
     pdf.to_sql(
         tbl_name,
         database.get_sqla_engine(),
         if_exists="replace",
         chunksize=500,
         dtype={
-            "ds": DateTime,
+            # TODO(bkyryliuk): use TIMESTAMP type for presto
+            "ds": DateTime if database.backend != "presto" else String(255),

Review comment:
       Same here, e.g. `get_examples_datetime_type()`. There are a few places below to which I feel the same applies.

##########
File path: superset/examples/birth_names.py
##########
@@ -54,19 +54,26 @@ def gen_filter(
 
 def load_data(tbl_name: str, database: Database, sample: bool = False) -> None:
     pdf = pd.read_json(get_example_data("birth_names.json.gz"))
-    pdf.ds = pd.to_datetime(pdf.ds, unit="ms")
+    if database.backend == "presto":
+        pdf.ds = pd.to_datetime(pdf.ds, unit="ms")
+        pdf.ds = pdf.ds.dt.strftime("%Y-%m-%d %H:%M:%S")
+    else:
+        pdf.ds = pd.to_datetime(pdf.ds, unit="ms")

Review comment:
       This logic should ideally be moved out to `db_engine_specs/presto.py` to keep this clean of db-specific logic. Something along the lines of `BaseEngineSpec.convert_examples_datetime()` or similar (there are other similar methods there).
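
The suggestion above could look roughly like the following. This is a minimal sketch, not Superset's actual API: the class names `BaseEngineSpecSketch` and `PrestoEngineSpecSketch`, the `spec_for` lookup, and the exact signature of `convert_examples_datetime` are assumptions made for illustration, and plain lists of epoch milliseconds stand in for the pandas column used in the diff.

```python
from datetime import datetime, timezone
from typing import List, Union


class BaseEngineSpecSketch:
    """Hypothetical stand-in for a base engine spec (not Superset's class)."""

    @classmethod
    def convert_examples_datetime(
        cls, epoch_ms: List[int]
    ) -> List[Union[datetime, str]]:
        # Default behaviour: epoch milliseconds -> naive UTC datetimes.
        return [
            datetime.fromtimestamp(ms / 1000, tz=timezone.utc).replace(tzinfo=None)
            for ms in epoch_ms
        ]


class PrestoEngineSpecSketch(BaseEngineSpecSketch):
    """Hypothetical Presto override: store timestamps as formatted strings."""

    @classmethod
    def convert_examples_datetime(
        cls, epoch_ms: List[int]
    ) -> List[Union[datetime, str]]:
        converted = super().convert_examples_datetime(epoch_ms)
        return [dt.strftime("%Y-%m-%d %H:%M:%S") for dt in converted]


# Hypothetical registry mapping a backend name to its spec.
SPECS = {"presto": PrestoEngineSpecSketch}


def spec_for(backend: str):
    """Return the spec for a backend, falling back to the base behaviour."""
    return SPECS.get(backend, BaseEngineSpecSketch)
```

With something along these lines, the loader would call `spec_for(database.backend).convert_examples_datetime(...)` and would not need an `if database.backend == "presto"` branch of its own.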

##########
File path: tests/database_api_tests.py
##########
@@ -222,7 +223,7 @@ def test_get_select_star_not_found_table(self):
             return
         uri = f"api/v1/database/{example_db.id}/select_star/table_does_not_exist/"
         rv = self.client.get(uri)
-        self.assertEqual(rv.status_code, 404)
+        self.assertEqual(rv.status_code, 404 if example_db.backend != "presto" else 500)

Review comment:
       I'd be interested in understanding why this is returning a 500. Perhaps add a TODO here so we can follow up on it.

##########
File path: tests/celery_tests.py
##########
@@ -174,8 +177,23 @@ def test_run_sync_query_cta(self, ctas_method):
         )
         # provide better error message
         self.assertEqual(QueryStatus.SUCCESS, result["query"]["state"], msg=result)
-        self.assertEqual([], result["data"])
-        self.assertEqual([], result["columns"])
+
+        expected_result = []
+        if backend == "presto":
+            expected_result = (
+                [{"rows": 1}] if ctas_method == CtasMethod.TABLE else [{"result": True}]
+            )
+        self.assertEqual(expected_result, result["data"])
+        expected_columns = []
+        if backend == "presto":
+            expected_columns = [
+                {
+                    "name": "rows" if ctas_method == CtasMethod.TABLE else "result",
+                    "type": "BIGINT" if ctas_method == CtasMethod.TABLE else "BOOLEAN",
+                    "is_date": False,
+                }
+            ]

Review comment:
       This also runs the risk of becoming convoluted if we introduce db-specific logic for all supported dbs. As this logic is repeated below, I think there is merit to centralizing this somewhere, too, but `db_engine_specs` probably isn't the right place for test assertion related stuff. I'm open to suggestions, but feel we can probably leave this as-is until we come up with a more scalable solution.

##########
File path: tests/sqla_models_tests.py
##########
@@ -93,7 +93,11 @@ def test_extra_cache_keys(self, flask_g):
         query_obj = dict(**base_query_obj, extras={})
         extra_cache_keys = table.get_extra_cache_keys(query_obj)
         self.assertTrue(table.has_extra_cache_key_calls(query_obj))
-        self.assertListEqual(extra_cache_keys, ["abc"])
+        # TODO: make it work with presto
+        if get_example_database().backend == "presto":
+            assert extra_cache_keys == []
+        else:
+            assert extra_cache_keys == ["abc"]

Review comment:
       Hmm, I wonder why this is failing for Presto. This really shouldn't have anything to do with the underlying analytical db type.






[GitHub] [incubator-superset] bkyryliuk commented on a change in pull request #10498: feat: welcome presto to the suite of tested databases

Posted by GitBox <gi...@apache.org>.
bkyryliuk commented on a change in pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498#discussion_r466585259



##########
File path: tests/sqla_models_tests.py
##########
@@ -93,7 +93,11 @@ def test_extra_cache_keys(self, flask_g):
         query_obj = dict(**base_query_obj, extras={})
         extra_cache_keys = table.get_extra_cache_keys(query_obj)
         self.assertTrue(table.has_extra_cache_key_calls(query_obj))
-        self.assertListEqual(extra_cache_keys, ["abc"])
+        # TODO: make it work with presto
+        if get_example_database().backend == "presto":
+            assert extra_cache_keys == []
+        else:
+            assert extra_cache_keys == ["abc"]

Review comment:
       Agreed. I did not investigate, but it is worth looking into.






[GitHub] [incubator-superset] bkyryliuk commented on a change in pull request #10498: feat: welcome presto to the suite of tested databases

Posted by GitBox <gi...@apache.org>.
bkyryliuk commented on a change in pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498#discussion_r465400301



##########
File path: superset/utils/core.py
##########
@@ -1022,6 +1022,13 @@ def get_example_database() -> "Database":
     return get_or_create_db("examples", db_uri)
 
 
+def get_main_database() -> "Database":

Review comment:
       It is referred to as the main database in Superset's dev installation. I am fine with renaming it, but we should probably do that across the board in a separate PR.






[GitHub] [incubator-superset] willbarrett commented on a change in pull request #10498: feat: welcome presto to the suite of tested databases

Posted by GitBox <gi...@apache.org>.
willbarrett commented on a change in pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498#discussion_r465321913



##########
File path: superset/utils/core.py
##########
@@ -1022,6 +1022,13 @@ def get_example_database() -> "Database":
     return get_or_create_db("examples", db_uri)
 
 
+def get_main_database() -> "Database":

Review comment:
       We usually refer to this as the "metadata" database - should that be in the function name?






[GitHub] [incubator-superset] bkyryliuk commented on pull request #10498: feat: welcome presto to the suite of tested databases

Posted by GitBox <gi...@apache.org>.
bkyryliuk commented on pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498#issuecomment-670083445


   > I think we should aim to put as much db-specific logic in `db_engine_specs`, especially in the examples loading logic. We need to add similar logic for mutating column names, e.g. BigQuery doesn't support columns starting with a number (there's other similar problems for other dbs).
   > 
   > WRT the test assertions, I think those are fine left as-is. Also curious about that extra cache key test failing, that seems like a potential security problem, so I'm happy to help track down what's causing that.
   
   I agree with all the suggestions. My only comment is that this PR is already fairly large, so I am happy to tackle them in follow-up PRs.
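
For the column-name point in the quoted comment, a follow-up might start from something like the sketch below. The helper name `mutate_column_name` and the leading-underscore convention are assumptions chosen for illustration; only the BigQuery restriction on columns starting with a number comes from the discussion.

```python
import re


def mutate_column_name(name: str, backend: str) -> str:
    """Hypothetical helper: adapt an example column name to a backend.

    Only the BigQuery rule mentioned in the thread is handled here;
    other backends pass through unchanged.
    """
    if backend == "bigquery" and re.match(r"\d", name):
        # Prefix with an underscore so the name no longer starts with a digit.
        return f"_{name}"
    return name
```

In a fuller version this logic would probably live on the engine spec for each database, next to the other db-specific hooks, so the example loaders stay generic.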




[GitHub] [incubator-superset] bkyryliuk commented on a change in pull request #10498: feat: welcome presto to the suite of tested databases

Posted by GitBox <gi...@apache.org>.
bkyryliuk commented on a change in pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498#discussion_r466584068



##########
File path: tests/celery_tests.py
##########
@@ -174,8 +177,23 @@ def test_run_sync_query_cta(self, ctas_method):
         )
         # provide better error message
         self.assertEqual(QueryStatus.SUCCESS, result["query"]["state"], msg=result)
-        self.assertEqual([], result["data"])
-        self.assertEqual([], result["columns"])
+
+        expected_result = []
+        if backend == "presto":
+            expected_result = (
+                [{"rows": 1}] if ctas_method == CtasMethod.TABLE else [{"result": True}]
+            )
+        self.assertEqual(expected_result, result["data"])
+        expected_columns = []
+        if backend == "presto":
+            expected_columns = [
+                {
+                    "name": "rows" if ctas_method == CtasMethod.TABLE else "result",
+                    "type": "BIGINT" if ctas_method == CtasMethod.TABLE else "BOOLEAN",
+                    "is_date": False,
+                }
+            ]

Review comment:
       Agreed. I will tackle it in upcoming PRs; I do not have a clear solution yet.
   Most likely it would be possible to introduce a TestDBSpec and hide that complexity there.
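
One possible shape for the TestDBSpec idea is sketched below. Nothing here exists in Superset: `TestDBSpec`, `PrestoTestDBSpec`, `expected_ctas`, and `get_test_spec` are hypothetical names, and the `CtasMethod` enum is a local stand-in for the one referenced in the diff. The expected values are copied from the assertions in the diff above.

```python
from enum import Enum
from typing import Any, Dict, List, Tuple

Rows = List[Dict[str, Any]]


class CtasMethod(Enum):
    # Local stand-in for the CtasMethod used in the test above.
    TABLE = "TABLE"
    VIEW = "VIEW"


class TestDBSpec:
    """Hypothetical default: CTAS/CVAS returns no data and no columns."""

    @classmethod
    def expected_ctas(cls, method: CtasMethod) -> Tuple[Rows, Rows]:
        return [], []


class PrestoTestDBSpec(TestDBSpec):
    """Hypothetical Presto expectations, taken from the diff."""

    @classmethod
    def expected_ctas(cls, method: CtasMethod) -> Tuple[Rows, Rows]:
        if method == CtasMethod.TABLE:
            name, type_, row = "rows", "BIGINT", {"rows": 1}
        else:
            name, type_, row = "result", "BOOLEAN", {"result": True}
        return [row], [{"name": name, "type": type_, "is_date": False}]


# Hypothetical registry keyed by backend name.
TEST_SPECS = {"presto": PrestoTestDBSpec}


def get_test_spec(backend: str):
    """Return the test spec for a backend, defaulting to the base class."""
    return TEST_SPECS.get(backend, TestDBSpec)
```

A test could then do `data, columns = get_test_spec(backend).expected_ctas(ctas_method)` and assert against `result["data"]` and `result["columns"]` without any backend branches in the test body.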






[GitHub] [incubator-superset] bkyryliuk commented on a change in pull request #10498: feat: welcome presto to the suite of tested databases

Posted by GitBox <gi...@apache.org>.
bkyryliuk commented on a change in pull request #10498:
URL: https://github.com/apache/incubator-superset/pull/10498#discussion_r466584813



##########
File path: tests/database_api_tests.py
##########
@@ -222,7 +223,7 @@ def test_get_select_star_not_found_table(self):
             return
         uri = f"api/v1/database/{example_db.id}/select_star/table_does_not_exist/"
         rv = self.client.get(uri)
-        self.assertEqual(rv.status_code, 404)
+        self.assertEqual(rv.status_code, 404 if example_db.backend != "presto" else 500)

Review comment:
       PyHive raises an exception that is probably not caught. I left a TODO.
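
If the cause is indeed an uncaught driver exception, the eventual fix might resemble the sketch below. It is only an illustration of the mechanism: `DriverError` is a hypothetical stand-in for whatever the driver raises, and `select_star_status` is not a Superset function.

```python
from typing import Callable, Optional


class DriverError(Exception):
    """Hypothetical stand-in for an exception raised by a database driver."""


def select_star_status(fetch_table: Callable[[], Optional[object]]) -> int:
    """Return the HTTP status a select_star-style endpoint might send.

    An uncaught DriverError would surface as a 500; catching it lets the
    endpoint report a missing table as 404 instead.
    """
    try:
        table = fetch_table()
    except DriverError:
        return 404
    return 200 if table is not None else 404
```

A real handler would need to distinguish "table does not exist" from other driver failures (for example connection errors), which should still be reported as server errors.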



