You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2023/01/10 07:25:29 UTC

[GitHub] [superset] villebro commented on a diff in pull request #22635: fix: Stop query in SQL Lab with impala engine

villebro commented on code in PR #22635:
URL: https://github.com/apache/superset/pull/22635#discussion_r1065404439


##########
superset/db_engine_specs/impala.py:
##########
@@ -14,20 +14,31 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
+import logging
+import re
+import time
 from datetime import datetime
 from typing import Any, Dict, List, Optional
 
+from flask import current_app
 from sqlalchemy.engine.reflection import Inspector
+from sqlalchemy.orm import Session
 
+from superset.constants import QUERY_EARLY_CANCEL_KEY
 from superset.db_engine_specs.base import BaseEngineSpec
+from superset.models.sql_lab import Query
 from superset.utils import core as utils
 
+logger = logging.getLogger(__name__)
+
 
 class ImpalaEngineSpec(BaseEngineSpec):
     """Engine spec for Cloudera's Impala"""
 
     engine = "impala"
     engine_name = "Apache Impala"
+    # Query 5543ffdf692b7d02:f78a944000000000: 3% Complete (17 out of 547)
+    query_progress_r = re.compile(r".*Query.*: (?P<query_progress>[0-9]+)%.*")

Review Comment:
   nit: do we really need the leading and trailing `.*` here? 
   ```suggestion
       query_progress_r = re.compile(r"Query.*: (?P<query_progress>[0-9]+)%")
   ```
   
   Also, I know this is in line with what's being done in `hive.py`, but I would consider moving this outside the class into a constant `QUERY_PROGRESS_REGEX` in `impala.py`, as it's not defined in `BaseEngineSpec (defining it here makes it appear like we're overriding a class attribute in `BaseEngineSpec`). While at it, maybe do the same for `stage_progress_r` in `HiveEngineSpec`.



##########
superset/db_engine_specs/hive.py:
##########
@@ -375,7 +375,15 @@ def handle_cursor(  # pylint: disable=too-many-locals
                     last_log_line = len(log_lines)
                 if needs_commit:
                     session.commit()
-            time.sleep(current_app.config["HIVE_POLL_INTERVAL"])
+            if sleep_interval := current_app.config.get("HIVE_POLL_INTERVAL"):
+                logger.warning(
+                    "HIVE_POLL_INTERVAL is deprecated and will be removed in 3.0. Please use DB_POLL_INTERVAL instead"

Review Comment:
   The deprecation warning is incorrect:
   ```suggestion
                       "HIVE_POLL_INTERVAL is deprecated and will be removed in 3.0. Please use DB_POLL_INTERVAL_SECONDS instead"
   ```



##########
docker/pythonpath_dev/superset_config.py:
##########
@@ -106,6 +106,10 @@ class CeleryConfig(object):
 WEBDRIVER_BASEURL = "http://superset:8088/"
 # The base URL for the email report hyperlinks.
 WEBDRIVER_BASEURL_USER_FRIENDLY = WEBDRIVER_BASEURL
+# customize the polling time of each engine. The default time is 5 seconds
+DB_POLL_INTERVAL_SECONDS = {
+    "hive": int(timedelta(seconds=5).total_seconds()),
+}

Review Comment:
   I'm not sure if this is needed, as it's not changing the defaults.



##########
superset/config.py:
##########
@@ -1129,8 +1129,10 @@ def CSV_TO_HIVE_UPLOAD_DIRECTORY_FUNC(  # pylint: disable=invalid-name
 TRACKING_URL_TRANSFORMER = lambda url: url
 
 
-# Interval between consecutive polls when using Hive Engine
-HIVE_POLL_INTERVAL = int(timedelta(seconds=5).total_seconds())
+# customize the polling time of each engine. The default time is 5 seconds
+DB_POLL_INTERVAL_SECONDS = {
+    "hive": int(timedelta(seconds=5).total_seconds()),
+}

Review Comment:
   We can't really define defaults in `config.py`. Imagine if I want to override the poll interval for "impala" and add the following to `superset_config.py`:
   
   ```
   DB_POLL_INTERVAL_SECONDS = {"impala": 1}
   ```
   
   In this case the default for "hive" in `config.py` would be lost. This is why I recommended not defining the defaults in `config.py`, and rather placing them in their respective db engine specs. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org