You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/07/12 14:19:40 UTC

[GitHub] [superset] zhaoyongjie opened a new pull request, #20683: feat: the samples endpoint support filters

zhaoyongjie opened a new pull request, #20683:
URL: https://github.com/apache/superset/pull/20683

   <!---
   Please write the PR title following the conventions at https://www.conventionalcommits.org/en/v1.0.0/
   Example:
   fix(dashboard): load charts correctly
   -->
   
   ### SUMMARY
   <!--- Describe the change below, including rationale and design decisions -->
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   
   ### TESTING INSTRUCTIONS
   support filter by column when send a samples query to the dataset.
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r921052780


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)
+        limit = samples_row_limit
+        offset = 0
+
+        if isinstance(page, int) and isinstance(per_page, int):
+            limit = int(per_page)
+            if limit < 0 or limit > samples_row_limit:
+                # reset limit value if input is invalid
+                limit = samples_row_limit

Review Comment:
   Can be easily refactored to raise and exceptions if needed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r924548571


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)

Review Comment:
   That's what this setting does. for example, the SAMPLES_ROW_LIMIT = 10, but the source of the table has 100 rows, the samples will return 10 records, but the query of count(*)  will return a total count is 100, so the total_count is full count for this dataset(or query), and size of result always following the `SAMPLES_ROW_LIMIT` setting.
   
   ![image](https://user-images.githubusercontent.com/2016594/179769804-4f3a2f3d-a9f4-4828-b6f8-24be1c5c76e7.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r925674725


##########
superset/views/datasource/utils.py:
##########
@@ -0,0 +1,113 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from typing import Any, Dict, Optional
+
+from superset import app, db
+from superset.common.chart_data import ChartDataResultType
+from superset.common.query_context_factory import QueryContextFactory
+from superset.common.utils.query_cache_manager import QueryCacheManager
+from superset.constants import CacheRegion
+from superset.datasets.commands.exceptions import DatasetSamplesFailedError
+from superset.datasource.dao import DatasourceDAO
+from superset.utils.core import QueryStatus
+from superset.views.datasource.schemas import SamplesPayloadSchema
+
+
+def get_limit_clause(page: Optional[int], per_page: Optional[int]) -> Dict[str, int]:
+    samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)
+    limit = samples_row_limit
+    offset = 0
+
+    if isinstance(page, int) and isinstance(per_page, int):
+        limit = int(per_page)
+        if limit < 0 or limit > samples_row_limit:
+            # reset limit value if input is invalid
+            limit = samples_row_limit
+
+        offset = max((int(page) - 1) * limit, 0)
+
+    return {"row_offset": offset, "row_limit": limit}
+
+
+def get_samples(  # pylint: disable=too-many-arguments,too-many-locals
+    datasource_type: str,
+    datasource_id: int,
+    force: bool = False,
+    page: int = 1,
+    per_page: int = 1000,

Review Comment:
   the `SAMPLES_ROW_LIMIT` is not a constant, It is an attribute in the flask app config.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codyml commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
codyml commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r922566596


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)
+        limit = samples_row_limit
+        offset = 0
+
+        if isinstance(page, int) and isinstance(per_page, int):
+            limit = int(per_page)
+            if limit < 0 or limit > samples_row_limit:
+                # reset limit value if input is invalid
+                limit = samples_row_limit
+
+            offset = (int(page) - 1) * limit

Review Comment:
   I think the frontend defaults to 0-indexed page numbers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r921055103


##########
superset/datasets/commands/samples.py:
##########
@@ -30,40 +29,79 @@
     DatasetSamplesFailedError,
 )
 from superset.datasets.dao import DatasetDAO
+from superset.datasets.schemas import DatasetSamplesQuerySchema
 from superset.exceptions import SupersetSecurityException
-from superset.utils.core import QueryStatus
-
-logger = logging.getLogger(__name__)
+from superset.utils.core import DatasourceDict, QueryStatus
 
 
 class SamplesDatasetCommand(BaseCommand):
-    def __init__(self, model_id: int, force: bool):
+    def __init__(
+        self,
+        model_id: int,
+        force: bool,
+        *,
+        payload: Optional[DatasetSamplesQuerySchema] = None,
+        page: Optional[int] = None,
+        per_page: Optional[int] = None,
+    ):
         self._model_id = model_id
         self._force = force
         self._model: Optional[SqlaTable] = None
+        self._payload = payload
+        self._page = page
+        self._per_page = per_page
 
     def run(self) -> Dict[str, Any]:
         self.validate()
-        if not self._model:
-            raise DatasetNotFoundError()
+        limit_clause = self.get_limit_clause(self._page, self._per_page)
+        self._model = cast(SqlaTable, self._model)
+        datasource: DatasourceDict = {
+            "type": self._model.type,
+            "id": self._model.id,
+        }
 
-        qc_instance = QueryContextFactory().create(
-            datasource={
-                "type": self._model.type,
-                "id": self._model.id,
-            },
-            queries=[{}],
+        # constructing samples query
+        samples_instance = QueryContextFactory().create(
+            datasource=datasource,
+            queries=[
+                {**self._payload, **limit_clause} if self._payload else limit_clause
+            ],
             result_type=ChartDataResultType.SAMPLES,
             force=self._force,
         )
-        results = qc_instance.get_payload()
+
+        # constructing count(*) query
+        count_star_payload = {
+            "metrics": [
+                {
+                    "expressionType": "SQL",
+                    "sqlExpression": "COUNT(*)",
+                    "label": "COUNT(*)",
+                }
+            ]
+        }
+        count_star_instance = QueryContextFactory().create(
+            datasource=datasource,
+            queries=[count_star_payload],
+            result_type=ChartDataResultType.FULL,
+            force=self._force,
+        )
+        samples_results = samples_instance.get_payload()
+        count_star_results = count_star_instance.get_payload()
+
         try:
-            sample_data = results["queries"][0]
-            error_msg = sample_data.get("error")
-            if sample_data.get("status") == QueryStatus.FAILED and error_msg:
+            sample_data = samples_results["queries"][0]
+            count_star_data = count_star_results["queries"][0]
+            failed_status = (
+                sample_data.get("status") == QueryStatus.FAILED
+                or count_star_data.get("status") == QueryStatus.FAILED
+            )
+            error_msg = sample_data.get("error") or count_star_data.get("error")
+            if failed_status and error_msg:
                 cache_key = sample_data.get("cache_key")
                 QueryCacheManager.delete(cache_key, region=CacheRegion.DATA)
                 raise DatasetSamplesFailedError(error_msg)
+            sample_data["dataset_count_star"] = count_star_data["data"][0]["COUNT(*)"]

Review Comment:
   the response of the `samples` endpoint added a new field that represents size of data in a dataset 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r924316315


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)
+        limit = samples_row_limit
+        offset = 0
+
+        if isinstance(page, int) and isinstance(per_page, int):
+            limit = int(per_page)
+            if limit < 0 or limit > samples_row_limit:
+                # reset limit value if input is invalid
+                limit = samples_row_limit
+
+            offset = (int(page) - 1) * limit

Review Comment:
   The page number starts from 1 instead of 0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r924316315


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)
+        limit = samples_row_limit
+        offset = 0
+
+        if isinstance(page, int) and isinstance(per_page, int):
+            limit = int(per_page)
+            if limit < 0 or limit > samples_row_limit:
+                # reset limit value if input is invalid
+                limit = samples_row_limit
+
+            offset = (int(page) - 1) * limit

Review Comment:
   The page number starts from zero instead of 0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codyml commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
codyml commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r922566596


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)
+        limit = samples_row_limit
+        offset = 0
+
+        if isinstance(page, int) and isinstance(per_page, int):
+            limit = int(per_page)
+            if limit < 0 or limit > samples_row_limit:
+                # reset limit value if input is invalid
+                limit = samples_row_limit
+
+            offset = (int(page) - 1) * limit

Review Comment:
   From what I've seen on the frontend, page indices start at 0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r921055103


##########
superset/datasets/commands/samples.py:
##########
@@ -30,40 +29,79 @@
     DatasetSamplesFailedError,
 )
 from superset.datasets.dao import DatasetDAO
+from superset.datasets.schemas import DatasetSamplesQuerySchema
 from superset.exceptions import SupersetSecurityException
-from superset.utils.core import QueryStatus
-
-logger = logging.getLogger(__name__)
+from superset.utils.core import DatasourceDict, QueryStatus
 
 
 class SamplesDatasetCommand(BaseCommand):
-    def __init__(self, model_id: int, force: bool):
+    def __init__(
+        self,
+        model_id: int,
+        force: bool,
+        *,
+        payload: Optional[DatasetSamplesQuerySchema] = None,
+        page: Optional[int] = None,
+        per_page: Optional[int] = None,
+    ):
         self._model_id = model_id
         self._force = force
         self._model: Optional[SqlaTable] = None
+        self._payload = payload
+        self._page = page
+        self._per_page = per_page
 
     def run(self) -> Dict[str, Any]:
         self.validate()
-        if not self._model:
-            raise DatasetNotFoundError()
+        limit_clause = self.get_limit_clause(self._page, self._per_page)
+        self._model = cast(SqlaTable, self._model)
+        datasource: DatasourceDict = {
+            "type": self._model.type,
+            "id": self._model.id,
+        }
 
-        qc_instance = QueryContextFactory().create(
-            datasource={
-                "type": self._model.type,
-                "id": self._model.id,
-            },
-            queries=[{}],
+        # constructing samples query
+        samples_instance = QueryContextFactory().create(
+            datasource=datasource,
+            queries=[
+                {**self._payload, **limit_clause} if self._payload else limit_clause
+            ],
             result_type=ChartDataResultType.SAMPLES,
             force=self._force,
         )
-        results = qc_instance.get_payload()
+
+        # constructing count(*) query
+        count_star_payload = {
+            "metrics": [
+                {
+                    "expressionType": "SQL",
+                    "sqlExpression": "COUNT(*)",
+                    "label": "COUNT(*)",
+                }
+            ]
+        }
+        count_star_instance = QueryContextFactory().create(
+            datasource=datasource,
+            queries=[count_star_payload],
+            result_type=ChartDataResultType.FULL,
+            force=self._force,
+        )
+        samples_results = samples_instance.get_payload()
+        count_star_results = count_star_instance.get_payload()
+
         try:
-            sample_data = results["queries"][0]
-            error_msg = sample_data.get("error")
-            if sample_data.get("status") == QueryStatus.FAILED and error_msg:
+            sample_data = samples_results["queries"][0]
+            count_star_data = count_star_results["queries"][0]
+            failed_status = (
+                sample_data.get("status") == QueryStatus.FAILED
+                or count_star_data.get("status") == QueryStatus.FAILED
+            )
+            error_msg = sample_data.get("error") or count_star_data.get("error")
+            if failed_status and error_msg:
                 cache_key = sample_data.get("cache_key")
                 QueryCacheManager.delete(cache_key, region=CacheRegion.DATA)
                 raise DatasetSamplesFailedError(error_msg)
+            sample_data["dataset_count_star"] = count_star_data["data"][0]["COUNT(*)"]

Review Comment:
   the response of the `samples` endpoint added a new field that represents the size of data in a dataset.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] feat: the samples endpoint supports filters and pagination [superset]

Posted by "Smallhi (via GitHub)" <gi...@apache.org>.
Smallhi commented on PR #20683:
URL: https://github.com/apache/superset/pull/20683#issuecomment-1848950518

   Does the superset-fronted support sample filters when request sample ? I've tested the v3.1, it did't work. For hive, presto, this may cause all partitions scan.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r924319869


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)

Review Comment:
   we are unable to ignore the limit value in any detailed query(e.g. `select * from tbl`) because the unlimited query is a heavy load for some db.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codyml commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
codyml commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r924585847


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)

Review Comment:
   Ok, understood.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie merged pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie merged PR #20683:
URL: https://github.com/apache/superset/pull/20683


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] michael-s-molina commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
michael-s-molina commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r922232391


##########
superset/datasets/api.py:
##########
@@ -810,7 +822,15 @@ def samples(self, pk: int) -> Response:
         """
         try:
             force = parse_boolean_string(request.args.get("force"))
-            rv = SamplesDatasetCommand(pk, force).run()
+            page = request.args.get("page")
+            per_page = request.args.get("per_page")
+            rv = SamplesDatasetCommand(
+                pk,
+                force,

Review Comment:
   ```suggestion
                   model_id=pk,
                   force=force,
   ```



##########
superset/datasets/commands/samples.py:
##########
@@ -30,40 +29,79 @@
     DatasetSamplesFailedError,
 )
 from superset.datasets.dao import DatasetDAO
+from superset.datasets.schemas import DatasetSamplesQuerySchema
 from superset.exceptions import SupersetSecurityException
-from superset.utils.core import QueryStatus
-
-logger = logging.getLogger(__name__)
+from superset.utils.core import DatasourceDict, QueryStatus
 
 
 class SamplesDatasetCommand(BaseCommand):
-    def __init__(self, model_id: int, force: bool):
+    def __init__(
+        self,
+        model_id: int,
+        force: bool,
+        *,

Review Comment:
   ```suggestion
   ```



##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)
+        limit = samples_row_limit
+        offset = 0
+
+        if isinstance(page, int) and isinstance(per_page, int):

Review Comment:
   `isinstance` fails because `page` and `per_page` are strings in the URL, so they are being ignored.



##########
superset/datasets/commands/samples.py:
##########
@@ -30,40 +29,79 @@
     DatasetSamplesFailedError,
 )
 from superset.datasets.dao import DatasetDAO
+from superset.datasets.schemas import DatasetSamplesQuerySchema
 from superset.exceptions import SupersetSecurityException
-from superset.utils.core import QueryStatus
-
-logger = logging.getLogger(__name__)
+from superset.utils.core import DatasourceDict, QueryStatus
 
 
 class SamplesDatasetCommand(BaseCommand):
-    def __init__(self, model_id: int, force: bool):
+    def __init__(
+        self,
+        model_id: int,
+        force: bool,
+        *,
+        payload: Optional[DatasetSamplesQuerySchema] = None,
+        page: Optional[int] = None,
+        per_page: Optional[int] = None,
+    ):
         self._model_id = model_id
         self._force = force
         self._model: Optional[SqlaTable] = None
+        self._payload = payload
+        self._page = page
+        self._per_page = per_page
 
     def run(self) -> Dict[str, Any]:
         self.validate()
-        if not self._model:
-            raise DatasetNotFoundError()
+        limit_clause = self.get_limit_clause(self._page, self._per_page)
+        self._model = cast(SqlaTable, self._model)
+        datasource: DatasourceDict = {
+            "type": self._model.type,
+            "id": self._model.id,
+        }
 
-        qc_instance = QueryContextFactory().create(
-            datasource={
-                "type": self._model.type,
-                "id": self._model.id,
-            },
-            queries=[{}],
+        # constructing samples query
+        samples_instance = QueryContextFactory().create(
+            datasource=datasource,
+            queries=[
+                {**self._payload, **limit_clause} if self._payload else limit_clause
+            ],
             result_type=ChartDataResultType.SAMPLES,
             force=self._force,
         )
-        results = qc_instance.get_payload()
+
+        # constructing count(*) query
+        count_star_payload = {

Review Comment:
   We should consider the filters for the count as well because the pagination is calculated considering them.
   
   Right now, if I do a query with:
   ```
   {"filters": [{"col": "gender", "op": "==", "val": "boy"}]}
   ```
   I get the total count in `dataset_count_star` without the filters.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r924322059


##########
superset/datasets/commands/samples.py:
##########
@@ -30,40 +29,79 @@
     DatasetSamplesFailedError,
 )
 from superset.datasets.dao import DatasetDAO
+from superset.datasets.schemas import DatasetSamplesQuerySchema
 from superset.exceptions import SupersetSecurityException
-from superset.utils.core import QueryStatus
-
-logger = logging.getLogger(__name__)
+from superset.utils.core import DatasourceDict, QueryStatus
 
 
 class SamplesDatasetCommand(BaseCommand):
-    def __init__(self, model_id: int, force: bool):
+    def __init__(
+        self,
+        model_id: int,
+        force: bool,
+        *,
+        payload: Optional[DatasetSamplesQuerySchema] = None,
+        page: Optional[int] = None,
+        per_page: Optional[int] = None,
+    ):
         self._model_id = model_id
         self._force = force
         self._model: Optional[SqlaTable] = None
+        self._payload = payload
+        self._page = page
+        self._per_page = per_page
 
     def run(self) -> Dict[str, Any]:
         self.validate()
-        if not self._model:
-            raise DatasetNotFoundError()
+        limit_clause = self.get_limit_clause(self._page, self._per_page)
+        self._model = cast(SqlaTable, self._model)
+        datasource: DatasourceDict = {
+            "type": self._model.type,
+            "id": self._model.id,
+        }
 
-        qc_instance = QueryContextFactory().create(
-            datasource={
-                "type": self._model.type,
-                "id": self._model.id,
-            },
-            queries=[{}],
+        # constructing samples query
+        samples_instance = QueryContextFactory().create(
+            datasource=datasource,
+            queries=[
+                {**self._payload, **limit_clause} if self._payload else limit_clause
+            ],
             result_type=ChartDataResultType.SAMPLES,
             force=self._force,
         )
-        results = qc_instance.get_payload()
+
+        # constructing count(*) query
+        count_star_payload = {

Review Comment:
   done. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r924319869


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)

Review Comment:
   we cannot ignore the limit value in any detailed query(e.g. `select * from tbl`) because the unlimited query is a heavy load for some db.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r924573586


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)

Review Comment:
   The purpose of the `SAMPLES_ROW_LIMIT` here is to calculate the **page size** when the user inputs more than `SAMPLES_ROW_LIMIT`, With the correct page size, the page offset can be calculated ---- and not to calculate the offset over the `SAMPLES_ROW_LIMIT`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] michael-s-molina commented on pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
michael-s-molina commented on PR #20683:
URL: https://github.com/apache/superset/pull/20683#issuecomment-1192490643

   > @codyml @michael-s-molina this is an interesting question, because now it is not dataset samples but different types of datasouce samples, so it should not be put under dataset, ----- introduced different datasource from [PR](https://github.com/apache/superset/pull/20281) ---- That's why I moved it to datasource.
   
   Thank you for the explanation @zhaoyongjie. It seems we're missing this new concept in v1 as you said.
   
   About deleting the old endpoints. I did something slightly different for the old `/superset/explore` endpoint in `views/core`. I added a [warning indicating that the endpoint will be deprecated in 3.0 and redirected the user to the new endpoint](https://github.com/apache/superset/blob/39545352d29655e8b1c44987ad128c33d1b6eac5/superset/views/core.py#L750). This is debatable because I'm not sure we're considering changes to endpoints that are not under v1 breaking changes. I took the safest approach but I leave the decision to you if you also want to follow a similar pattern or if you don't consider this a breaking change.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r921052780


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)
+        limit = samples_row_limit
+        offset = 0
+
+        if isinstance(page, int) and isinstance(per_page, int):
+            limit = int(per_page)
+            if limit < 0 or limit > samples_row_limit:
+                # reset limit value if input is invalid
+                limit = samples_row_limit

Review Comment:
   Can be easily refactored to raise an exception if needed



##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)
+        limit = samples_row_limit
+        offset = 0
+
+        if isinstance(page, int) and isinstance(per_page, int):
+            limit = int(per_page)
+            if limit < 0 or limit > samples_row_limit:
+                # reset limit value if input is invalid
+                limit = samples_row_limit
+
+            offset = (int(page) - 1) * limit
+            if offset < 0:
+                # reset offset value if input is invalid
+                offset = 0

Review Comment:
   same before



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on PR #20683:
URL: https://github.com/apache/superset/pull/20683#issuecomment-1192044714

   > @zhaoyongjie Thank you so much for cleaning the duplicated `samples` endpoint in `datasets/api` and `explore/api`. May I suggest creating the endpoint under `api/v1/dataset/samples` instead of `superset/views`? This would help with the new API migration.
   
   @codyml @michael-s-molina this is an interesting question, because now it is not dataset samples but different types of datasouce samples, so it should not be put under dataset. 
   
   v1 API still has a lot of work to do, for example:
   1.  a lot of error handles(try...except statement) v1 are not well solved
   2. `/api/v1/chart/data` has lots of duplicated samples code  
   
   So the view and v1 api will coexist for a long time. I will address these issues in the new PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codecov[bot] commented on pull request #20683: feat: the samples endpoint supports filters

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #20683:
URL: https://github.com/apache/superset/pull/20683#issuecomment-1181845240

   # [Codecov](https://codecov.io/gh/apache/superset/pull/20683?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#20683](https://codecov.io/gh/apache/superset/pull/20683?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (71536a7) into [master](https://codecov.io/gh/apache/superset/commit/8d4994a89900c2cf636444e4febad61ce3b69d68?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (8d4994a) will **decrease** coverage by `11.65%`.
   > The diff coverage is `53.84%`.
   
   > :exclamation: Current head 71536a7 differs from pull request most recent head 7b479eb. Consider uploading reports for the commit 7b479eb to get more accurate results
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #20683       +/-   ##
   ===========================================
   - Coverage   66.85%   55.20%   -11.66%     
   ===========================================
     Files        1753     1753               
     Lines       65825    65840       +15     
     Branches     7006     7006               
   ===========================================
   - Hits        44010    36346     -7664     
   - Misses      20030    27709     +7679     
     Partials     1785     1785               
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `53.87% <50.00%> (-0.01%)` | :arrow_down: |
   | mysql | `?` | |
   | postgres | `?` | |
   | presto | `53.73% <50.00%> (-0.01%)` | :arrow_down: |
   | python | `58.71% <50.00%> (-24.28%)` | :arrow_down: |
   | sqlite | `?` | |
   | unit | `50.95% <50.00%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/20683?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/datasets/api.py](https://codecov.io/gh/apache/superset/pull/20683/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvYXBpLnB5) | `48.98% <40.00%> (-39.54%)` | :arrow_down: |
   | [superset/datasets/commands/samples.py](https://codecov.io/gh/apache/superset/pull/20683/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvc2FtcGxlcy5weQ==) | `38.29% <45.45%> (-50.08%)` | :arrow_down: |
   | [superset/datasets/schemas.py](https://codecov.io/gh/apache/superset/pull/20683/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvc2NoZW1hcy5weQ==) | `95.56% <62.50%> (-1.77%)` | :arrow_down: |
   | [...erset-frontend/src/components/Chart/chartAction.js](https://codecov.io/gh/apache/superset/pull/20683/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQtZnJvbnRlbmQvc3JjL2NvbXBvbmVudHMvQ2hhcnQvY2hhcnRBY3Rpb24uanM=) | `55.92% <100.00%> (ø)` | |
   | [superset/utils/dashboard\_import\_export.py](https://codecov.io/gh/apache/superset/pull/20683/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdXRpbHMvZGFzaGJvYXJkX2ltcG9ydF9leHBvcnQucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [superset/key\_value/commands/upsert.py](https://codecov.io/gh/apache/superset/pull/20683/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL3Vwc2VydC5weQ==) | `0.00% <0.00%> (-89.14%)` | :arrow_down: |
   | [superset/key\_value/commands/update.py](https://codecov.io/gh/apache/superset/pull/20683/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL3VwZGF0ZS5weQ==) | `0.00% <0.00%> (-88.89%)` | :arrow_down: |
   | [superset/key\_value/commands/delete.py](https://codecov.io/gh/apache/superset/pull/20683/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZS5weQ==) | `0.00% <0.00%> (-85.30%)` | :arrow_down: |
   | [superset/key\_value/commands/delete\_expired.py](https://codecov.io/gh/apache/superset/pull/20683/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZV9leHBpcmVkLnB5) | `0.00% <0.00%> (-80.77%)` | :arrow_down: |
   | [superset/dashboards/commands/importers/v0.py](https://codecov.io/gh/apache/superset/pull/20683/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGFzaGJvYXJkcy9jb21tYW5kcy9pbXBvcnRlcnMvdjAucHk=) | `15.62% <0.00%> (-76.25%)` | :arrow_down: |
   | ... and [279 more](https://codecov.io/gh/apache/superset/pull/20683/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/20683?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/20683?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [8d4994a...7b479eb](https://codecov.io/gh/apache/superset/pull/20683?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r924321799


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)
+        limit = samples_row_limit
+        offset = 0
+
+        if isinstance(page, int) and isinstance(per_page, int):

Review Comment:
   good catch! type checker and type cast has moved to the marshmallow load stage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r924323728


##########
superset/datasets/api.py:
##########
@@ -810,7 +822,15 @@ def samples(self, pk: int) -> Response:
         """
         try:
             force = parse_boolean_string(request.args.get("force"))
-            rv = SamplesDatasetCommand(pk, force).run()
+            page = request.args.get("page")
+            per_page = request.args.get("per_page")
+            rv = SamplesDatasetCommand(
+                pk,
+                force,

Review Comment:
   In order to support different datasource type, this interface has been redesigned.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] geido commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
geido commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r925518497


##########
tests/integration_tests/datasource_tests.py:
##########
@@ -414,3 +416,157 @@ def test_get_datasource_invalid_datasource_failed(self):
         self.login(username="admin")
         resp = self.get_json_resp("/datasource/get/druid/500000/", raise_on_error=False)
         self.assertEqual(resp.get("error"), "'druid' is not a valid DatasourceType")
+
+    def test_get_dataset_samples(self):
+        """
+        Dataset API: Test get dataset samples
+        """
+        if backend() == "sqlite":
+            return
+
+        dataset = SqlaTable(
+            table_name="virtual_dataset",
+            sql=("SELECT 'foo' as foo, 'bar' as bar"),
+            database=get_example_database(),
+        )
+        TableColumn(column_name="foo", type="VARCHAR(255)", table=dataset)
+        TableColumn(column_name="bar", type="VARCHAR(255)", table=dataset)
+        SqlMetric(metric_name="count", expression="count(*)", table=dataset)
+
+        self.login(username="admin")
+        uri = f"datasource/samples?datasource_id={dataset.id}&datasource_type=table"
+
+        # 1. should cache data
+        # feeds data
+        self.client.post(uri)
+        # get from cache
+        rv = self.client.post(uri)
+        rv_data = json.loads(rv.data)
+        assert rv.status_code == 200
+        assert "result" in rv_data
+        assert rv_data["result"]["cached_dttm"] is not None
+        cache_key1 = rv_data["result"]["cache_key"]
+        assert QueryCacheManager.has(cache_key1, region=CacheRegion.DATA)
+
+        # 2. should through cache

Review Comment:
   Typo here



##########
superset/views/datasource/utils.py:
##########
@@ -0,0 +1,113 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from typing import Any, Dict, Optional
+
+from superset import app, db
+from superset.common.chart_data import ChartDataResultType
+from superset.common.query_context_factory import QueryContextFactory
+from superset.common.utils.query_cache_manager import QueryCacheManager
+from superset.constants import CacheRegion
+from superset.datasets.commands.exceptions import DatasetSamplesFailedError
+from superset.datasource.dao import DatasourceDAO
+from superset.utils.core import QueryStatus
+from superset.views.datasource.schemas import SamplesPayloadSchema
+
+
+def get_limit_clause(page: Optional[int], per_page: Optional[int]) -> Dict[str, int]:
+    samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)
+    limit = samples_row_limit
+    offset = 0
+
+    if isinstance(page, int) and isinstance(per_page, int):
+        limit = int(per_page)
+        if limit < 0 or limit > samples_row_limit:
+            # reset limit value if input is invalid
+            limit = samples_row_limit
+
+        offset = max((int(page) - 1) * limit, 0)
+
+    return {"row_offset": offset, "row_limit": limit}
+
+
+def get_samples(  # pylint: disable=too-many-arguments,too-many-locals
+    datasource_type: str,
+    datasource_id: int,
+    force: bool = False,
+    page: int = 1,
+    per_page: int = 1000,

Review Comment:
   Shouldn't use the `SAMPLES_ROW_LIMIT` here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codyml commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
codyml commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r922527182


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)

Review Comment:
   Isn't the global samples row limit already applied at [query_object_factory.py#L64](https://github.com/apache/superset/blob/d1861c7d3c95e72336466cde17fee9ae1be9155a/superset/common/query_object_factory.py#L64)?  In either case, wouldn't we want to ignore this limit for this purpose, in case they've set it to something really low?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codyml commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
codyml commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r924535326


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)

Review Comment:
   I meant that if the user has configured `SAMPLES_ROW_LIMIT` to be something really low, won't this not return full sets of results?  The 1000 limit fallback makes sense in case the API consumer requests crazy high page sizes but I'm worried that if the user has set `SAMPLES_ROW_LIMIT` to be something like 5, this will unexpectedly return pages of 5 results.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codyml commented on pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
codyml commented on PR #20683:
URL: https://github.com/apache/superset/pull/20683#issuecomment-1191728157

   @zhaoyongjie This looks good to me!  One question, are we not considering this part of the public API, and that's why it's not under the `/v1` prefix?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r926445422


##########
tests/integration_tests/datasource_tests.py:
##########
@@ -414,3 +416,157 @@ def test_get_datasource_invalid_datasource_failed(self):
         self.login(username="admin")
         resp = self.get_json_resp("/datasource/get/druid/500000/", raise_on_error=False)
         self.assertEqual(resp.get("error"), "'druid' is not a valid DatasourceType")
+
+    def test_get_dataset_samples(self):
+        """
+        Dataset API: Test get dataset samples
+        """
+        if backend() == "sqlite":
+            return
+
+        dataset = SqlaTable(
+            table_name="virtual_dataset",
+            sql=("SELECT 'foo' as foo, 'bar' as bar"),
+            database=get_example_database(),
+        )
+        TableColumn(column_name="foo", type="VARCHAR(255)", table=dataset)
+        TableColumn(column_name="bar", type="VARCHAR(255)", table=dataset)
+        SqlMetric(metric_name="count", expression="count(*)", table=dataset)
+
+        self.login(username="admin")
+        uri = f"datasource/samples?datasource_id={dataset.id}&datasource_type=table"
+
+        # 1. should cache data
+        # feeds data
+        self.client.post(uri)
+        # get from cache
+        rv = self.client.post(uri)
+        rv_data = json.loads(rv.data)
+        assert rv.status_code == 200
+        assert "result" in rv_data
+        assert rv_data["result"]["cached_dttm"] is not None
+        cache_key1 = rv_data["result"]["cache_key"]
+        assert QueryCacheManager.has(cache_key1, region=CacheRegion.DATA)
+
+        # 2. should through cache

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] michael-s-molina commented on pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
michael-s-molina commented on PR #20683:
URL: https://github.com/apache/superset/pull/20683#issuecomment-1191833562

   @zhaoyongjie Thank you so much for cleaning the duplicated `samples` endpoint in `datasets/api` and `explore/api`. May I suggest creating the endpoint under `api/v1/dataset/samples` instead of `superset/views`? This would help with the new API migration. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codyml commented on a diff in pull request #20683: feat: the samples endpoint supports filters and pagination

Posted by GitBox <gi...@apache.org>.
codyml commented on code in PR #20683:
URL: https://github.com/apache/superset/pull/20683#discussion_r924554979


##########
superset/datasets/commands/samples.py:
##########
@@ -78,3 +116,24 @@ def validate(self) -> None:
             security_manager.raise_for_ownership(self._model)
         except SupersetSecurityException as ex:
             raise DatasetForbiddenError() from ex
+
+    @staticmethod
+    def get_limit_clause(
+        page: Optional[int], per_page: Optional[int]
+    ) -> Dict[str, int]:
+        samples_row_limit = app.config.get("SAMPLES_ROW_LIMIT", 1000)

Review Comment:
   Right, and I was suggesting that we ignore `SAMPLES_ROW_LIMIT` for paginated results so the pages are always the requested size regardless of the `SAMPLES_ROW_LIMIT` setting (unless they request a very large page size), otherwise users might be surprised to find that their requested page size isn't being respected.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org