Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/05/24 12:45:36 UTC

[GitHub] [superset] zhaoyongjie opened a new pull request, #20170: feat: add samples endpoint

zhaoyongjie opened a new pull request, #20170:
URL: https://github.com/apache/superset/pull/20170

   ### SUMMARY
   <!--- Describe the change below, including rationale and design decisions -->
   Add a new `samples` endpoint that returns sample rows for a dataset.
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   N/A
   
   ### TESTING INSTRUCTIONS
   <!--- Required! What steps can be taken to manually verify the changes? -->
   
   1. Run `superset init`.
   2. Log in to Superset.
   3. Open `http://localhost:9000/api/v1/dataset/<dataset_id>/samples` in a browser.
   4. Open `http://localhost:9000/api/v1/dataset/<dataset_id>/samples?force=true` to force the query to run instead of using the cached result.
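
The two URLs in the steps above can also be built programmatically. The following is a minimal sketch, not part of the PR: the helper name `samples_url` is hypothetical, and the base URL and dataset id are placeholders taken from the instructions. It only constructs the URLs; fetching them still requires an authenticated session.

```python
from urllib.parse import urlencode, urljoin


def samples_url(base_url: str, dataset_id: int, force: bool = False) -> str:
    """Build the samples URL for a dataset (hypothetical helper).

    When force is True, the force=true query parameter from the
    testing instructions is appended.
    """
    # Normalise the base so urljoin keeps the host and appends the path.
    url = urljoin(base_url.rstrip("/") + "/", f"api/v1/dataset/{dataset_id}/samples")
    if force:
        url += "?" + urlencode({"force": "true"})
    return url


# Placeholder base URL and dataset id, matching the instructions above.
print(samples_url("http://localhost:9000", 1))
print(samples_url("http://localhost:9000", 1, force=True))
```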
   
   
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [x] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codecov[bot] commented on pull request #20170: feat: add samples endpoint

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #20170:
URL: https://github.com/apache/superset/pull/20170#issuecomment-1135903067

   # [Codecov](https://codecov.io/gh/apache/superset/pull/20170?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#20170](https://codecov.io/gh/apache/superset/pull/20170?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6ef25bb) into [master](https://codecov.io/gh/apache/superset/commit/0bcc21bc45ac672d82674a325cc7e94a944e2bc3?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0bcc21b) will **decrease** coverage by `12.07%`.
   > The diff coverage is `44.64%`.
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #20170       +/-   ##
   ===========================================
   - Coverage   66.46%   54.38%   -12.08%     
   ===========================================
     Files        1721     1722        +1     
     Lines       64512    64578       +66     
     Branches     6806     6806               
   ===========================================
   - Hits        42875    35118     -7757     
   - Misses      19905    27728     +7823     
     Partials     1732     1732               
   ```
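
The percentages in the diff table can be reproduced from the raw counts, assuming coverage is computed as hits divided by total lines, where lines is the sum of hits, misses, and partials. The sketch below uses only the numbers from the table; the function name `coverage_pct` is illustrative.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage: hits over all tracked lines."""
    lines = hits + misses + partials
    return 100.0 * hits / lines


# Counts copied from the coverage diff above.
master = coverage_pct(hits=42875, misses=19905, partials=1732)
pr = coverage_pct(hits=35118, misses=27728, partials=1732)

print(f"master: {master:.2f}%")
print(f"#20170: {pr:.2f}%")
print(f"change: {pr - master:+.2f}")
```

The totals also check out against the `Lines` row: 42875 + 19905 + 1732 = 64512 and 35118 + 27728 + 1732 = 64578.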
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `?` | |
   | mysql | `?` | |
   | postgres | `?` | |
   | presto | `53.56% <44.64%> (-0.02%)` | :arrow_down: |
   | python | `57.62% <44.64%> (-25.01%)` | :arrow_down: |
   | sqlite | `?` | |
   | unit | `49.48% <44.64%> (+0.04%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/20170?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/datasets/api.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvYXBpLnB5) | `49.37% <36.84%> (-38.92%)` | :arrow_down: |
   | [superset/datasets/commands/samples.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvc2FtcGxlcy5weQ==) | `45.71% <45.71%> (ø)` | |
   | [superset/datasets/commands/exceptions.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvZXhjZXB0aW9ucy5weQ==) | `80.32% <100.00%> (-12.90%)` | :arrow_down: |
   | [superset/utils/dashboard\_import\_export.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdXRpbHMvZGFzaGJvYXJkX2ltcG9ydF9leHBvcnQucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [superset/key\_value/commands/upsert.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL3Vwc2VydC5weQ==) | `0.00% <0.00%> (-89.59%)` | :arrow_down: |
   | [superset/key\_value/commands/update.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL3VwZGF0ZS5weQ==) | `0.00% <0.00%> (-89.37%)` | :arrow_down: |
   | [superset/key\_value/commands/delete.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZS5weQ==) | `0.00% <0.00%> (-85.30%)` | :arrow_down: |
   | [superset/db\_engines/hive.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lcy9oaXZlLnB5) | `0.00% <0.00%> (-85.19%)` | :arrow_down: |
   | [superset/key\_value/commands/delete\_expired.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZV9leHBpcmVkLnB5) | `0.00% <0.00%> (-80.77%)` | :arrow_down: |
   | [superset/dashboards/commands/importers/v0.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGFzaGJvYXJkcy9jb21tYW5kcy9pbXBvcnRlcnMvdjAucHk=) | `15.62% <0.00%> (-76.25%)` | :arrow_down: |
   | ... and [272 more](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/20170?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/20170?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0bcc21b...6ef25bb](https://codecov.io/gh/apache/superset/pull/20170?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20170: feat: add samples endpoint

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20170:
URL: https://github.com/apache/superset/pull/20170#discussion_r881334929


##########
tests/integration_tests/datasets/api_tests.py:
##########
@@ -1842,3 +1842,30 @@ def test_get_datasets_is_certified_filter(self):
 
         db.session.delete(table_w_certification)
         db.session.commit()
+
+    @pytest.mark.usefixtures("create_datasets")
+    def test_get_dataset_samples(self):
+        """
+        Dataset API: Test get dataset samples
+        """
+        dataset = self.get_fixture_datasets()[0]
+
+        self.login(username="admin")
+        uri = f"api/v1/dataset/{dataset.id}/samples"
+        # feeds data
+        self.client.get(uri)
+        # get from cache
+        rv = self.client.get(uri)
+        rv_data = json.loads(rv.data)
+        assert rv.status_code == 200
+        assert "samples" in rv_data
+        assert rv_data["samples"]["cached_dttm"] is not None

Review Comment:
   It seems there is currently no standardized dataset for data integrity tests. I will add an assertion that checks the data field.





[GitHub] [superset] zhaoyongjie commented on pull request #20170: feat: add samples endpoint

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on PR #20170:
URL: https://github.com/apache/superset/pull/20170#issuecomment-1135898852

   > Looks great @zhaoyongjie ! Can we add a few simple tests?
   
   Sure, will do.




[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20170: feat: add samples endpoint

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20170:
URL: https://github.com/apache/superset/pull/20170#discussion_r881400080


##########
tests/integration_tests/datasets/api_tests.py:
##########
@@ -1842,3 +1842,30 @@ def test_get_datasets_is_certified_filter(self):
 
         db.session.delete(table_w_certification)
         db.session.commit()
+
+    @pytest.mark.usefixtures("create_datasets")
+    def test_get_dataset_samples(self):
+        """
+        Dataset API: Test get dataset samples
+        """
+        dataset = self.get_fixture_datasets()[0]
+
+        self.login(username="admin")
+        uri = f"api/v1/dataset/{dataset.id}/samples"
+        # feeds data
+        self.client.get(uri)
+        # get from cache
+        rv = self.client.get(uri)
+        rv_data = json.loads(rv.data)
+        assert rv.status_code == 200
+        assert "samples" in rv_data
+        assert rv_data["samples"]["cached_dttm"] is not None

Review Comment:
   Done, I have added a test case for this.





[GitHub] [superset] villebro commented on pull request #20170: feat: add samples endpoint

Posted by GitBox <gi...@apache.org>.
villebro commented on PR #20170:
URL: https://github.com/apache/superset/pull/20170#issuecomment-1135890205

   Looks great @zhaoyongjie ! Can we add a few simple tests?




[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20170: feat: add samples endpoint

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20170:
URL: https://github.com/apache/superset/pull/20170#discussion_r881400080


##########
tests/integration_tests/datasets/api_tests.py:
##########
@@ -1842,3 +1842,30 @@ def test_get_datasets_is_certified_filter(self):
 
         db.session.delete(table_w_certification)
         db.session.commit()
+
+    @pytest.mark.usefixtures("create_datasets")
+    def test_get_dataset_samples(self):
+        """
+        Dataset API: Test get dataset samples
+        """
+        dataset = self.get_fixture_datasets()[0]
+
+        self.login(username="admin")
+        uri = f"api/v1/dataset/{dataset.id}/samples"
+        # feeds data
+        self.client.get(uri)
+        # get from cache
+        rv = self.client.get(uri)
+        rv_data = json.loads(rv.data)
+        assert rv.status_code == 200
+        assert "samples" in rv_data
+        assert rv_data["samples"]["cached_dttm"] is not None

Review Comment:
   Done, I have a test case for this.





[GitHub] [superset] villebro commented on a diff in pull request #20170: feat: add samples endpoint

Posted by GitBox <gi...@apache.org>.
villebro commented on code in PR #20170:
URL: https://github.com/apache/superset/pull/20170#discussion_r881353485


##########
tests/integration_tests/datasets/api_tests.py:
##########
@@ -1842,3 +1842,30 @@ def test_get_datasets_is_certified_filter(self):
 
         db.session.delete(table_w_certification)
         db.session.commit()
+
+    @pytest.mark.usefixtures("create_datasets")
+    def test_get_dataset_samples(self):
+        """
+        Dataset API: Test get dataset samples
+        """
+        dataset = self.get_fixture_datasets()[0]
+
+        self.login(username="admin")
+        uri = f"api/v1/dataset/{dataset.id}/samples"
+        # feeds data
+        self.client.get(uri)
+        # get from cache
+        rv = self.client.get(uri)
+        rv_data = json.loads(rv.data)
+        assert rv.status_code == 200
+        assert "samples" in rv_data
+        assert rv_data["samples"]["cached_dttm"] is not None

Review Comment:
   I think a good enough test for now could be just asserting that the first row in a dataset that's contained in the `create_datasets` fixture contains the correct columns.
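
A check along these lines might look like the sketch below. It is an illustration, not the test that was added to the PR: the helper name `first_row_has_columns` is hypothetical, the `samples.data` list-of-dicts shape is an assumption (only `samples` and `cached_dttm` appear in the diff), and the column names in the usage example are made up.

```python
from typing import Any, Dict, Iterable


def first_row_has_columns(rv_data: Dict[str, Any], expected: Iterable[str]) -> bool:
    """Return True if the first sampled row has exactly the expected columns.

    Assumes rows live under rv_data["samples"]["data"] as a list of dicts;
    that shape is an assumption, not confirmed by the thread.
    """
    rows = rv_data["samples"]["data"]
    if not rows:
        # No rows at all means there is nothing to compare against.
        return False
    return set(rows[0]) == set(expected)


# Made-up payload for illustration only.
payload = {
    "samples": {
        "cached_dttm": "2022-05-24T12:45:36",
        "data": [{"name": "a", "value": 1}],
    }
}
assert first_row_has_columns(payload, ["name", "value"])
```

Comparing sets keeps the check independent of column order; a stricter test could compare lists instead.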





[GitHub] [superset] villebro commented on a diff in pull request #20170: feat: add samples endpoint

Posted by GitBox <gi...@apache.org>.
villebro commented on code in PR #20170:
URL: https://github.com/apache/superset/pull/20170#discussion_r881237705


##########
superset/datasets/commands/exceptions.py:
##########
@@ -173,6 +173,10 @@ class DatasetRefreshFailedError(UpdateFailedError):
     message = _("Dataset could not be updated.")
 
 
+class DatasetSamplesFailedError(CommandInvalidError):
+    message = _("Dataset could not be sampled.")

Review Comment:
   nit/suggestion - this might be more explicit:
   ```suggestion
   class DatasetSamplesFailedError(CommandInvalidError):
       message = _("Samples for dataset could not be retrieved.")
   ```



##########
tests/integration_tests/datasets/api_tests.py:
##########
@@ -1842,3 +1842,30 @@ def test_get_datasets_is_certified_filter(self):
 
         db.session.delete(table_w_certification)
         db.session.commit()
+
+    @pytest.mark.usefixtures("create_datasets")
+    def test_get_dataset_samples(self):
+        """
+        Dataset API: Test get dataset samples
+        """
+        dataset = self.get_fixture_datasets()[0]
+
+        self.login(username="admin")
+        uri = f"api/v1/dataset/{dataset.id}/samples"
+        # feeds data
+        self.client.get(uri)
+        # get from cache
+        rv = self.client.get(uri)
+        rv_data = json.loads(rv.data)
+        assert rv.status_code == 200
+        assert "samples" in rv_data
+        assert rv_data["samples"]["cached_dttm"] is not None

Review Comment:
   Thought: I think it would be a good idea to also assert that the response contains actual rows from the table. But looking at the chart data endpoint integration tests, they don't seem to test for actual rows, so maybe we should consider making those tests more comprehensive in a follow-up PR (or alternatively add more comprehensive tests here if we do decide to deprecate sampling support from the chart data endpoint).



##########
superset/datasets/api.py:
##########
@@ -760,3 +763,65 @@ def import_(self) -> Response:
         )
         command.run()
         return self.response(200, message="OK")
+
+    @expose("/<pk>/samples")
+    @protect()
+    @safe
+    @statsd_metrics
+    @event_logger.log_this_with_context(
+        action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.samples",
+        log_to_statsd=False,
+    )
+    def samples(self, pk: int) -> Response:
+        """get samples from a Dataset
+        ---
+        get:
+          description: >-
+            get samples from a Dataset
+          parameters:
+          - in: path
+            schema:
+              type: integer
+            name: pk
+          - in: query
+            schema:
+              type: boolean
+            name: force
+          responses:
+            200:
+              description: Dataset samples
+              content:
+                application/json:
+                  schema:
+                    type: object
+                    properties:
+                      samples:
+                        description: dataset samples
+                        type: object

Review Comment:
   I think it's usually the convention to return a `result` similarly to here: https://github.com/apache/superset/blob/7e9b85f76ca8cae38c38e11f857634216b1cd71c/superset/dashboards/api.py#L314-L315
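
To illustrate the difference this suggestion makes to clients, here is a small sketch of the two response shapes. It is not Superset code: the function names are hypothetical, the payload contents are made up, and whether the merged endpoint adopted the `result` key is not shown in this thread.

```python
import json
from typing import Any, Dict


def body_with_samples_key(samples: Dict[str, Any]) -> str:
    """Response body as in the diff above: payload under "samples"."""
    return json.dumps({"samples": samples})


def body_with_result_key(samples: Dict[str, Any]) -> str:
    """Response body following the suggested convention: payload under "result"."""
    return json.dumps({"result": samples})


# Made-up payload for illustration only.
samples = {"cached_dttm": "2022-05-24T12:45:36", "data": []}

rv_data = json.loads(body_with_result_key(samples))
# A test written against the suggested shape would read "result" instead of "samples".
assert "result" in rv_data
assert rv_data["result"]["cached_dttm"] is not None
```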





[GitHub] [superset] zhaoyongjie merged pull request #20170: feat: add samples endpoint

Posted by GitBox <gi...@apache.org>.
zhaoyongjie merged PR #20170:
URL: https://github.com/apache/superset/pull/20170

