Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/05/24 12:45:36 UTC
[GitHub] [superset] zhaoyongjie opened a new pull request, #20170: feat: add samples endpoint
zhaoyongjie opened a new pull request, #20170:
URL: https://github.com/apache/superset/pull/20170
### SUMMARY
<!--- Describe the change below, including rationale and design decisions -->
Add a new samples endpoint for retrieving sample rows from a dataset.
### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
<!--- Skip this if not applicable -->
N/A
### TESTING INSTRUCTIONS
<!--- Required! What steps can be taken to manually verify the changes? -->
1. Run `superset init`.
2. Log in to Superset.
3. Open `http://localhost:9000/api/v1/dataset/<dataset_id>/samples` in a browser.
4. Open `http://localhost:9000/api/v1/dataset/<dataset_id>/samples?force=true` to force the query.
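
For scripted checks, the two URLs in the steps above could be built with a small helper. This is only a sketch: `samples_url` and `BASE_URL` are hypothetical names, not part of Superset, and the host and port are simply the ones used in the steps.

```python
from urllib.parse import urlencode, urljoin

# Host and port as used in the testing steps; adjust for your environment.
BASE_URL = "http://localhost:9000"


def samples_url(dataset_id: int, force: bool = False, base_url: str = BASE_URL) -> str:
    """Build the samples endpoint URL for a dataset (hypothetical helper)."""
    url = urljoin(base_url, f"/api/v1/dataset/{dataset_id}/samples")
    if force:
        # Append the `force` query parameter described in step 4.
        url += "?" + urlencode({"force": "true"})
    return url


plain = samples_url(1)
forced = samples_url(1, force=True)
```

The resulting strings can be passed to any HTTP client once a logged-in session is available.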
### ADDITIONAL INFORMATION
<!--- Check any relevant boxes with "x" -->
<!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
- [ ] Has associated issue:
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
- [x] Introduces new feature or API
- [ ] Removes existing feature or API
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org
[GitHub] [superset] codecov[bot] commented on pull request #20170: feat: add samples endpoint
Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #20170:
URL: https://github.com/apache/superset/pull/20170#issuecomment-1135903067
# [Codecov](https://codecov.io/gh/apache/superset/pull/20170?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#20170](https://codecov.io/gh/apache/superset/pull/20170?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6ef25bb) into [master](https://codecov.io/gh/apache/superset/commit/0bcc21bc45ac672d82674a325cc7e94a944e2bc3?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0bcc21b) will **decrease** coverage by `12.07%`.
> The diff coverage is `44.64%`.
```diff
@@             Coverage Diff              @@
##           master   #20170       +/-   ##
===========================================
- Coverage   66.46%   54.38%   -12.08%
===========================================
  Files        1721     1722        +1
  Lines       64512    64578       +66
  Branches     6806     6806
===========================================
- Hits        42875    35118     -7757
- Misses      19905    27728     +7823
  Partials     1732     1732
```
| Flag | Coverage Δ | |
|---|---|---|
| hive | `?` | |
| mysql | `?` | |
| postgres | `?` | |
| presto | `53.56% <44.64%> (-0.02%)` | :arrow_down: |
| python | `57.62% <44.64%> (-25.01%)` | :arrow_down: |
| sqlite | `?` | |
| unit | `49.48% <44.64%> (+0.04%)` | :arrow_up: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/superset/pull/20170?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [superset/datasets/api.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvYXBpLnB5) | `49.37% <36.84%> (-38.92%)` | :arrow_down: |
| [superset/datasets/commands/samples.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvc2FtcGxlcy5weQ==) | `45.71% <45.71%> (ø)` | |
| [superset/datasets/commands/exceptions.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvZXhjZXB0aW9ucy5weQ==) | `80.32% <100.00%> (-12.90%)` | :arrow_down: |
| [superset/utils/dashboard\_import\_export.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdXRpbHMvZGFzaGJvYXJkX2ltcG9ydF9leHBvcnQucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [superset/key\_value/commands/upsert.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL3Vwc2VydC5weQ==) | `0.00% <0.00%> (-89.59%)` | :arrow_down: |
| [superset/key\_value/commands/update.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL3VwZGF0ZS5weQ==) | `0.00% <0.00%> (-89.37%)` | :arrow_down: |
| [superset/key\_value/commands/delete.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZS5weQ==) | `0.00% <0.00%> (-85.30%)` | :arrow_down: |
| [superset/db\_engines/hive.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lcy9oaXZlLnB5) | `0.00% <0.00%> (-85.19%)` | :arrow_down: |
| [superset/key\_value/commands/delete\_expired.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZV9leHBpcmVkLnB5) | `0.00% <0.00%> (-80.77%)` | :arrow_down: |
| [superset/dashboards/commands/importers/v0.py](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGFzaGJvYXJkcy9jb21tYW5kcy9pbXBvcnRlcnMvdjAucHk=) | `15.62% <0.00%> (-76.25%)` | :arrow_down: |
| ... and [272 more](https://codecov.io/gh/apache/superset/pull/20170/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/20170?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/20170?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0bcc21b...6ef25bb](https://codecov.io/gh/apache/superset/pull/20170?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20170: feat: add samples endpoint
Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20170:
URL: https://github.com/apache/superset/pull/20170#discussion_r881334929
##########
tests/integration_tests/datasets/api_tests.py:
##########
@@ -1842,3 +1842,30 @@ def test_get_datasets_is_certified_filter(self):
         db.session.delete(table_w_certification)
         db.session.commit()
+
+    @pytest.mark.usefixtures("create_datasets")
+    def test_get_dataset_samples(self):
+        """
+        Dataset API: Test get dataset samples
+        """
+        dataset = self.get_fixture_datasets()[0]
+
+        self.login(username="admin")
+        uri = f"api/v1/dataset/{dataset.id}/samples"
+        # feeds data
+        self.client.get(uri)
+        # get from cache
+        rv = self.client.get(uri)
+        rv_data = json.loads(rv.data)
+        assert rv.status_code == 200
+        assert "samples" in rv_data
+        assert rv_data["samples"]["cached_dttm"] is not None
Review Comment:
It seems we don't currently have a standardized dataset for data integrity tests. I will add an assertion to test the data field.
[GitHub] [superset] zhaoyongjie commented on pull request #20170: feat: add samples endpoint
Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on PR #20170:
URL: https://github.com/apache/superset/pull/20170#issuecomment-1135898852
> Looks great @zhaoyongjie ! Can we add a few simple tests?
Sure, will do.
[GitHub] [superset] zhaoyongjie commented on a diff in pull request #20170: feat: add samples endpoint
Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on code in PR #20170:
URL: https://github.com/apache/superset/pull/20170#discussion_r881400080
##########
tests/integration_tests/datasets/api_tests.py:
##########
@@ -1842,3 +1842,30 @@ def test_get_datasets_is_certified_filter(self):
         db.session.delete(table_w_certification)
         db.session.commit()
+
+    @pytest.mark.usefixtures("create_datasets")
+    def test_get_dataset_samples(self):
+        """
+        Dataset API: Test get dataset samples
+        """
+        dataset = self.get_fixture_datasets()[0]
+
+        self.login(username="admin")
+        uri = f"api/v1/dataset/{dataset.id}/samples"
+        # feeds data
+        self.client.get(uri)
+        # get from cache
+        rv = self.client.get(uri)
+        rv_data = json.loads(rv.data)
+        assert rv.status_code == 200
+        assert "samples" in rv_data
+        assert rv_data["samples"]["cached_dttm"] is not None
Review Comment:
done, I have added a test case for this.
[GitHub] [superset] villebro commented on pull request #20170: feat: add samples endpoint
Posted by GitBox <gi...@apache.org>.
villebro commented on PR #20170:
URL: https://github.com/apache/superset/pull/20170#issuecomment-1135890205
Looks great @zhaoyongjie ! Can we add a few simple tests?
[GitHub] [superset] villebro commented on a diff in pull request #20170: feat: add samples endpoint
Posted by GitBox <gi...@apache.org>.
villebro commented on code in PR #20170:
URL: https://github.com/apache/superset/pull/20170#discussion_r881353485
##########
tests/integration_tests/datasets/api_tests.py:
##########
@@ -1842,3 +1842,30 @@ def test_get_datasets_is_certified_filter(self):
         db.session.delete(table_w_certification)
         db.session.commit()
+
+    @pytest.mark.usefixtures("create_datasets")
+    def test_get_dataset_samples(self):
+        """
+        Dataset API: Test get dataset samples
+        """
+        dataset = self.get_fixture_datasets()[0]
+
+        self.login(username="admin")
+        uri = f"api/v1/dataset/{dataset.id}/samples"
+        # feeds data
+        self.client.get(uri)
+        # get from cache
+        rv = self.client.get(uri)
+        rv_data = json.loads(rv.data)
+        assert rv.status_code == 200
+        assert "samples" in rv_data
+        assert rv_data["samples"]["cached_dttm"] is not None
Review Comment:
I think a good enough test for now could be just asserting that the first row in a dataset that's contained in the `create_datasets` fixture contains the correct columns.
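
A column check of this kind could look roughly like the sketch below. It assumes the response keeps its rows as a list of dicts under `samples["data"]`; that layout, the helper name `first_row_has_columns`, and the column names in the usage lines are all illustrative assumptions rather than anything confirmed in this thread.

```python
from typing import Any, Dict, Iterable


def first_row_has_columns(rv_data: Dict[str, Any], expected: Iterable[str]) -> bool:
    """Return True if the first sampled row contains every expected column.

    Assumes rows live under rv_data["samples"]["data"] as a list of dicts
    (an assumed response layout, not taken from the PR).
    """
    rows = rv_data["samples"]["data"]
    if not rows:
        # No rows means there is nothing to validate, so report failure.
        return False
    return set(expected) <= set(rows[0])


# Hypothetical payload with made-up column names, for illustration only.
example = {"samples": {"cached_dttm": None, "data": [{"col1": 1, "col2": "a"}]}}
ok = first_row_has_columns(example, ["col1", "col2"])
missing = first_row_has_columns(example, ["col1", "col3"])
```

In the integration test this would reduce to a single `assert` on the decoded response, with the expected columns taken from the `create_datasets` fixture.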
[GitHub] [superset] villebro commented on a diff in pull request #20170: feat: add samples endpoint
Posted by GitBox <gi...@apache.org>.
villebro commented on code in PR #20170:
URL: https://github.com/apache/superset/pull/20170#discussion_r881237705
##########
superset/datasets/commands/exceptions.py:
##########
@@ -173,6 +173,10 @@ class DatasetRefreshFailedError(UpdateFailedError):
     message = _("Dataset could not be updated.")
+class DatasetSamplesFailedError(CommandInvalidError):
+    message = _("Dataset could not be sampled.")
Review Comment:
nit/suggestion - this might be more explicit:
```suggestion
class DatasetSamplesFailedError(CommandInvalidError):
    message = _("Samples for dataset could not be retrieved.")
```
##########
tests/integration_tests/datasets/api_tests.py:
##########
@@ -1842,3 +1842,30 @@ def test_get_datasets_is_certified_filter(self):
         db.session.delete(table_w_certification)
         db.session.commit()
+
+    @pytest.mark.usefixtures("create_datasets")
+    def test_get_dataset_samples(self):
+        """
+        Dataset API: Test get dataset samples
+        """
+        dataset = self.get_fixture_datasets()[0]
+
+        self.login(username="admin")
+        uri = f"api/v1/dataset/{dataset.id}/samples"
+        # feeds data
+        self.client.get(uri)
+        # get from cache
+        rv = self.client.get(uri)
+        rv_data = json.loads(rv.data)
+        assert rv.status_code == 200
+        assert "samples" in rv_data
+        assert rv_data["samples"]["cached_dttm"] is not None
Review Comment:
Thought: I think it would be a good idea to also assert that the response contains actual rows from the table. But looking at the chart data endpoint integration tests, they don't seem to test for actual rows, so maybe we should consider making those tests more comprehensive in a follow-up PR (or alternatively adding more comprehensive tests here if we do decide to deprecate sampling support from the chart data endpoint).
##########
superset/datasets/api.py:
##########
@@ -760,3 +763,65 @@ def import_(self) -> Response:
         )
         command.run()
         return self.response(200, message="OK")
+
+    @expose("/<pk>/samples")
+    @protect()
+    @safe
+    @statsd_metrics
+    @event_logger.log_this_with_context(
+        action=lambda self, *args, **kwargs: f"{self.__class__.__name__}.samples",
+        log_to_statsd=False,
+    )
+    def samples(self, pk: int) -> Response:
+        """get samples from a Dataset
+        ---
+        get:
+          description: >-
+            get samples from a Dataset
+          parameters:
+          - in: path
+            schema:
+              type: integer
+            name: pk
+          - in: query
+            schema:
+              type: boolean
+            name: force
+          responses:
+            200:
+              description: Dataset samples
+              content:
+                application/json:
+                  schema:
+                    type: object
+                    properties:
+                      samples:
+                        description: dataset samples
+                        type: object
Review Comment:
I think it's usually the convention to return a `result` similarly to here: https://github.com/apache/superset/blob/7e9b85f76ca8cae38c38e11f857634216b1cd71c/superset/dashboards/api.py#L314-L315
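
To make the suggestion concrete, here is a minimal sketch of a response body that nests the payload under a top-level `result` key instead of `samples`. The helper `wrap_result` is hypothetical, and the `data` field and its values are made up for illustration; only `cached_dttm` appears in the test under review.

```python
import json
from typing import Any, Dict


def wrap_result(payload: Dict[str, Any]) -> str:
    """Serialize a payload under a top-level `result` key (illustrative only)."""
    return json.dumps({"result": payload})


# Hypothetical payload: `cached_dttm` mirrors the test, `data` is an assumption.
body = wrap_result({"cached_dttm": None, "data": []})
decoded = json.loads(body)
```

A client would then read `decoded["result"]`, where the test in this PR currently reads `rv_data["samples"]`.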
[GitHub] [superset] zhaoyongjie merged pull request #20170: feat: add samples endpoint
Posted by GitBox <gi...@apache.org>.
zhaoyongjie merged PR #20170:
URL: https://github.com/apache/superset/pull/20170