Posted to commits@spark.apache.org by vi...@apache.org on 2021/08/22 01:22:16 UTC
[spark] branch branch-3.1 updated: [MINOR][SS][DOCS] Update doc for streaming deduplication
This is an automated email from the ASF dual-hosted git repository.
viirya pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.1 by this push:
new 6fcf029 [MINOR][SS][DOCS] Update doc for streaming deduplication
6fcf029 is described below
commit 6fcf029f78cfba32bb0a7d74a20ea0757df9c33a
Author: Liang-Chi Hsieh <vi...@gmail.com>
AuthorDate: Sat Aug 21 18:20:17 2021 -0700
[MINOR][SS][DOCS] Update doc for streaming deduplication
### What changes were proposed in this pull request?
This patch fixes an error in the description of streaming deduplication in the Structured Streaming guide, and also adds an item to the list of unsupported operations.
### Why are the changes needed?
Update the user document.
### Does this PR introduce _any_ user-facing change?
No. It's a doc only change.
### How was this patch tested?
Doc only change.
Closes #33801 from viirya/minor-ss-deduplication.
Authored-by: Liang-Chi Hsieh <vi...@gmail.com>
Signed-off-by: Liang-Chi Hsieh <vi...@gmail.com>
(cherry picked from commit 5876e04de284b8ff84108b80627353870e852a36)
Signed-off-by: Liang-Chi Hsieh <vi...@gmail.com>
---
docs/structured-streaming-programming-guide.md | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 3b93ab8..d88cf91b 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -1671,6 +1671,8 @@ Some of them are as follows.
- Distinct operations on streaming Datasets are not supported.
+- Deduplication operation is not supported after aggregation on a streaming Datasets.
+
- Sorting operations are supported on streaming Datasets only after an aggregation and in Complete Output Mode.
- Few types of outer joins on streaming Datasets are not supported. See the
@@ -3220,7 +3222,7 @@ the effect of the change is not well-defined. For all of them:
- *Streaming aggregation*: For example, `sdf.groupBy("a").agg(...)`. Any change in number or type of grouping keys or aggregates is not allowed.
- - *Streaming deduplication*: For example, `sdf.dropDuplicates("a")`. Any change in number or type of grouping keys or aggregates is not allowed.
+ - *Streaming deduplication*: For example, `sdf.dropDuplicates("a")`. Any change in number or type of deduplicating columns is not allowed.
- *Stream-stream join*: For example, `sdf1.join(sdf2, ...)` (i.e. both inputs are generated with `sparkSession.readStream`). Changes
in the schema or equi-joining columns are not allowed. Changes in join type (outer or inner) are not allowed. Other changes in the join condition are ill-defined.
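
A minimal sketch of why the corrected wording refers to deduplicating columns, written in plain Python rather than against the Spark API. It assumes a toy model in which deduplication state is a set of keys built from the chosen columns; the class `DedupState` and its methods are hypothetical names for illustration and are not Spark's implementation. In particular, the toy model raises an error when the columns change on restart, whereas the guide only says the effect of such a change is not well-defined.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Row = Dict[str, object]


class DedupState:
    """Toy stand-in for checkpointed deduplication state (hypothetical, not Spark code)."""

    def __init__(self, columns: Sequence[str]) -> None:
        # The deduplicating columns define the shape of every stored key.
        self.columns: Tuple[str, ...] = tuple(columns)
        self.seen: Set[Tuple[object, ...]] = set()

    def process_batch(self, rows: Iterable[Row]) -> List[Row]:
        """Return only rows whose key has not appeared in this or any earlier batch."""
        out: List[Row] = []
        for row in rows:
            key = tuple(row[c] for c in self.columns)
            if key not in self.seen:
                self.seen.add(key)
                out.append(row)
        return out

    def restart(self, columns: Sequence[str]) -> "DedupState":
        """Model a query restart from the same checkpoint.

        Assumption for illustration: a change in the deduplicating columns is
        rejected outright, because previously stored keys no longer match.
        """
        if tuple(columns) != self.columns:
            raise ValueError(
                f"deduplicating columns changed from {self.columns} to {tuple(columns)}"
            )
        return self


state = DedupState(["a"])
first = state.process_batch(
    [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "z"}]
)
second = state.process_batch([{"a": 2, "b": "w"}, {"a": 3, "b": "v"}])
```

Here `first` keeps one row per distinct value of `a`, and `second` keeps only the row with `a == 3`, because `a == 2` was already recorded by the earlier batch. Calling `state.restart(["a", "b"])` fails in this model, since the keys stored so far were built from `a` alone.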