Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/21 23:33:45 UTC

[GitHub] HeartSaVioR commented on issue #23859: [SPARK-26956][SQL] remove streaming output mode from data source v2 APIs

URL: https://github.com/apache/spark/pull/23859#issuecomment-466212629

Do we have any docs describing the background and discussion behind this change? It doesn't sound like a small change, and it directly impacts Structured Streaming, so I would like to fully understand it.

> 2. complete mode: call `SupportsTruncate#truncate`. Complete mode means truncating all the old data and appending new data, and `SupportsTruncate` has exactly the same semantic.

It looks like the sink will then contain only the new data. Is that correct, or should the description be changed to say that all data is appended rather than only new data?
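
To make the question concrete, here is a toy sketch in plain Python. It is not Spark code: `ToySink`, `write_epoch`, and the sample rows are hypothetical names and data invented for illustration. It models truncate-then-append and shows that the sink ends up holding only what the latest epoch wrote, so the final contents are complete only if each epoch writes the whole result table.

```python
class ToySink:
    """Hypothetical in-memory sink; not part of any Spark API."""

    def __init__(self):
        self.rows = []

    def truncate(self):
        # Drop everything written by earlier epochs.
        self.rows = []

    def append(self, new_rows):
        self.rows.extend(new_rows)


def write_epoch(sink, epoch_rows):
    # Truncate-then-append, as the quoted description of complete mode reads.
    sink.truncate()
    sink.append(epoch_rows)


sink = ToySink()

# Reading A: each epoch carries only the rows that changed in that epoch.
write_epoch(sink, [("a", 1)])
write_epoch(sink, [("b", 1)])
only_changed = list(sink.rows)  # the ("a", 1) row is gone

# Reading B: each epoch carries the full result table.
write_epoch(sink, [("a", 1)])
write_epoch(sink, [("a", 1), ("b", 1)])
full_result = list(sink.rows)
```

Under reading A the earlier row is lost after the second epoch; under reading B the sink always mirrors the full result. Which of the two the PR description intends is exactly what I am asking.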

> 3. update mode: fail. The current streaming framework can't propagate the update keys, so v2 sinks are not able to implement update mode. In the future we can introduce a `SupportsUpdate` trait.

I guess this means we don't separate keys and values when passing rows to the sink, so the sink cannot perform an upsert (though the target system can still upsert if it knows about the keys and values itself). So `SupportsUpdate` would receive keys and values separately. Do I understand correctly?
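
A toy illustration of what I mean, again in plain Python rather than Spark code. The functions `append_rows` and `upsert_rows`, the `key_of` parameter, and the sample rows are all hypothetical. Without key information the sink can only add rows; once it knows how to extract a key, it can replace the previous row for that key.

```python
def append_rows(table, rows):
    # Without key information the sink can only add rows,
    # so stale versions of a row stay in the table.
    return table + list(rows)


def upsert_rows(table, rows, key_of):
    # With a key extractor the sink can replace the previous row per key.
    merged = {key_of(row): row for row in table}
    for row in rows:
        merged[key_of(row)] = row
    return list(merged.values())


table = [("a", 1), ("b", 1)]
updates = [("a", 2)]

appended = append_rows(table, updates)
upserted = upsert_rows(table, updates, key_of=lambda row: row[0])
```

Here `appended` keeps both the old and the new row for key `"a"`, while `upserted` keeps only the new one. That is the behavior I would expect a future `SupportsUpdate` to enable, assuming it hands the key columns to the sink.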
