You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/13 01:45:13 UTC

[GitHub] HeartSaVioR commented on a change in pull request #23747: [SPARK-26848][SQL] Introduce new option to Kafka source: offset by timestamp (starting/ending)

HeartSaVioR commented on a change in pull request #23747: [SPARK-26848][SQL] Introduce new option to Kafka source: offset by timestamp (starting/ending)
URL: https://github.com/apache/spark/pull/23747#discussion_r256219279
 
 

 ##########
 File path: docs/structured-streaming-kafka-integration.md
 ##########
 @@ -310,6 +310,23 @@ The following configurations are optional:
 
 <table class="table">
 <tr><th>Option</th><th>value</th><th>default</th><th>query type</th><th>meaning</th></tr>
+<tr>
 
 Review comment:
   For 1), given that this is a Spark document I would say it's better to not explain Kafka stuff deeply. Kafka should have specific documentation regarding this (even KIP) and end users should rely on that to understand how it works. 
   
   We could refer the doc if we find any (I'll try to find one hopefully not KIP), but even there's no doc, we could mention this feature leverage `KafkaConsumer.offsetsForTimes ` and refer to javadoc on Kafka API.
   
   For 2), I agree it's ideal to mention timestamp offset here is related to when message is ingested, but I also feel it may be a bad signal that they are not understanding either Kafka's `offsetsForTimes` or Spark's processing time vs event time if end users don't know this before mentioned it. Given both have docs for each so might be redundant if we mention this here again. But that's just my 2 cents and if we can explain it as one-liner or two it may not be too redundant.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org