Posted to dev@samza.apache.org by Yan Fang <ya...@gmail.com> on 2014/07/09 10:30:02 UTC

Review Request 23358: SAMZA-225

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23358/
-----------------------------------------------------------

Review request for samza.


Repository: samza


Description
-------

Comparison of Spark Streaming and Samza


Diffs
-----

  docs/learn/documentation/0.7.0/comparisons/spark-streaming.md PRE-CREATION 
  docs/learn/documentation/0.7.0/comparisons/storm.md 4a21094 
  docs/learn/documentation/0.7.0/index.html 149ff2b 

Diff: https://reviews.apache.org/r/23358/diff/


Testing
-------


Thanks,

Yan Fang

Re: Review Request 23358: SAMZA-225

Posted by Martin Kleppmann <mk...@linkedin.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23358/#review47736
-----------------------------------------------------------


I started commenting on this RB, but then noticed that the RB doesn't reflect the latest patch. Could you update the RB, please? Here are just some comments on the first few paragraphs; I'll go over the rest when it's up to date.


docs/learn/documentation/0.7.0/comparisons/spark-streaming.md
<https://reviews.apache.org/r/23358/#comment83927>

    Storm can run in two different modes: with its lower-level API of bolts, it processes messages as they are received, whereas its higher-level Trident API performs batching (somewhat similarly to Spark Streaming). As that subtlety isn't really relevant here, I'd suggest just removing the mention of Storm in this paragraph.
    
    "Discretized stream ... is a continuous sequence": I find this juxtaposition of "discrete" and "continuous" a bit jarring. It's confusing because you're using "continuous" in the sense of "neverending", but "continuous" is sometimes also used as the opposite of "discrete". "continuous" is just a bit ambiguous here.
    
    Perhaps say something like: "Spark Streaming groups the stream into batches of a fixed duration (such as 1 second). Each batch is represented as an [RDD](...). A never-ending sequence of these RDDs is called a _discretized stream_ ([DStream](...))."
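
    To make the suggested wording concrete, here is a rough sketch of the batching idea in plain Python. It is only an illustration and not part of the patch: `discretize` is a hypothetical helper, not a Spark API, and a plain list stands in for each RDD. This sketch also simply omits intervals that contain no events.

```python
from collections import defaultdict


def discretize(events, batch_seconds=1.0):
    """Group (timestamp, value) pairs into fixed-duration batches.

    Hypothetical helper for illustration only; not a Spark API.
    Returns a list of (batch_start, [values]) ordered by batch_start.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        # Index of the fixed-duration interval this event falls into.
        index = int(timestamp // batch_seconds)
        batches[index].append(value)
    # Each list stands in for one batch (one RDD in the suggested wording);
    # the ordered sequence of batches stands in for the discretized stream.
    return [(index * batch_seconds, batches[index]) for index in sorted(batches)]


# Timestamps are in seconds; the values are arbitrary payloads.
events = [(0.1, "a"), (0.7, "b"), (1.2, "c"), (2.9, "d")]
dstream = discretize(events, batch_seconds=1.0)
```

    With a 1-second batch duration, the first two events share a batch and the other two each get their own.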



docs/learn/documentation/0.7.0/comparisons/spark-streaming.md
<https://reviews.apache.org/r/23358/#comment83929>

    "let me give a brief overview": we're not using the first person elsewhere in the docs; I'd prefer to keep the tone consistent.
    
    You could probably also break these long paragraphs up into shorter ones, to make them easier to read. Bullet points or numbered lists also help readability. "There are two main parts...:" suggests a good place for a numbered list, for example.



docs/learn/documentation/0.7.0/comparisons/spark-streaming.md
<https://reviews.apache.org/r/23358/#comment83930>

    s/core number/number of cores/


- Martin Kleppmann


On July 11, 2014, 8:37 a.m., Yan Fang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23358/
> -----------------------------------------------------------
> 
> (Updated July 11, 2014, 8:37 a.m.)