You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@samza.apache.org by Hai Lu <lh...@gmail.com> on 2017/01/24 02:07:25 UTC

Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/
-----------------------------------------------------------

(Updated Jan. 24, 2017, 2:07 a.m.)


Review request for samza.


Bugs: SAMZA-1025
    https://issues.apache.org/jira/browse/SAMZA-1025


Repository: samza


Description (updated)
-------

documentation for hdfs system consumer


Diffs (updated)
-----

  docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
  docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
  docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
  docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 

Diff: https://reviews.apache.org/r/52570/diff/


Testing
-------

N/A


Thanks,

Hai Lu


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Xinyu Liu <xi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review162971
-----------------------------------------------------------


Fix it, then Ship it!




Looks good. Have a few minor comments about adding javadoc links. Please test the changes locally after finalizing.


docs/learn/documentation/versioned/hdfs/consumer.md (line 22)
<https://reviews.apache.org/r/52570/#comment234345>

    Please link HdfsSystemConsumer to java doc.



docs/learn/documentation/versioned/hdfs/consumer.md (line 32)
<https://reviews.apache.org/r/52570/#comment234346>

    camel case HdfsSystemConsumer? link to Java doc.



docs/learn/documentation/versioned/hdfs/consumer.md (line 34)
<https://reviews.apache.org/r/52570/#comment234347>

    replace 'SystemStreamPartition" with partitions?



docs/learn/documentation/versioned/hdfs/consumer.md (line 40)
<https://reviews.apache.org/r/52570/#comment234348>

    java doc link to HdfsSystemConsumer, IncomingMessageEvelope and GenericRecord.



docs/learn/documentation/versioned/hdfs/consumer.md (line 42)
<https://reviews.apache.org/r/52570/#comment234349>

    please link java doc to SingleFileHdfsReader, instead of putting the file name here.


- Xinyu Liu


On Jan. 24, 2017, 2:07 a.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 2:07 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Hai Lu <lh...@gmail.com>.

> On Jan. 27, 2017, 7:03 a.m., Jagadish Venkatraman wrote:
> > This is looking pretty good. Thank you for the effort in writing the docs!

Thank you for taking the time to review it. Really appreciate it:)


> On Jan. 27, 2017, 7:03 a.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 30
> > <https://reviews.apache.org/r/52570/diff/4/?file=1617159#file1617159line30>
> >
> >     `to process these partitions` appears twice? Copy-paste error?

Oops, sorry for such a reckless mistake.


- Hai


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163234
-----------------------------------------------------------


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Jagadish Venkatraman <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163234
-----------------------------------------------------------



This is looking pretty good. Thank you for the effort in writing the docs!


docs/learn/documentation/versioned/hdfs/consumer.md (line 30)
<https://reviews.apache.org/r/52570/#comment234687>

    `to process these partitions` appears twice? Copy-paste error?



docs/learn/documentation/versioned/hdfs/consumer.md (line 31)
<https://reviews.apache.org/r/52570/#comment234688>

    s/one single/a single


- Jagadish Venkatraman


On Jan. 27, 2017, 12:03 a.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 27, 2017, 12:03 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Jagadish Venkatraman <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163545
-----------------------------------------------------------



Thanks Hai for the docs! Committed to master.

- Jagadish Venkatraman


On Jan. 28, 2017, 6:15 a.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2017, 6:15 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Hai Lu <lh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/
-----------------------------------------------------------

(Updated Jan. 28, 2017, 6:15 a.m.)


Review request for samza.


Bugs: SAMZA-1025
    https://issues.apache.org/jira/browse/SAMZA-1025


Repository: samza


Description
-------

documentation for hdfs system consumer


Diffs (updated)
-----

  docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
  docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
  docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
  docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 

Diff: https://reviews.apache.org/r/52570/diff/


Testing
-------

N/A


Thanks,

Hai Lu


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Jagadish Venkatraman <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163333
-----------------------------------------------------------


Ship it!




Thanks!


docs/learn/documentation/versioned/hdfs/consumer.md (line 78)
<https://reviews.apache.org/r/52570/#comment234750>

    I meant use a dummy name. not the `dummyname` as string. Re-name it to `your-principal-name`



docs/learn/documentation/versioned/hdfs/consumer.md (line 89)
<https://reviews.apache.org/r/52570/#comment234745>

    nit: the way you want/arbitrarily


- Jagadish Venkatraman


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Navina Ramesh <nr...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163355
-----------------------------------------------------------


Fix it, then Ship it!




Some nits and comments. Otherwise, looks good. Thanks! +1


docs/learn/documentation/versioned/hdfs/consumer.md (line 39)
<https://reviews.apache.org/r/52570/#comment234764>

    This line is confusing. Are you implying that I can read from non-avro formatted files that are in HDFS ? 
    What is the significance of the SingleFileHdfsReader interface ? It is not clear to the reader.



docs/learn/documentation/versioned/hdfs/consumer.md (line 89)
<https://reviews.apache.org/r/52570/#comment234762>

    Nit: Can you move the explanation of what advanced partitioning is  outside of the code block? 
    You can emphasize the reserved term note by doing -> 
    **note**  , when it is outside the code block



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1822)
<https://reviews.apache.org/r/52570/#comment234763>

    Look like a typo. It should "systems.*, instead of "system.*" ?


- Navina Ramesh


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Hai Lu <lh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/
-----------------------------------------------------------

(Updated Jan. 27, 2017, 5:48 p.m.)


Review request for samza.


Bugs: SAMZA-1025
    https://issues.apache.org/jira/browse/SAMZA-1025


Repository: samza


Description
-------

documentation for hdfs system consumer


Diffs (updated)
-----

  docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
  docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
  docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
  docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 

Diff: https://reviews.apache.org/r/52570/diff/


Testing
-------

N/A


Thanks,

Hai Lu


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Hai Lu <lh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/
-----------------------------------------------------------

(Updated Jan. 27, 2017, 12:03 a.m.)


Review request for samza.


Bugs: SAMZA-1025
    https://issues.apache.org/jira/browse/SAMZA-1025


Repository: samza


Description
-------

documentation for hdfs system consumer


Diffs (updated)
-----

  docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
  docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
  docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
  docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 

Diff: https://reviews.apache.org/r/52570/diff/


Testing
-------

N/A


Thanks,

Hai Lu


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Hai Lu <lh...@gmail.com>.

> On Jan. 26, 2017, 10:49 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 92
> > <https://reviews.apache.org/r/52570/diff/3/?file=1616638#file1616638line92>
> >
> >     nit: Use capitalizations consistently
> >     
> >     1. Is `id` of any significance here?

Yes. it has to be exactly "[id]"


> On Jan. 26, 2017, 10:49 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 1832
> > <https://reviews.apache.org/r/52570/diff/3/?file=1616641#file1616641line1832>
> >
> >     Do we allow any other `partitioner` other than the defaultPartitioner? If not, it feels slightly unconventional to have `default` in the property name?

This was discussed in the code review for implementation. And yes, we do allow other type of partitioner potentially in the future. That's why I was suggested to put "defaultPartitioner" here.


> On Jan. 26, 2017, 10:49 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 1839
> > <https://reviews.apache.org/r/52570/diff/3/?file=1616641#file1616641line1839>
> >
> >     Is the group identifier the partition name? 
> >     Is `id` a reserved term? 
> >     Do we always expect `[id]`?

yes, it's a reserved term. updated the doc.


> On Jan. 26, 2017, 10:49 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 1847
> > <https://reviews.apache.org/r/52570/diff/3/?file=1616641#file1616641line1847>
> >
> >     The other minor observation is that the camel case naming scheme is slightly inconsistent with what we have for our public configs.
> >     
> >     For instance, job.coordinator.
> >     monitor-partition-change follows a different convention. I'd really prefer consistency in our config scheme. 
> >     
> >     Navina or Prateek thoughts?

I'm not sure what kind of discussion was going on there. But when I implemented the codes, I was explicitly told to adopt such a style (camel case).


> On Jan. 26, 2017, 10:49 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 1842
> > <https://reviews.apache.org/r/52570/diff/3/?file=1616641#file1616641line1842>
> >
> >     Is this the name of the class? I'm still not clear how this is used?

It's literally "avro", or "plain", or "json". Though the last two are not supported now. No, they are not class name.


- Hai


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163177
-----------------------------------------------------------


On Jan. 26, 2017, 6:47 p.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 26, 2017, 6:47 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Jagadish Venkatraman <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163177
-----------------------------------------------------------




docs/learn/documentation/versioned/hdfs/consumer.md (line 22)
<https://reviews.apache.org/r/52570/#comment234613>

    Re-word to be concise.
    
    Avro encoded records are supported out of the box and it is easy to extend to support other formats (plain text, csv, json etc).
    `in the future` is probably not needed.



docs/learn/documentation/versioned/hdfs/consumer.md (line 30)
<https://reviews.apache.org/r/52570/#comment234621>

    nit: How do new lines look when rendered in the web-page? Do they appear like paragraphs? If so, do you think this can be made a single paragraph? (instead of 3).



docs/learn/documentation/versioned/hdfs/consumer.md (line 60)
<https://reviews.apache.org/r/52570/#comment234634>

    typo here task.inputs



docs/learn/documentation/versioned/hdfs/consumer.md (line 68)
<https://reviews.apache.org/r/52570/#comment234633>

    Would be good to provide dummy values? For instance, the above property (`whitelist`) uses it consistently.



docs/learn/documentation/versioned/hdfs/consumer.md (line 76)
<https://reviews.apache.org/r/52570/#comment234632>

    Redundant with previous line. Can potentially abridge?



docs/learn/documentation/versioned/hdfs/consumer.md (line 77)
<https://reviews.apache.org/r/52570/#comment234635>

    contains both a '.' and a ':'. Not sure if both are required.



docs/learn/documentation/versioned/hdfs/consumer.md (line 81)
<https://reviews.apache.org/r/52570/#comment234640>

    Use newlines after defining properties consistently.



docs/learn/documentation/versioned/hdfs/consumer.md (line 83)
<https://reviews.apache.org/r/52570/#comment234636>

    Can provide a dummy value here consistently?



docs/learn/documentation/versioned/hdfs/consumer.md (line 89)
<https://reviews.apache.org/r/52570/#comment234637>

    Use periods consistently. For example, the above section ended with a period.



docs/learn/documentation/versioned/hdfs/consumer.md (line 92)
<https://reviews.apache.org/r/52570/#comment234639>

    nit: Use capitalizations consistently
    
    1. Is `id` of any significance here?



docs/learn/documentation/versioned/hdfs/consumer.md (line 93)
<https://reviews.apache.org/r/52570/#comment234654>

    Please provide a value.



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1819)
<https://reviews.apache.org/r/52570/#comment234645>

    Thanks for calling out the trade-offs in the docs.



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1832)
<https://reviews.apache.org/r/52570/#comment234647>

    Do we allow any other `partitioner` other than the defaultPartitioner? If not, it feels slightly unconventional to have `default` in the property name?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1839)
<https://reviews.apache.org/r/52570/#comment234641>

    Is the group identifier the partition name? 
    Is `id` a reserved term? 
    Do we always expect `[id]`?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1840)
<https://reviews.apache.org/r/52570/#comment234649>

    Thanks for adding the example.
    
    Re-word docs.
    
    That you want to organize into `three` partitions as ...



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1842)
<https://reviews.apache.org/r/52570/#comment234650>

    Is this the name of the class? I'm still not clear how this is used?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1847)
<https://reviews.apache.org/r/52570/#comment234642>

    The other minor observation is that the camel case naming scheme is slightly inconsistent with what we have for our public configs.
    
    For instance, job.coordinator.
    monitor-partition-change follows a different convention. I'd really prefer consistency in our config scheme. 
    
    Navina or Prateek thoughts?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1849)
<https://reviews.apache.org/r/52570/#comment234643>

    typo: be default ? Did you mean By default?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1850)
<https://reviews.apache.org/r/52570/#comment234644>

    Typo: Did you want to use two periods `..`?


- Jagadish Venkatraman


On Jan. 26, 2017, 6:47 p.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 26, 2017, 6:47 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Hai Lu <lh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/
-----------------------------------------------------------

(Updated Jan. 26, 2017, 6:47 p.m.)


Review request for samza.


Bugs: SAMZA-1025
    https://issues.apache.org/jira/browse/SAMZA-1025


Repository: samza


Description
-------

documentation for hdfs system consumer


Diffs (updated)
-----

  docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
  docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
  docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
  docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 

Diff: https://reviews.apache.org/r/52570/diff/


Testing
-------

N/A


Thanks,

Hai Lu


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Navina Ramesh <nr...@linkedin.com>.

> On Jan. 25, 2017, 10:50 p.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 26
> > <https://reviews.apache.org/r/52570/diff/2/?file=1613256#file1613256line26>
> >
> >     Can you include the diagram from your design document?  Or something similar to elaborate how the setup should look like?
> 
> Hai Lu wrote:
>     The diagram was mostly for the situation at LinkedIn where we have separte yarn clusters - one for Samza, one for Hadoop. "Your job needs to run on the same YARN cluster which hosts the HDFS you want to consume from."  Is this statement not clear enough? What suggestion do you have in terms of the wording?

My bad. I thought it was generic architecture diagram. Didn't realize it was specific to LinkedIn's deployment. Please ignore this comment.


- Navina


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163036
-----------------------------------------------------------


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Hai Lu <lh...@gmail.com>.

> On Jan. 25, 2017, 10:50 p.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 26
> > <https://reviews.apache.org/r/52570/diff/2/?file=1613256#file1613256line26>
> >
> >     Can you include the diagram from your design document?  Or something similar to elaborate how the setup should look like?

The diagram was mostly for the situation at LinkedIn where we have separte yarn clusters - one for Samza, one for Hadoop. "Your job needs to run on the same YARN cluster which hosts the HDFS you want to consume from."  Is this statement not clear enough? What suggestion do you have in terms of the wording?


> On Jan. 25, 2017, 10:50 p.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 42
> > <https://reviews.apache.org/r/52570/diff/2/?file=1613256#file1613256line42>
> >
> >     Can you add a snippet of the interface here as well? It is easier for the user to skim through.

Linked to the java doc.


> On Jan. 25, 2017, 10:50 p.m., Navina Ramesh wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 50
> > <https://reviews.apache.org/r/52570/diff/2/?file=1613256#file1613256line50>
> >
> >     Replace "users" with "user application". 
> >     
> >     We provide the capability for the user application to get notified when ...
> >     
> >     Rephrase "To do so, simply implement the interface [EndOfStreamListenerTask]" as "In order to receieve notications on EndOfStream with the task, the user application should simply implement the interface ..."

I changed the wording as Jagadish suggested above. Let me know if you have further suggestion on top of that.


- Hai


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163036
-----------------------------------------------------------


On Jan. 24, 2017, 2:07 a.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 2:07 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Navina Ramesh <nr...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163036
-----------------------------------------------------------



Thanks for adding the documentation!


docs/learn/documentation/versioned/hdfs/consumer.md (line 26)
<https://reviews.apache.org/r/52570/#comment234455>

    Can you include the diagram from your design document?  Or something similar to elaborate how the setup should look like?



docs/learn/documentation/versioned/hdfs/consumer.md (line 42)
<https://reviews.apache.org/r/52570/#comment234458>

    Can you add a snippet of the interface here as well? It is easier for the user to skim through.



docs/learn/documentation/versioned/hdfs/consumer.md (line 50)
<https://reviews.apache.org/r/52570/#comment234460>

    Replace "users" with "user application". 
    
    We provide the capability for the user application to get notified when ...
    
    Rephrase "To do so, simply implement the interface [EndOfStreamListenerTask]" as "In order to receieve notications on EndOfStream with the task, the user application should simply implement the interface ..."



docs/learn/documentation/versioned/hdfs/consumer.md (line 54)
<https://reviews.apache.org/r/52570/#comment234461>

    I think you can skip the "job.properties" file. The readers may easily assume there is a separate properties file.



docs/learn/documentation/versioned/hdfs/consumer.md (line 75)
<https://reviews.apache.org/r/52570/#comment234462>

    Typo "configs are required"



docs/learn/documentation/versioned/hdfs/consumer.md (line 77)
<https://reviews.apache.org/r/52570/#comment234464>

    I don't think you need to mentioned the JIRA that introduced a feature. If there is documentation related to security in Samza, you can link to it. You can link to the javadoc for SamzaContainerSecurityManager.
    
    You can add a brief description of the feature. For example, the SamzaContainer fetches and renews the Kerberos delegation tokens when the job is running in a secure environment. User application needs to specify ..



docs/learn/documentation/versioned/hdfs/consumer.md (line 93)
<https://reviews.apache.org/r/52570/#comment234465>

    Can you elaborate the "advanced partitioning" feature here  and remove the link for design doc? If it helps, you can just copy-and-paste the design doc content here and edit it :)



docs/learn/documentation/versioned/hdfs/consumer.md (line 102)
<https://reviews.apache.org/r/52570/#comment234468>

    Looks like there are more configurations that are mentioned in the configuration table. Can you please add the link to configuration table to imply that?



docs/learn/documentation/versioned/hdfs/consumer.md (line 105)
<https://reviews.apache.org/r/52570/#comment234466>

    You can add a link to the design doc pdf here , instead of the JIRA link.


- Navina Ramesh


On Jan. 24, 2017, 2:07 a.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 2:07 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Navina Ramesh <nr...@linkedin.com>.

> On Jan. 25, 2017, 10:36 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 67
> > <https://reviews.apache.org/r/52570/diff/2/?file=1613256#file1613256line67>
> >
> >     The relationship between whitelist and blacklist was not very obvious to me.
> >     
> >     Is the behavior that the whitelist is applied first, and the blacklist is applied to the matched files later? (to determine which files are to be ignored).
> 
> Hai Lu wrote:
>     The order doesn't matter. (X & whitelist) - blacklist == (X - blacklist) & whitelist

This is assuming that whitelist and blacklist are mutually exclusive, right?


- Navina


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163011
-----------------------------------------------------------


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Jagadish Venkatraman <ja...@gmail.com>.

> On Jan. 25, 2017, 10:36 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 97
> > <https://reviews.apache.org/r/52570/diff/2/?file=1613256#file1613256line97>
> >
> >     Not clear to me how this differs from the whitelist (*.avro which specifies what files to process).
> 
> Hai Lu wrote:
>     They completely different: reader is the type of data encoded in the file content; whitelist is used to filter based on file name. Technically you can have an avro file that has ".java" as it's extention, right?

But, as a user of the API, it was not obvious to me how the value of `avro` is used. (or what it's significance is)

Does this have to be tied with the serde?


- Jagadish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163011
-----------------------------------------------------------


On Jan. 26, 2017, 6:47 p.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 26, 2017, 6:47 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Hai Lu <lh...@gmail.com>.

> On Jan. 25, 2017, 10:36 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 67
> > <https://reviews.apache.org/r/52570/diff/2/?file=1613256#file1613256line67>
> >
> >     The relationship between whitelist and blacklist was not very obvious to me.
> >     
> >     Is the behavior that the whitelist is applied first, and the blacklist is applied to the matched files later? (to determine which files are to be ignored).

The order doesn't matter. (X & whitelist) - blacklist == (X - blacklist) & whitelist


> On Jan. 25, 2017, 10:36 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 97
> > <https://reviews.apache.org/r/52570/diff/2/?file=1613256#file1613256line97>
> >
> >     Not clear to me how this differs from the whitelist (*.avro which specifies what files to process).

They completely different: reader is the type of data encoded in the file content; whitelist is used to filter based on file name. Technically you can have an avro file that has ".java" as it's extention, right?


> On Jan. 25, 2017, 10:36 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/jobs/configuration-table.html, line 1819
> > <https://reviews.apache.org/r/52570/diff/2/?file=1613259#file1613259line1819>
> >
> >     What's this configuration? Is this the number of messages? What are the implications of this? 
> >     
> >     I'm not in favor of exposing this tunable if this is not super-significant.

This is important for performance tuning in some cases. Added a bit more details to explain


- Hai


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163011
-----------------------------------------------------------


On Jan. 24, 2017, 2:07 a.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 2:07 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

Posted by Jagadish Venkatraman <ja...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163011
-----------------------------------------------------------




docs/learn/documentation/versioned/hdfs/consumer.md (line 22)
<https://reviews.apache.org/r/52570/#comment234385>

    Can re-word docs for clarity.
    1. Remove `Now` here: s/Now// - "You can configure your Samza jobs to read data from HDFS files". 
    2. No need to mention that it's implemented in `samza-hdfs`.
    3. Substitute: "Implemented in `samza-hdfs`, the `HdfsSystemConsumer` will read from HDFS files (`avro` records only for now. /->> "Only avro encoded records are supported for now."
    4. "very easy to extend to plain text, csv, json, etc. in the future" -> How is this supported? A hint on how to do this is useful. If not, we can nuke this.
    5. Reword: "Deliver messages to your tasks" can be left.



docs/learn/documentation/versioned/hdfs/consumer.md (line 26)
<https://reviews.apache.org/r/52570/#comment234404>

    Thanks for calling this out explicitly.



docs/learn/documentation/versioned/hdfs/consumer.md (line 30)
<https://reviews.apache.org/r/52570/#comment234406>

    Do you think it can be more concise:
    - "The way `HDFSSystemConsumer` does partitioning is basically to treat each HDFS file as a partition. Using Kafka as an analogy, a HDFS directory is close to the concept of a Kafka topic, and all the files within the directory are like topic partitions in the sense of Kafka. "
    
    Reword it to be: (something on the lines of.)
    "Partitioning works at the level of individual HDFS files. Each file is treated as a stream partition."
    
    - This line can be removed completely. The subsequent examples expands on the idea.



docs/learn/documentation/versioned/hdfs/consumer.md (line 34)
<https://reviews.apache.org/r/52570/#comment234407>

    Reword:
    
    You can configure upto 10 Samza containers to process these partitions.



docs/learn/documentation/versioned/hdfs/consumer.md (line 40)
<https://reviews.apache.org/r/52570/#comment234412>

    1. Not sure we have a noun called `payload` anywhere else in the Samza docs.
    
    2. Can be more precise in wording this.
    The received IncomingMessageEnvelope contains three significant fields.
    - The key which is empty.
    - The message which is set to the avro GenericRecord.
    - The stream partition which is set to the name of the HDFS file.
    
    3. Please link the IncomingMessageEnvelope API.



docs/learn/documentation/versioned/hdfs/consumer.md (line 42)
<https://reviews.apache.org/r/52570/#comment234414>

    1. Please link the javadocs of SingleFileHDFSReader
    2. Suggest re-word:
    2.1 one can implement/ you can implement (Since, the rest of the documentation is directed at the user)
    2.2 The second line on "few methods need to be implemented" can be ignored.



docs/learn/documentation/versioned/hdfs/consumer.md (line 43)
<https://reviews.apache.org/r/52570/#comment234416>

    It's not super-obvious how the implementation of the `SingleFileHDFSReader` ties to the underlying consumer. Am I missing something?



docs/learn/documentation/versioned/hdfs/consumer.md (line 46)
<https://reviews.apache.org/r/52570/#comment234417>

    1. Not sure we want to introduce a new noun called `data stream` in the docs.
    
    Suggest re-word to be precise:
    
    While a kafka topic has an unbounded stream of messages, HDFS files are bounded and have a notion of EOF.



docs/learn/documentation/versioned/hdfs/consumer.md (line 48)
<https://reviews.apache.org/r/52570/#comment234428>

    Reword docs to be concise:
    
    You can choose to implement `EndOfStreamListenerTask` to receive a callback when all partitions are at end of stream. When all partitions being processed by the task are at end of stream (ie. EOF has been reached for all files), the Samza job exits automatically.
    
    Remove `What happens when we finish reading all data from the HDFS files? The behavior is that the Samza job will shut itself down once the job is done with all files.`



docs/learn/documentation/versioned/hdfs/consumer.md (line 67)
<https://reviews.apache.org/r/52570/#comment234430>

    The relationship between whitelist and blacklist was not very obvious to me.
    
    Is the behavior that the whitelist is applied first, and the blacklist is applied to the matched files later? (to determine which files are to be ignored).



docs/learn/documentation/versioned/hdfs/consumer.md (line 75)
<https://reviews.apache.org/r/52570/#comment234433>

    1.fix typo: required
    2.For HDFS clusters that have kerberos enabled,...



docs/learn/documentation/versioned/hdfs/consumer.md (line 77)
<https://reviews.apache.org/r/52570/#comment234436>

    Suggestions on docs:
    
    1. Not needed to mention the jira.
    2. Not needed to mention the class name.
    
    The following additional configs are required when accessing HDFS clusters that have kerberos enabled.



docs/learn/documentation/versioned/hdfs/consumer.md (line 93)
<https://reviews.apache.org/r/52570/#comment234437>

    Might be worth presenting an abridged version of what advanced partitioning is supported? (or atleast, how it can be configured).



docs/learn/documentation/versioned/hdfs/consumer.md (line 97)
<https://reviews.apache.org/r/52570/#comment234439>

    Not clear to me how this differs from the whitelist (*.avro which specifies what files to process).



docs/learn/documentation/versioned/hdfs/consumer.md (line 99)
<https://reviews.apache.org/r/52570/#comment234441>

    1. Please Fix typo: retries
    2. Re-word to be precise: maximum number of retries (per-partition) before the container fails.



docs/learn/documentation/versioned/hdfs/consumer.md (line 105)
<https://reviews.apache.org/r/52570/#comment234443>

    We probably don't want to reference Jiras here in the docs.



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1819)
<https://reviews.apache.org/r/52570/#comment234444>

    What's this configuration? Is this the number of messages? What are the implications of this? 
    
    I'm not in favor of exposing this tunable if this is not super-significant.



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1824)
<https://reviews.apache.org/r/52570/#comment234446>

    The number of retry attempts when there is a failure to fetch messages from HDFS.
    
    Also, should document the behavior after failures after numRetries?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1829)
<https://reviews.apache.org/r/52570/#comment234448>

    Docs for whitelist and blacklist are the same? Are both referring to `unwanted` files? Do we want to re-word them?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1839)
<https://reviews.apache.org/r/52570/#comment234452>

    Not clear how this is useful? This needs more docs.



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1842)
<https://reviews.apache.org/r/52570/#comment234453>

    What are the implications of this config? It's not obvious to me from the docs.



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1847)
<https://reviews.apache.org/r/52570/#comment234454>

    If this is purely internal, why do we need this config?
    
    If there is no need for this to be set, what's the default value?



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1849)
<https://reviews.apache.org/r/52570/#comment234456>

    Also, the config-key convention is slightly different from the rest of the Samza configs:
    
    For example, there's camelCase for configs used in the consumer but the producer has a different config style.
    
    (producer.hdfs.compression.type versus producer.hdfsCompressionType)


- Jagadish Venkatraman


On Jan. 24, 2017, 2:07 a.m., Hai Lu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2017, 2:07 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
>     https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>