You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@samza.apache.org by Edi Bice <ed...@yahoo.com> on 2016/02/18 20:46:27 UTC

Review Request 43732: Implemented AvroDataFileHdfsWriter and exposed several RocksDb config

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/
-----------------------------------------------------------

Review request for samza.


Repository: samza


Description
-------

https://issues.apache.org/jira/browse/SAMZA-876 

Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.

Exposed several RocksDb configuration options (recommended in RocksDb tuning guide):

rocksdb.log.level
rocksdb.log.keepfilenum
rocksdb.log.timetoroll
rocksdb.log.maxfilesize

rocksdb.bloomfilter.bits
rocksdb.max.background.compactions
rocksdb.max.background.flushes
rocksdb.num.write.buffers
rocksdb.target.file.size.base
rocksdb.max.bytes.level.base


Diffs
-----

  gradle.properties 16e1f5d 
  gradle/dependency-versions.gradle 52e25aa 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION 
  samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java d4f765c 

Diff: https://reviews.apache.org/r/43732/diff/


Testing
-------

I am using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully. The RocksDb config changes are older and were used and verified to be working when originally implemented.


Thanks,

Edi Bice


Re: Review Request 43732: Implemented AvroDataFileHdfsWriter

Posted by Edi Bice <ed...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/
-----------------------------------------------------------

(Updated March 3, 2016, 6:58 p.m.)


Review request for samza.


Changes
-------

Another fresh rebase since Jackson has been upgraded in master


Repository: samza


Description
-------

https://issues.apache.org/jira/browse/SAMZA-876 

Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.


Diffs (updated)
-----

  docs/learn/documentation/versioned/hdfs/producer.md cfd22c6 
  docs/learn/documentation/versioned/jobs/configuration-table.html 175437c 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-avro.properties PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job-avro.properties PRE-CREATION 
  samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala c4b04a1 

Diff: https://reviews.apache.org/r/43732/diff/


Testing
-------

Two JUnit tests similar to the Text/BinarySequenceFileHdfsWriter ones. In addition I've been using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully.


Thanks,

Edi Bice


Re: Review Request 43732: Implemented AvroDataFileHdfsWriter

Posted by Edi Bice <ed...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/
-----------------------------------------------------------

(Updated March 2, 2016, 4:35 p.m.)


Review request for samza.


Changes
-------

Rebased with latest from master branch as requested


Repository: samza


Description
-------

https://issues.apache.org/jira/browse/SAMZA-876 

Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.


Diffs (updated)
-----

  docs/learn/documentation/versioned/hdfs/producer.md cfd22c6 
  docs/learn/documentation/versioned/jobs/configuration-table.html 6705530 
  gradle/dependency-versions.gradle 52e25aa 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-avro.properties PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job-avro.properties PRE-CREATION 
  samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala c4b04a1 

Diff: https://reviews.apache.org/r/43732/diff/


Testing
-------

Two JUnit tests similar to the Text/BinarySequenceFileHdfsWriter ones. In addition I've been using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully.


Thanks,

Edi Bice


Re: Review Request 43732: Implemented AvroDataFileHdfsWriter

Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/#review121594
-----------------------------------------------------------



Overall lgtm. There is some problem in the last file in the diff uploaded. Could you try to rebase and upload again?Thanks!


gradle/dependency-versions.gradle (line 23)
<https://reviews.apache.org/r/43732/#comment183340>

    Please rebase w/ the latest master branch.


- Yi Pan (Data Infrastructure)


On Feb. 25, 2016, 7:39 p.m., Edi Bice wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43732/
> -----------------------------------------------------------
> 
> (Updated Feb. 25, 2016, 7:39 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> https://issues.apache.org/jira/browse/SAMZA-876 
> 
> Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.
> 
> 
> Diffs
> -----
> 
>   docs/learn/documentation/versioned/hdfs/producer.md cfd22c6 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 6705530 
>   gradle/dependency-versions.gradle 52e25aa 
>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119 
>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION 
>   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-avro.properties PRE-CREATION 
>   samza-hdfs/src/test/resources/samza-hdfs-test-job-avro.properties PRE-CREATION 
>   samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala c4b04a1 
> 
> Diff: https://reviews.apache.org/r/43732/diff/
> 
> 
> Testing
> -------
> 
> Two JUnit tests similar to the Text/BinarySequenceFileHdfsWriter ones. In addition I've been using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully.
> 
> 
> Thanks,
> 
> Edi Bice
> 
>


Re: Review Request 43732: Implemented AvroDataFileHdfsWriter

Posted by Edi Bice <ed...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/
-----------------------------------------------------------

(Updated Feb. 25, 2016, 7:39 p.m.)


Review request for samza.


Changes
-------

Reducing the scope as Yi recommended (moving the RocksDb changes to a separate JIRA/patch)


Summary (updated)
-----------------

Implemented AvroDataFileHdfsWriter


Repository: samza


Description (updated)
-------

https://issues.apache.org/jira/browse/SAMZA-876 

Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.


Diffs
-----

  docs/learn/documentation/versioned/hdfs/producer.md cfd22c6 
  docs/learn/documentation/versioned/jobs/configuration-table.html 6705530 
  gradle/dependency-versions.gradle 52e25aa 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-avro.properties PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job-avro.properties PRE-CREATION 
  samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala c4b04a1 

Diff: https://reviews.apache.org/r/43732/diff/


Testing (updated)
-------

Two JUnit tests similar to the Text/BinarySequenceFileHdfsWriter ones. In addition I've been using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully.


Thanks,

Edi Bice


Re: Review Request 43732: Implemented AvroDataFileHdfsWriter and exposed several RocksDb config

Posted by Edi Bice <ed...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/
-----------------------------------------------------------

(Updated Feb. 25, 2016, 4:50 p.m.)


Review request for samza.


Changes
-------

The new patch addressing all issues Yi brought up.


Repository: samza


Description
-------

https://issues.apache.org/jira/browse/SAMZA-876 

Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.

Exposed several RocksDb configuration options (recommended in RocksDb tuning guide):

rocksdb.log.level
rocksdb.log.keepfilenum
rocksdb.log.timetoroll
rocksdb.log.maxfilesize

rocksdb.bloomfilter.bits
rocksdb.max.background.compactions
rocksdb.max.background.flushes
rocksdb.num.write.buffers
rocksdb.target.file.size.base
rocksdb.max.bytes.level.base


Diffs (updated)
-----

  docs/learn/documentation/versioned/hdfs/producer.md cfd22c6 
  docs/learn/documentation/versioned/jobs/configuration-table.html 6705530 
  gradle/dependency-versions.gradle 52e25aa 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119 
  samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-avro.properties PRE-CREATION 
  samza-hdfs/src/test/resources/samza-hdfs-test-job-avro.properties PRE-CREATION 
  samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala c4b04a1 

Diff: https://reviews.apache.org/r/43732/diff/


Testing
-------

I am using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully. The RocksDb config changes are older and were used and verified to be working when originally implemented.


Thanks,

Edi Bice


Re: Review Request 43732: Implemented AvroDataFileHdfsWriter and exposed several RocksDb config

Posted by Edi Bice <ed...@yahoo.com>.

> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > Thanks for the contribution! I added a few comments below. Thanks

You're welcome. Thanks for Samza.


> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > gradle.properties, line 18
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254930#file1254930line18>
> >
> >     Please keep the -SNAPSHOT suffix here.

Fixed in new patch.


> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > gradle/dependency-versions.gradle, line 32
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254931#file1254931line32>
> >
> >     Please remove -MINE.

I had (Snappy linking) issues with the standard RocksDb and had to compile my own. See SAMZA-870. Fixed in new patch.


> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > gradle/dependency-versions.gradle, line 23
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254931#file1254931line23>
> >
> >     There is SAMZA-878 opened for this jackson library upgrade. You can associate the patch w/ that ticket as well.

Attached the whole patch submitted here as I wasn't sure. Seems overkill.


> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > gradle/dependency-versions.gradle, line 33
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254931#file1254931line33>
> >
> >     The minimum supported YARN version is still 2.6.1 in Samza and we have not deprecated the support yet. Hence, I am curious why you need 2.7.1 here?

Had trouble getting jobs to work on my 2.7.1 cluster but I'm no longer certain that changing this was what cured it. Dropping.


> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala, line 50
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254932#file1254932line50>
> >
> >     Adding configuration variables also requires adding documentation to docs/learn/documentation/versioned/jobs/configuration-table.html.

Fixed in new patch.


> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala, line 30
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254933#file1254933line30>
> >
> >     Please remove the auto-generated author info. Apache does not allow this.

Fixed in new patch.


> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala, line 33
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254933#file1254933line33>
> >
> >     How is this class used? It would be nice to create some javadoc here to explain it.
> >     
> >     A set of unit tests for the new class is also needed here.

Fixed in new patch.


> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java, line 19
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254934#file1254934line19>
> >
> >     This does not seem to be relevant to the AvroDataFileWriter? Can we move it to another patch?

Dropped from new patch.


- Edi


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/#review120357
-----------------------------------------------------------


On Feb. 18, 2016, 7:46 p.m., Edi Bice wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43732/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2016, 7:46 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> https://issues.apache.org/jira/browse/SAMZA-876 
> 
> Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.
> 
> Exposed several RocksDb configuration options (recommended in RocksDb tuning guide):
> 
> rocksdb.log.level
> rocksdb.log.keepfilenum
> rocksdb.log.timetoroll
> rocksdb.log.maxfilesize
> 
> rocksdb.bloomfilter.bits
> rocksdb.max.background.compactions
> rocksdb.max.background.flushes
> rocksdb.num.write.buffers
> rocksdb.target.file.size.base
> rocksdb.max.bytes.level.base
> 
> 
> Diffs
> -----
> 
>   gradle.properties 16e1f5d 
>   gradle/dependency-versions.gradle 52e25aa 
>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119 
>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION 
>   samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java d4f765c 
> 
> Diff: https://reviews.apache.org/r/43732/diff/
> 
> 
> Testing
> -------
> 
> I am using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully. The RocksDb config changes are older and were used and verified to be working when originally implemented.
> 
> 
> Thanks,
> 
> Edi Bice
> 
>


Re: Review Request 43732: Implemented AvroDataFileHdfsWriter and exposed several RocksDb config

Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/#review120357
-----------------------------------------------------------



Thanks for the contribution! I added a few comments below. Thanks


gradle.properties (line 18)
<https://reviews.apache.org/r/43732/#comment181788>

    Please keep the -SNAPSHOT suffix here.



gradle/dependency-versions.gradle (line 23)
<https://reviews.apache.org/r/43732/#comment181789>

    There is SAMZA-878 opened for this jackson library upgrade. You can associate the patch w/ that ticket as well.



gradle/dependency-versions.gradle (line 32)
<https://reviews.apache.org/r/43732/#comment181790>

    Please remove -MINE.



gradle/dependency-versions.gradle (line 33)
<https://reviews.apache.org/r/43732/#comment181791>

    The minimum supported YARN version is still 2.6.1 in Samza and we have not deprecated the support yet. Hence, I am curious why you need 2.7.1 here?



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala (line 50)
<https://reviews.apache.org/r/43732/#comment181797>

    Adding configuration variables also requires adding documentation to docs/learn/documentation/versioned/jobs/configuration-table.html.



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala (line 30)
<https://reviews.apache.org/r/43732/#comment181792>

    Please remove the auto-generated author info. Apache does not allow this.



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala (line 33)
<https://reviews.apache.org/r/43732/#comment181796>

    How is this class used? It would be nice to create some javadoc here to explain it.
    
    A set of unit tests for the new class is also needed here.



samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java (line 19)
<https://reviews.apache.org/r/43732/#comment181795>

    This does not seem to be relevant to the AvroDataFileWriter? Can we move it to another patch?


- Yi Pan (Data Infrastructure)


On Feb. 18, 2016, 7:46 p.m., Edi Bice wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43732/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2016, 7:46 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> https://issues.apache.org/jira/browse/SAMZA-876 
> 
> Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.
> 
> Exposed several RocksDb configuration options (recommended in RocksDb tuning guide):
> 
> rocksdb.log.level
> rocksdb.log.keepfilenum
> rocksdb.log.timetoroll
> rocksdb.log.maxfilesize
> 
> rocksdb.bloomfilter.bits
> rocksdb.max.background.compactions
> rocksdb.max.background.flushes
> rocksdb.num.write.buffers
> rocksdb.target.file.size.base
> rocksdb.max.bytes.level.base
> 
> 
> Diffs
> -----
> 
>   gradle.properties 16e1f5d 
>   gradle/dependency-versions.gradle 52e25aa 
>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119 
>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION 
>   samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java d4f765c 
> 
> Diff: https://reviews.apache.org/r/43732/diff/
> 
> 
> Testing
> -------
> 
> I am using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully. The RocksDb config changes are older and were used and verified to be working when originally implemented.
> 
> 
> Thanks,
> 
> Edi Bice
> 
>