You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@samza.apache.org by Edi Bice <ed...@yahoo.com> on 2016/02/18 20:46:27 UTC
Review Request 43732: Implemented AvroDataFileHdfsWriter and exposed
several RocksDb config
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/
-----------------------------------------------------------
Review request for samza.
Repository: samza
Description
-------
https://issues.apache.org/jira/browse/SAMZA-876
Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.
Exposed several RocksDb configuration options (recommended in RocksDb tuning guide):
rocksdb.log.level
rocksdb.log.keepfilenum
rocksdb.log.timetoroll
rocksdb.log.maxfilesize
rocksdb.bloomfilter.bits
rocksdb.max.background.compactions
rocksdb.max.background.flushes
rocksdb.num.write.buffers
rocksdb.target.file.size.base
rocksdb.max.bytes.level.base
Diffs
-----
gradle.properties 16e1f5d
gradle/dependency-versions.gradle 52e25aa
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION
samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java d4f765c
Diff: https://reviews.apache.org/r/43732/diff/
Testing
-------
I am using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully. The RocksDb config changes are older and were used and verified to be working when originally implemented.
Thanks,
Edi Bice
Re: Review Request 43732: Implemented AvroDataFileHdfsWriter
Posted by Edi Bice <ed...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/
-----------------------------------------------------------
(Updated March 3, 2016, 6:58 p.m.)
Review request for samza.
Changes
-------
Another fresh rebase since Jackson has been upgraded in master
Repository: samza
Description
-------
https://issues.apache.org/jira/browse/SAMZA-876
Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.
Diffs (updated)
-----
docs/learn/documentation/versioned/hdfs/producer.md cfd22c6
docs/learn/documentation/versioned/jobs/configuration-table.html 175437c
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION
samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-avro.properties PRE-CREATION
samza-hdfs/src/test/resources/samza-hdfs-test-job-avro.properties PRE-CREATION
samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala c4b04a1
Diff: https://reviews.apache.org/r/43732/diff/
Testing
-------
Two JUnit tests similar to the Text/BinarySequenceFileHdfsWriter ones. In addition I've been using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully.
Thanks,
Edi Bice
Re: Review Request 43732: Implemented AvroDataFileHdfsWriter
Posted by Edi Bice <ed...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/
-----------------------------------------------------------
(Updated March 2, 2016, 4:35 p.m.)
Review request for samza.
Changes
-------
Rebased with latest from master branch as requested
Repository: samza
Description
-------
https://issues.apache.org/jira/browse/SAMZA-876
Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.
Diffs (updated)
-----
docs/learn/documentation/versioned/hdfs/producer.md cfd22c6
docs/learn/documentation/versioned/jobs/configuration-table.html 6705530
gradle/dependency-versions.gradle 52e25aa
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION
samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-avro.properties PRE-CREATION
samza-hdfs/src/test/resources/samza-hdfs-test-job-avro.properties PRE-CREATION
samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala c4b04a1
Diff: https://reviews.apache.org/r/43732/diff/
Testing
-------
Two JUnit tests similar to the Text/BinarySequenceFileHdfsWriter ones. In addition I've been using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully.
Thanks,
Edi Bice
Re: Review Request 43732: Implemented AvroDataFileHdfsWriter
Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/#review121594
-----------------------------------------------------------
Overall lgtm. There is some problem in the last file in the diff uploaded. Could you try to rebase and upload again?Thanks!
gradle/dependency-versions.gradle (line 23)
<https://reviews.apache.org/r/43732/#comment183340>
Please rebase w/ the latest master branch.
- Yi Pan (Data Infrastructure)
On Feb. 25, 2016, 7:39 p.m., Edi Bice wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43732/
> -----------------------------------------------------------
>
> (Updated Feb. 25, 2016, 7:39 p.m.)
>
>
> Review request for samza.
>
>
> Repository: samza
>
>
> Description
> -------
>
> https://issues.apache.org/jira/browse/SAMZA-876
>
> Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.
>
>
> Diffs
> -----
>
> docs/learn/documentation/versioned/hdfs/producer.md cfd22c6
> docs/learn/documentation/versioned/jobs/configuration-table.html 6705530
> gradle/dependency-versions.gradle 52e25aa
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION
> samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-avro.properties PRE-CREATION
> samza-hdfs/src/test/resources/samza-hdfs-test-job-avro.properties PRE-CREATION
> samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala c4b04a1
>
> Diff: https://reviews.apache.org/r/43732/diff/
>
>
> Testing
> -------
>
> Two JUnit tests similar to the Text/BinarySequenceFileHdfsWriter ones. In addition I've been using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully.
>
>
> Thanks,
>
> Edi Bice
>
>
Re: Review Request 43732: Implemented AvroDataFileHdfsWriter
Posted by Edi Bice <ed...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/
-----------------------------------------------------------
(Updated Feb. 25, 2016, 7:39 p.m.)
Review request for samza.
Changes
-------
Reducing the scope as Yi recommended (moving the RocksDb changes to a separate JIRA/patch)
Summary (updated)
-----------------
Implemented AvroDataFileHdfsWriter
Repository: samza
Description (updated)
-------
https://issues.apache.org/jira/browse/SAMZA-876
Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.
Diffs
-----
docs/learn/documentation/versioned/hdfs/producer.md cfd22c6
docs/learn/documentation/versioned/jobs/configuration-table.html 6705530
gradle/dependency-versions.gradle 52e25aa
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION
samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-avro.properties PRE-CREATION
samza-hdfs/src/test/resources/samza-hdfs-test-job-avro.properties PRE-CREATION
samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala c4b04a1
Diff: https://reviews.apache.org/r/43732/diff/
Testing (updated)
-------
Two JUnit tests similar to the Text/BinarySequenceFileHdfsWriter ones. In addition I've been using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully.
Thanks,
Edi Bice
Re: Review Request 43732: Implemented AvroDataFileHdfsWriter and
exposed several RocksDb config
Posted by Edi Bice <ed...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/
-----------------------------------------------------------
(Updated Feb. 25, 2016, 4:50 p.m.)
Review request for samza.
Changes
-------
The new patch addressing all issues Yi brought up.
Repository: samza
Description
-------
https://issues.apache.org/jira/browse/SAMZA-876
Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.
Exposed several RocksDb configuration options (recommended in RocksDb tuning guide):
rocksdb.log.level
rocksdb.log.keepfilenum
rocksdb.log.timetoroll
rocksdb.log.maxfilesize
rocksdb.bloomfilter.bits
rocksdb.max.background.compactions
rocksdb.max.background.flushes
rocksdb.num.write.buffers
rocksdb.target.file.size.base
rocksdb.max.bytes.level.base
Diffs (updated)
-----
docs/learn/documentation/versioned/hdfs/producer.md cfd22c6
docs/learn/documentation/versioned/jobs/configuration-table.html 6705530
gradle/dependency-versions.gradle 52e25aa
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION
samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-avro.properties PRE-CREATION
samza-hdfs/src/test/resources/samza-hdfs-test-job-avro.properties PRE-CREATION
samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala c4b04a1
Diff: https://reviews.apache.org/r/43732/diff/
Testing
-------
I am using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully. The RocksDb config changes are older and were used and verified to be working when originally implemented.
Thanks,
Edi Bice
Re: Review Request 43732: Implemented AvroDataFileHdfsWriter and
exposed several RocksDb config
Posted by Edi Bice <ed...@yahoo.com>.
> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > Thanks for the contribution! I added a few comments below. Thanks
You're welcome. Thanks for Samza.
> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > gradle.properties, line 18
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254930#file1254930line18>
> >
> > Please keep the -SNAPSHOT suffix here.
Fixed in new patch.
> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > gradle/dependency-versions.gradle, line 32
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254931#file1254931line32>
> >
> > Please remove -MINE.
I had (Snappy linking) issues with the standard RocksDb and had to compile my own. See SAMZA-870. Fixed in new patch.
> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > gradle/dependency-versions.gradle, line 23
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254931#file1254931line23>
> >
> > There is SAMZA-878 opened for this jackson library upgrade. You can associate the patch w/ that ticket as well.
Attached the whole patch submitted here as I wasn't sure. Seems overkill.
> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > gradle/dependency-versions.gradle, line 33
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254931#file1254931line33>
> >
> > The minimum supported YARN version is still 2.6.1 in Samza and we have not deprecated the support yet. Hence, I am curious why you need 2.7.1 here?
Had trouble getting jobs to work on my 2.7.1 cluster but I'm no longer certain that changing this was what cured it. Dropping.
> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala, line 50
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254932#file1254932line50>
> >
> > Adding configuration variables also requires adding documentation to docs/learn/documentation/versioned/jobs/configuration-table.html.
Fixed in new patch.
> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala, line 30
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254933#file1254933line30>
> >
> > Please remove the auto-generated author info. Apache does not allow this.
Fixed in new patch.
> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala, line 33
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254933#file1254933line33>
> >
> > How is this class used? It would be nice to create some javadoc here to explain it.
> >
> > A set of unit tests for the new class is also needed here.
Fixed in new patch.
> On Feb. 23, 2016, 6:51 p.m., Yi Pan (Data Infrastructure) wrote:
> > samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java, line 19
> > <https://reviews.apache.org/r/43732/diff/1/?file=1254934#file1254934line19>
> >
> > This does not seem to be relevant to the AvroDataFileWriter? Can we move it to another patch?
Dropped from new patch.
- Edi
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/#review120357
-----------------------------------------------------------
On Feb. 18, 2016, 7:46 p.m., Edi Bice wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43732/
> -----------------------------------------------------------
>
> (Updated Feb. 18, 2016, 7:46 p.m.)
>
>
> Review request for samza.
>
>
> Repository: samza
>
>
> Description
> -------
>
> https://issues.apache.org/jira/browse/SAMZA-876
>
> Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.
>
> Exposed several RocksDb configuration options (recommended in RocksDb tuning guide):
>
> rocksdb.log.level
> rocksdb.log.keepfilenum
> rocksdb.log.timetoroll
> rocksdb.log.maxfilesize
>
> rocksdb.bloomfilter.bits
> rocksdb.max.background.compactions
> rocksdb.max.background.flushes
> rocksdb.num.write.buffers
> rocksdb.target.file.size.base
> rocksdb.max.bytes.level.base
>
>
> Diffs
> -----
>
> gradle.properties 16e1f5d
> gradle/dependency-versions.gradle 52e25aa
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION
> samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java d4f765c
>
> Diff: https://reviews.apache.org/r/43732/diff/
>
>
> Testing
> -------
>
> I am using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully. The RocksDb config changes are older and were used and verified to be working when originally implemented.
>
>
> Thanks,
>
> Edi Bice
>
>
Re: Review Request 43732: Implemented AvroDataFileHdfsWriter and
exposed several RocksDb config
Posted by "Yi Pan (Data Infrastructure)" <yi...@linkedin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43732/#review120357
-----------------------------------------------------------
Thanks for the contribution! I added a few comments below. Thanks
gradle.properties (line 18)
<https://reviews.apache.org/r/43732/#comment181788>
Please keep the -SNAPSHOT suffix here.
gradle/dependency-versions.gradle (line 23)
<https://reviews.apache.org/r/43732/#comment181789>
There is SAMZA-878 opened for this jackson library upgrade. You can associate the patch w/ that ticket as well.
gradle/dependency-versions.gradle (line 32)
<https://reviews.apache.org/r/43732/#comment181790>
Please remove -MINE.
gradle/dependency-versions.gradle (line 33)
<https://reviews.apache.org/r/43732/#comment181791>
The minimum supported YARN version is still 2.6.1 in Samza and we have not deprecated the support yet. Hence, I am curious why you need 2.7.1 here?
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala (line 50)
<https://reviews.apache.org/r/43732/#comment181797>
Adding configuration variables also requires adding documentation to docs/learn/documentation/versioned/jobs/configuration-table.html.
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala (line 30)
<https://reviews.apache.org/r/43732/#comment181792>
Please remove the auto-generated author info. Apache does not allow this.
samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala (line 33)
<https://reviews.apache.org/r/43732/#comment181796>
How is this class used? It would be nice to create some javadoc here to explain it.
A set of unit tests for the new class is also needed here.
samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java (line 19)
<https://reviews.apache.org/r/43732/#comment181795>
This does not seem to be relevant to the AvroDataFileWriter? Can we move it to another patch?
- Yi Pan (Data Infrastructure)
On Feb. 18, 2016, 7:46 p.m., Edi Bice wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43732/
> -----------------------------------------------------------
>
> (Updated Feb. 18, 2016, 7:46 p.m.)
>
>
> Review request for samza.
>
>
> Repository: samza
>
>
> Description
> -------
>
> https://issues.apache.org/jira/browse/SAMZA-876
>
> Implemented AvroDataFileHdfsWriter fashioned loosely after BinarySequenceFileHDFSWriter.
>
> Exposed several RocksDb configuration options (recommended in RocksDb tuning guide):
>
> rocksdb.log.level
> rocksdb.log.keepfilenum
> rocksdb.log.timetoroll
> rocksdb.log.maxfilesize
>
> rocksdb.bloomfilter.bits
> rocksdb.max.background.compactions
> rocksdb.max.background.flushes
> rocksdb.num.write.buffers
> rocksdb.target.file.size.base
> rocksdb.max.bytes.level.base
>
>
> Diffs
> -----
>
> gradle.properties 16e1f5d
> gradle/dependency-versions.gradle 52e25aa
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala 7993119
> samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/AvroDataFileHdfsWriter.scala PRE-CREATION
> samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java d4f765c
>
> Diff: https://reviews.apache.org/r/43732/diff/
>
>
> Testing
> -------
>
> I am using AvroDataFileHdfsWriter at the end of my pipeline. I feed the generated avro files to Apache Samoa. Have processed millions of records successfully. The RocksDb config changes are older and were used and verified to be working when originally implemented.
>
>
> Thanks,
>
> Edi Bice
>
>