You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@accumulo.apache.org by Josh Elser <jo...@gmail.com> on 2013/11/21 06:31:45 UTC

Review Request 15752: ACCUMULO-1854 Persist AccumuloInputFormat information from Configuration into RangeInputSplit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15752/
-----------------------------------------------------------

Review request for accumulo.


Bugs: ACCUMULO-1854
    https://issues.apache.org/jira/browse/ACCUMULO-1854


Repository: accumulo


Description
-------

The current way that AccumuloInputFormat works requires that the same *exact* Configuration that was used to invoke getSplits() is also provided when createRecordReader() is called on the InputFormat. In practice, notably looking at InputFormat implementations which merge or delegate other InputFormats, this is a bad idea.

By serializing the necessary information into the RangeInputSplit from the provided Configuration object in getSplits() we can completely avoid this problem, at the minimal expense of serialization this information into each InputSplit. I tried to implement the changes in such a way that would be backwards compatible. If the information is not provided (is null) in the RangeInputSplit, the RecordReader will still attempt to pull a value from the Configuration object so as to not fail immediately. This should provide a little more flexibility if users have custom code built on top of the AccumuloInputFormat and RangeInputSplit


Diffs
-----

  src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormat.java 4de131f 
  src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/InputFormatBase.java 8e238f1 
  src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplit.java PRE-CREATION 
  src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormatTest.java ba647e9 
  src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloRowInputFormatTest.java 0673f1b 
  src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplitTest.java PRE-CREATION 
  src/examples/simple/src/test/java/org/apache/accumulo/examples/simple/filedata/ChunkInputFormatTest.java c31c738 

Diff: https://reviews.apache.org/r/15752/diff/


Testing
-------

Verified changes work as intended using PigInputFormat (which may delegate to many InputFormats). Added additional unit tests and verified sufficient coverage using cobertura.


Thanks,

Josh Elser


Re: Review Request 15752: ACCUMULO-1854 Persist AccumuloInputFormat information from Configuration into RangeInputSplit

Posted by Josh Elser <jo...@gmail.com>.

> On Nov. 21, 2013, 8:38 p.m., Bill Slacum wrote:
> > I like that RangeInputSplit was pulled out into its own class. Be careful w/ the autoboxing of booleans-- personally I'd discourage its practice unless absolutely necessary.

Ah, now I remember. I had to mix+match to preserve the `null == never set` semantics that I was using. Using only the primitive, I wouldn't have been able to make that work without introducing another variable.


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15752/#review29245
-----------------------------------------------------------


On Nov. 21, 2013, 5:31 a.m., Josh Elser wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15752/
> -----------------------------------------------------------
> 
> (Updated Nov. 21, 2013, 5:31 a.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-1854
>     https://issues.apache.org/jira/browse/ACCUMULO-1854
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> The current way that AccumuloInputFormat works requires that the same *exact* Configuration that was used to invoke getSplits() is also provided when createRecordReader() is called on the InputFormat. In practice, notably looking at InputFormat implementations which merge or delegate other InputFormats, this is a bad idea.
> 
> By serializing the necessary information into the RangeInputSplit from the provided Configuration object in getSplits() we can completely avoid this problem, at the minimal expense of serialization this information into each InputSplit. I tried to implement the changes in such a way that would be backwards compatible. If the information is not provided (is null) in the RangeInputSplit, the RecordReader will still attempt to pull a value from the Configuration object so as to not fail immediately. This should provide a little more flexibility if users have custom code built on top of the AccumuloInputFormat and RangeInputSplit
> 
> 
> Diffs
> -----
> 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormat.java 4de131f 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/InputFormatBase.java 8e238f1 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplit.java PRE-CREATION 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormatTest.java ba647e9 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloRowInputFormatTest.java 0673f1b 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplitTest.java PRE-CREATION 
>   src/examples/simple/src/test/java/org/apache/accumulo/examples/simple/filedata/ChunkInputFormatTest.java c31c738 
> 
> Diff: https://reviews.apache.org/r/15752/diff/
> 
> 
> Testing
> -------
> 
> Verified changes work as intended using PigInputFormat (which may delegate to many InputFormats). Added additional unit tests and verified sufficient coverage using cobertura.
> 
> 
> Thanks,
> 
> Josh Elser
> 
>


Re: Review Request 15752: ACCUMULO-1854 Persist AccumuloInputFormat information from Configuration into RangeInputSplit

Posted by Bill Slacum <uj...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15752/#review29245
-----------------------------------------------------------


I like that RangeInputSplit was pulled out into its own class. Be careful w/ the autoboxing of booleans-- personally I'd discourage its practice unless absolutely necessary.

- Bill Slacum


On Nov. 21, 2013, 5:31 a.m., Josh Elser wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15752/
> -----------------------------------------------------------
> 
> (Updated Nov. 21, 2013, 5:31 a.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-1854
>     https://issues.apache.org/jira/browse/ACCUMULO-1854
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> The current way that AccumuloInputFormat works requires that the same *exact* Configuration that was used to invoke getSplits() is also provided when createRecordReader() is called on the InputFormat. In practice, notably looking at InputFormat implementations which merge or delegate other InputFormats, this is a bad idea.
> 
> By serializing the necessary information into the RangeInputSplit from the provided Configuration object in getSplits() we can completely avoid this problem, at the minimal expense of serialization this information into each InputSplit. I tried to implement the changes in such a way that would be backwards compatible. If the information is not provided (is null) in the RangeInputSplit, the RecordReader will still attempt to pull a value from the Configuration object so as to not fail immediately. This should provide a little more flexibility if users have custom code built on top of the AccumuloInputFormat and RangeInputSplit
> 
> 
> Diffs
> -----
> 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormat.java 4de131f 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/InputFormatBase.java 8e238f1 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplit.java PRE-CREATION 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormatTest.java ba647e9 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloRowInputFormatTest.java 0673f1b 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplitTest.java PRE-CREATION 
>   src/examples/simple/src/test/java/org/apache/accumulo/examples/simple/filedata/ChunkInputFormatTest.java c31c738 
> 
> Diff: https://reviews.apache.org/r/15752/diff/
> 
> 
> Testing
> -------
> 
> Verified changes work as intended using PigInputFormat (which may delegate to many InputFormats). Added additional unit tests and verified sufficient coverage using cobertura.
> 
> 
> Thanks,
> 
> Josh Elser
> 
>


Re: Review Request 15752: ACCUMULO-1854 Persist AccumuloInputFormat information from Configuration into RangeInputSplit

Posted by Josh Elser <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15752/#review29267
-----------------------------------------------------------



src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/InputFormatBase.java
<https://reviews.apache.org/r/15752/#comment56411>

    I need to pass the InputSplit down here and check there first (or just pass in the correct parameters to use)


- Josh Elser


On Nov. 21, 2013, 5:31 a.m., Josh Elser wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15752/
> -----------------------------------------------------------
> 
> (Updated Nov. 21, 2013, 5:31 a.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-1854
>     https://issues.apache.org/jira/browse/ACCUMULO-1854
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> The current way that AccumuloInputFormat works requires that the same *exact* Configuration that was used to invoke getSplits() is also provided when createRecordReader() is called on the InputFormat. In practice, notably looking at InputFormat implementations which merge or delegate other InputFormats, this is a bad idea.
> 
> By serializing the necessary information into the RangeInputSplit from the provided Configuration object in getSplits() we can completely avoid this problem, at the minimal expense of serialization this information into each InputSplit. I tried to implement the changes in such a way that would be backwards compatible. If the information is not provided (is null) in the RangeInputSplit, the RecordReader will still attempt to pull a value from the Configuration object so as to not fail immediately. This should provide a little more flexibility if users have custom code built on top of the AccumuloInputFormat and RangeInputSplit
> 
> 
> Diffs
> -----
> 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormat.java 4de131f 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/InputFormatBase.java 8e238f1 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplit.java PRE-CREATION 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormatTest.java ba647e9 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloRowInputFormatTest.java 0673f1b 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplitTest.java PRE-CREATION 
>   src/examples/simple/src/test/java/org/apache/accumulo/examples/simple/filedata/ChunkInputFormatTest.java c31c738 
> 
> Diff: https://reviews.apache.org/r/15752/diff/
> 
> 
> Testing
> -------
> 
> Verified changes work as intended using PigInputFormat (which may delegate to many InputFormats). Added additional unit tests and verified sufficient coverage using cobertura.
> 
> 
> Thanks,
> 
> Josh Elser
> 
>


Re: Review Request 15752: ACCUMULO-1854 Persist AccumuloInputFormat information from Configuration into RangeInputSplit

Posted by Josh Elser <jo...@gmail.com>.

> On Nov. 21, 2013, 9:31 p.m., Bill Havanki wrote:
> >

Good catch. Thanks!


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15752/#review29247
-----------------------------------------------------------


On Nov. 21, 2013, 5:31 a.m., Josh Elser wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15752/
> -----------------------------------------------------------
> 
> (Updated Nov. 21, 2013, 5:31 a.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-1854
>     https://issues.apache.org/jira/browse/ACCUMULO-1854
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> The current way that AccumuloInputFormat works requires that the same *exact* Configuration that was used to invoke getSplits() is also provided when createRecordReader() is called on the InputFormat. In practice, notably looking at InputFormat implementations which merge or delegate other InputFormats, this is a bad idea.
> 
> By serializing the necessary information into the RangeInputSplit from the provided Configuration object in getSplits() we can completely avoid this problem, at the minimal expense of serialization this information into each InputSplit. I tried to implement the changes in such a way that would be backwards compatible. If the information is not provided (is null) in the RangeInputSplit, the RecordReader will still attempt to pull a value from the Configuration object so as to not fail immediately. This should provide a little more flexibility if users have custom code built on top of the AccumuloInputFormat and RangeInputSplit
> 
> 
> Diffs
> -----
> 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormat.java 4de131f 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/InputFormatBase.java 8e238f1 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplit.java PRE-CREATION 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormatTest.java ba647e9 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloRowInputFormatTest.java 0673f1b 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplitTest.java PRE-CREATION 
>   src/examples/simple/src/test/java/org/apache/accumulo/examples/simple/filedata/ChunkInputFormatTest.java c31c738 
> 
> Diff: https://reviews.apache.org/r/15752/diff/
> 
> 
> Testing
> -------
> 
> Verified changes work as intended using PigInputFormat (which may delegate to many InputFormats). Added additional unit tests and verified sufficient coverage using cobertura.
> 
> 
> Thanks,
> 
> Josh Elser
> 
>


Re: Review Request 15752: ACCUMULO-1854 Persist AccumuloInputFormat information from Configuration into RangeInputSplit

Posted by Bill Havanki <bh...@clouderagovt.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15752/#review29247
-----------------------------------------------------------



src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/InputFormatBase.java
<https://reviews.apache.org/r/15752/#comment56386>

    Thie method no longer needs the conf parameter. And therefore, the method above it taking a TaskAttemptContext parameter doesn't need that either.



src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/InputFormatBase.java
<https://reviews.apache.org/r/15752/#comment56387>

    Thie method no longer needs the conf parameter. And therefore, the method above it taking a TaskAttemptContext parameter doesn't need that either.


- Bill Havanki


On Nov. 21, 2013, 12:31 a.m., Josh Elser wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15752/
> -----------------------------------------------------------
> 
> (Updated Nov. 21, 2013, 12:31 a.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-1854
>     https://issues.apache.org/jira/browse/ACCUMULO-1854
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> The current way that AccumuloInputFormat works requires that the same *exact* Configuration that was used to invoke getSplits() is also provided when createRecordReader() is called on the InputFormat. In practice, notably looking at InputFormat implementations which merge or delegate other InputFormats, this is a bad idea.
> 
> By serializing the necessary information into the RangeInputSplit from the provided Configuration object in getSplits() we can completely avoid this problem, at the minimal expense of serialization this information into each InputSplit. I tried to implement the changes in such a way that would be backwards compatible. If the information is not provided (is null) in the RangeInputSplit, the RecordReader will still attempt to pull a value from the Configuration object so as to not fail immediately. This should provide a little more flexibility if users have custom code built on top of the AccumuloInputFormat and RangeInputSplit
> 
> 
> Diffs
> -----
> 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormat.java 4de131f 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/InputFormatBase.java 8e238f1 
>   src/core/src/main/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplit.java PRE-CREATION 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloInputFormatTest.java ba647e9 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/AccumuloRowInputFormatTest.java 0673f1b 
>   src/core/src/test/java/org/apache/accumulo/core/client/mapreduce/RangeInputSplitTest.java PRE-CREATION 
>   src/examples/simple/src/test/java/org/apache/accumulo/examples/simple/filedata/ChunkInputFormatTest.java c31c738 
> 
> Diff: https://reviews.apache.org/r/15752/diff/
> 
> 
> Testing
> -------
> 
> Verified changes work as intended using PigInputFormat (which may delegate to many InputFormats). Added additional unit tests and verified sufficient coverage using cobertura.
> 
> 
> Thanks,
> 
> Josh Elser
> 
>