You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@accumulo.apache.org by Jonathan Park <pa...@gmail.com> on 2014/06/12 23:54:14 UTC

Review Request 22537: HeapIterator optimization

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/
-----------------------------------------------------------

Review request for accumulo.


Bugs: ACCUMULO-2827
    https://issues.apache.org/jira/browse/ACCUMULO-2827


Repository: accumulo


Description
-------

HeapIterator optimization to avoid always re-inserting back into the heap.


Diffs
-----

  core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
  core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 

Diff: https://reviews.apache.org/r/22537/diff/


Testing
-------

mvn clean install

accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)


Thanks,

Jonathan Park


Re: Review Request 22537: HeapIterator optimization

Posted by ke...@deenlo.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/#review46289
-----------------------------------------------------------

Ship it!


Ship It!

- kturner


On June 12, 2014, 9:54 p.m., Jonathan Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22537/
> -----------------------------------------------------------
> 
> (Updated June 12, 2014, 9:54 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2827
>     https://issues.apache.org/jira/browse/ACCUMULO-2827
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> HeapIterator optimization to avoid always re-inserting back into the heap.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
>   core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22537/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)
> 
> 
> Thanks,
> 
> Jonathan Park
> 
>


Re: Review Request 22537: HeapIterator optimization

Posted by Josh Elser <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/#review46298
-----------------------------------------------------------

Ship it!


Ship It!

- Josh Elser


On June 12, 2014, 9:54 p.m., Jonathan Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22537/
> -----------------------------------------------------------
> 
> (Updated June 12, 2014, 9:54 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2827
>     https://issues.apache.org/jira/browse/ACCUMULO-2827
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> HeapIterator optimization to avoid always re-inserting back into the heap.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
>   core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22537/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)
> 
> 
> Thanks,
> 
> Jonathan Park
> 
>


Re: Review Request 22537: HeapIterator optimization

Posted by Mike Drob <md...@mdrob.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/#review46403
-----------------------------------------------------------



core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java
<https://reviews.apache.org/r/22537/#comment81763>

    nit: whitespace (here and elsewhere)


- Mike Drob


On June 23, 2014, 3:16 p.m., Jonathan Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22537/
> -----------------------------------------------------------
> 
> (Updated June 23, 2014, 3:16 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2827
>     https://issues.apache.org/jira/browse/ACCUMULO-2827
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> HeapIterator optimization to avoid always re-inserting back into the heap.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
>   core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22537/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)
> 
> 
> Thanks,
> 
> Jonathan Park
> 
>


Re: Review Request 22537: HeapIterator optimization

Posted by Jonathan Park <pa...@gmail.com>.

> On June 23, 2014, 4:41 p.m., kturner wrote:
> > core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java, line 53
> > <https://reviews.apache.org/r/22537/diff/1-2/?file=608354#file608354line53>
> >
> >     Was any performance testing done using this different heap impelementation?  I suspect it will not matter, but it would be nice to verify that since the primary purpose of this issue is to improve performance.

Tests are now complete. The results are posted on the JIRA ticket.

The test harness you created was very convenient for running the tests.


> On June 23, 2014, 4:41 p.m., kturner wrote:
> > core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java, line 1
> > <https://reviews.apache.org/r/22537/diff/3/?file=615073#file615073line1>
> >
> >     This file has no license (go rat check).  I am thinking this file should not be be included in the patch, as nothing will run it. Could just attach the test to the issue.
> >     
> >     If its going to be included, a few sentences of documentation would be nice describing why it exist and what it does (or just ref the issue).  Something like "measuer performance of MultiIterator reading from multiple sources w/ interleaved data".

Nice catch.

I separated it from the patch and also added the documentation. The java file itself was uploaded separately on the JIRA ticket.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/#review46408
-----------------------------------------------------------


On June 23, 2014, 4:24 p.m., Jonathan Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22537/
> -----------------------------------------------------------
> 
> (Updated June 23, 2014, 4:24 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2827
>     https://issues.apache.org/jira/browse/ACCUMULO-2827
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> HeapIterator optimization to avoid always re-inserting back into the heap.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
>   core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22537/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)
> 
> 
> Thanks,
> 
> Jonathan Park
> 
>


Re: Review Request 22537: HeapIterator optimization

Posted by ke...@deenlo.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/#review46408
-----------------------------------------------------------



core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java
<https://reviews.apache.org/r/22537/#comment81789>

    Was any performance testing done using this different heap impelementation?  I suspect it will not matter, but it would be nice to verify that since the primary purpose of this issue is to improve performance.



core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java
<https://reviews.apache.org/r/22537/#comment81784>

    This file has no license (go rat check).  I am thinking this file should not be be included in the patch, as nothing will run it. Could just attach the test to the issue.
    
    If its going to be included, a few sentences of documentation would be nice describing why it exist and what it does (or just ref the issue).  Something like "measuer performance of MultiIterator reading from multiple sources w/ interleaved data".


- kturner


On June 23, 2014, 4:24 p.m., Jonathan Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22537/
> -----------------------------------------------------------
> 
> (Updated June 23, 2014, 4:24 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2827
>     https://issues.apache.org/jira/browse/ACCUMULO-2827
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> HeapIterator optimization to avoid always re-inserting back into the heap.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
>   core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22537/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)
> 
> 
> Thanks,
> 
> Jonathan Park
> 
>


Re: Review Request 22537: HeapIterator optimization

Posted by Mike Drob <md...@mdrob.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/#review46405
-----------------------------------------------------------

Ship it!


Ship It!

- Mike Drob


On June 23, 2014, 4:24 p.m., Jonathan Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22537/
> -----------------------------------------------------------
> 
> (Updated June 23, 2014, 4:24 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2827
>     https://issues.apache.org/jira/browse/ACCUMULO-2827
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> HeapIterator optimization to avoid always re-inserting back into the heap.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
>   core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22537/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)
> 
> 
> Thanks,
> 
> Jonathan Park
> 
>


Re: Review Request 22537: HeapIterator optimization

Posted by Jonathan Park <pa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/
-----------------------------------------------------------

(Updated June 23, 2014, 4:24 p.m.)


Review request for accumulo.


Changes
-------

Removing trailing whitespace on all lines.


Bugs: ACCUMULO-2827
    https://issues.apache.org/jira/browse/ACCUMULO-2827


Repository: accumulo


Description
-------

HeapIterator optimization to avoid always re-inserting back into the heap.


Diffs (updated)
-----

  core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
  core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 

Diff: https://reviews.apache.org/r/22537/diff/


Testing
-------

mvn clean install

accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)


Thanks,

Jonathan Park


Re: Review Request 22537: HeapIterator optimization

Posted by Jonathan Park <pa...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/
-----------------------------------------------------------

(Updated June 23, 2014, 3:16 p.m.)


Review request for accumulo.


Changes
-------

Replacing use of PriorityBuffer with java.util.PriorityQueue


Bugs: ACCUMULO-2827
    https://issues.apache.org/jira/browse/ACCUMULO-2827


Repository: accumulo


Description
-------

HeapIterator optimization to avoid always re-inserting back into the heap.


Diffs (updated)
-----

  core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
  core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 

Diff: https://reviews.apache.org/r/22537/diff/


Testing
-------

mvn clean install

accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)


Thanks,

Jonathan Park


Re: Review Request 22537: HeapIterator optimization

Posted by Jonathan Park <pa...@gmail.com>.

> On June 12, 2014, 11:03 p.m., Mike Drob wrote:
> > Can you add a unit test to ensure correctness? Something as simple as sorting from two iterators that have a couple values each. Also checking empty values.

Thank you for the review and sorry for the delay.

As for unit tests, I second what kturner mentioned but I'll make a sanity pass through the MultiIteratorTest to see if there are some interesting cases that could be added.


> On June 12, 2014, 11:03 p.m., Mike Drob wrote:
> > core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java, line 28
> > <https://reviews.apache.org/r/22537/diff/1/?file=608354#file608354line28>
> >
> >     Can we switch this to be a java.util.PriorityQueue<E>? It will save us on unsafe casts when we pop elements.

This is a good suggestion. The unsafe casts also bother me. 

I'll work towards getting that in.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/#review45550
-----------------------------------------------------------


On June 12, 2014, 9:54 p.m., Jonathan Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22537/
> -----------------------------------------------------------
> 
> (Updated June 12, 2014, 9:54 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2827
>     https://issues.apache.org/jira/browse/ACCUMULO-2827
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> HeapIterator optimization to avoid always re-inserting back into the heap.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
>   core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22537/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)
> 
> 
> Thanks,
> 
> Jonathan Park
> 
>


Re: Review Request 22537: HeapIterator optimization

Posted by Mike Drob <md...@mdrob.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/#review45550
-----------------------------------------------------------


Can you add a unit test to ensure correctness? Something as simple as sorting from two iterators that have a couple values each. Also checking empty values.


core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java
<https://reviews.apache.org/r/22537/#comment80404>

    Can we switch this to be a java.util.PriorityQueue<E>? It will save us on unsafe casts when we pop elements.


- Mike Drob


On June 12, 2014, 9:54 p.m., Jonathan Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22537/
> -----------------------------------------------------------
> 
> (Updated June 12, 2014, 9:54 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2827
>     https://issues.apache.org/jira/browse/ACCUMULO-2827
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> HeapIterator optimization to avoid always re-inserting back into the heap.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
>   core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22537/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)
> 
> 
> Thanks,
> 
> Jonathan Park
> 
>


Re: Review Request 22537: HeapIterator optimization

Posted by ke...@deenlo.com.

> On June 18, 2014, 11:14 p.m., kturner wrote:
> > I took a look at the code.  The changes look good.  I am going to try running an end to end experiment like the one you ran w/ CI.  However I am going to generate random rows with multiple columns.

HeapIterator has no unit test itself, but its well covered by the MultiIteratorTest and RFileTest.  I ran those two test w/ the patch and looked at code coverage, its almost 100%.  The only things not covered were the cases where IllegalStateException was thrown.


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/#review46142
-----------------------------------------------------------


On June 12, 2014, 9:54 p.m., Jonathan Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22537/
> -----------------------------------------------------------
> 
> (Updated June 12, 2014, 9:54 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2827
>     https://issues.apache.org/jira/browse/ACCUMULO-2827
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> HeapIterator optimization to avoid always re-inserting back into the heap.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
>   core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22537/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)
> 
> 
> Thanks,
> 
> Jonathan Park
> 
>


Re: Review Request 22537: HeapIterator optimization

Posted by Jonathan Park <pa...@gmail.com>.

> On June 18, 2014, 11:14 p.m., kturner wrote:
> > I took a look at the code.  The changes look good.  I am going to try running an end to end experiment like the one you ran w/ CI.  However I am going to generate random rows with multiple columns.
> 
> kturner wrote:
>     HeapIterator has no unit test itself, but its well covered by the MultiIteratorTest and RFileTest.  I ran those two test w/ the patch and looked at code coverage, its almost 100%.  The only things not covered were the cases where IllegalStateException was thrown.

Thank you for the review and also for running code coverage on the HeapIterator.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/#review46142
-----------------------------------------------------------


On June 12, 2014, 9:54 p.m., Jonathan Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22537/
> -----------------------------------------------------------
> 
> (Updated June 12, 2014, 9:54 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2827
>     https://issues.apache.org/jira/browse/ACCUMULO-2827
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> HeapIterator optimization to avoid always re-inserting back into the heap.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
>   core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22537/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)
> 
> 
> Thanks,
> 
> Jonathan Park
> 
>


Re: Review Request 22537: HeapIterator optimization

Posted by ke...@deenlo.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22537/#review46142
-----------------------------------------------------------


I took a look at the code.  The changes look good.  I am going to try running an end to end experiment like the one you ran w/ CI.  However I am going to generate random rows with multiple columns.

- kturner


On June 12, 2014, 9:54 p.m., Jonathan Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22537/
> -----------------------------------------------------------
> 
> (Updated June 12, 2014, 9:54 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2827
>     https://issues.apache.org/jira/browse/ACCUMULO-2827
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> HeapIterator optimization to avoid always re-inserting back into the heap.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/iterators/system/HeapIterator.java e54f37c 
>   core/src/test/java/org/apache/accumulo/core/iterators/system/BenchmarkMultiIterator.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22537/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> accumulo continuous ingest: saw ~5% improvement on uniformly distributed data. (see ticket for more details)
> 
> 
> Thanks,
> 
> Jonathan Park
> 
>