You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hbase.apache.org by Mike Drob <ma...@cloudera.com> on 2018/03/10 20:12:01 UTC

Questions about synchronization in compaction pipeline

Hi devs,

I was reading through HBASE-17434 trying to understand why we have two
linked lists in compaction pipeline and I'm having trouble following the
conversation there, especially since it seems intertwined with HBASE-17379
and jumps back and forth a few times.

It looks like we are implementing our own copy-on-write list, and there is
a claim that addFirst is faster on a LinkedList than an array based list. I
am concerned about the LL copy in pushHead - even if addFirst is faster, a
LL copy is fairly slow and likely loses us any gains. Also, I'm a little
dubious on the use of LL given that we support a replaceAtIndex which will
be much faster in an array.

Can we improve by using an ArrayDeque?

Eschar, Anastasia, WDYT?

Thanks,
Mike

Some observations about performance -
https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/

Re: Questions about synchronization in compaction pipeline

Posted by Anastasia Braginsky <an...@oath.com.INVALID>.

 Hi Mike,
Thanks for being interested in the CompactionPipeline implementation. It is pleasure to discuss it with you.
Regarding that we are implementing our own copy-on-write (COW) list. May be it is close, but in classic COW, everybody is sharing the same read-only copy and when someone tries to write on this copy it gets its own/personal copy updated according to this write. This is not what happens in the pipeline. In pipeline we let everyone read the same read-only copy, because read accesses are more frequent. When rare update to the pipeline happens, it is synchronized on the pipeline itself (writable) and the the read-only copy is updated (quickly). So all this is done for a faster synchronization. Anyway I am not aware of some from-the-shelf Java list, giving me the same synchronization as I want. Please update me if I am wrong.
Regarding "I am concerned about the LL copy in pushHead - even if addFirst is faster, a LL copy is fairly slow and likely loses us any gains". As you can see, recreation of the read-only-copy happens anytime the background pipeline changes (addFirst, swap, replaceAtIndex), which are rare operations happening on snapshot, compaction, flattening, respectively. The copy of the segment after all is the copy of the references without copying the entire data itself. We had previous type of synchronization before (without read-only-copy) and it was slower. So if you believe, read-only-copy creation is a key for some performance problem, please give provide any measurements.
Regarding "Also, I'm a little dubious on the use of LL given that we support a replaceAtIndex which will be much faster in an array". Generally I agree that change the implementation of "readOnlyCopy" from LinkedList to ArrayList, might be beneficial here. Specially for the replaceAtIndex case. I don't see how ArrayDeque helps us.
Thanks,Anastasia
    On Sunday, March 11, 2018, 8:06:05 AM GMT+2, 张铎(Duo Zhang) <pa...@gmail.com> wrote:  
 
 I believe the comments there are mainly about concurrency problem, not for
linked list vs. array list, at least for me...

2018-03-11 4:12 GMT+08:00 Mike Drob <ma...@cloudera.com>:

> Hi devs,
>
> I was reading through HBASE-17434 trying to understand why we have two
> linked lists in compaction pipeline and I'm having trouble following the
> conversation there, especially since it seems intertwined with HBASE-17379
> and jumps back and forth a few times.
>
> It looks like we are implementing our own copy-on-write list, and there is
> a claim that addFirst is faster on a LinkedList than an array based list. I
> am concerned about the LL copy in pushHead - even if addFirst is faster, a
> LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> dubious on the use of LL given that we support a replaceAtIndex which will
> be much faster in an array.
>
> Can we improve by using an ArrayDeque?
>
> Eschar, Anastasia, WDYT?
>
> Thanks,
> Mike
>
> Some observations about performance -
> https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/
>

Re: Questions about synchronization in compaction pipeline

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.

I believe the comments there are mainly about concurrency problem, not for
linked list vs. array list, at least for me...

2018-03-11 4:12 GMT+08:00 Mike Drob <ma...@cloudera.com>:

> Hi devs,
>
> I was reading through HBASE-17434 trying to understand why we have two
> linked lists in compaction pipeline and I'm having trouble following the
> conversation there, especially since it seems intertwined with HBASE-17379
> and jumps back and forth a few times.
>
> It looks like we are implementing our own copy-on-write list, and there is
> a claim that addFirst is faster on a LinkedList than an array based list. I
> am concerned about the LL copy in pushHead - even if addFirst is faster, a
> LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> dubious on the use of LL given that we support a replaceAtIndex which will
> be much faster in an array.
>
> Can we improve by using an ArrayDeque?
>
> Eschar, Anastasia, WDYT?
>
> Thanks,
> Mike
>
> Some observations about performance -
> https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/
>

Re: Questions about synchronization in compaction pipeline

Posted by Eshcar Hillel <es...@oath.com.INVALID>.

 I agree with Chia-Ping Tsai, read-only copy can be changed from LL to array list plus replacing getLast with get(size())However I am dubious about any of the suggested changes having any affect on the system performance.Would be very interesting to see the workload that can differentiate between these two implementations.
    On Sunday, March 11, 2018, 4:33:05 PM GMT+2, Chia-Ping Tsai <ch...@apache.org> wrote:  
 
 > LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> dubious on the use of LL given that we support a replaceAtIndex which will
> be much faster in an array.
> 
> Can we improve by using an ArrayDeque?
Does ArrayDeque support to change element by arbitrary index? If not, it not easy to change the impl by one line since the replaceAtIndex() need to change the element in pipeline.

BTW, could we change the impl of "readOnlyCopy" from LinkedList to ArrayList? Most ops to  readOnlyCopy are iteration, and the getLast can be replaced by size() and get(index). 

On 2018/03/10 20:12:01, Mike Drob <ma...@cloudera.com> wrote: 
> Hi devs,
> 
> I was reading through HBASE-17434 trying to understand why we have two
> linked lists in compaction pipeline and I'm having trouble following the
> conversation there, especially since it seems intertwined with HBASE-17379
> and jumps back and forth a few times.
> 
> It looks like we are implementing our own copy-on-write list, and there is
> a claim that addFirst is faster on a LinkedList than an array based list. I
> am concerned about the LL copy in pushHead - even if addFirst is faster, a
> LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> dubious on the use of LL given that we support a replaceAtIndex which will
> be much faster in an array.
> 
> Can we improve by using an ArrayDeque?
> 
> Eschar, Anastasia, WDYT?
> 
> Thanks,
> Mike
> 
> Some observations about performance -
> https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/
>

Re: Questions about synchronization in compaction pipeline

Posted by Ted Yu <yu...@gmail.com>.

Replacing with ArrayDeque wouldn't be straight forward since it implements
Deque interface where there is no method for accessing given index.

ArrayDeque provides remove by Object:

    public boolean remove(Object o) {

        return removeFirstOccurrence(o);

After the removal, add() or offer() only inserts at the end of the deque.

FYI

On Sun, Mar 11, 2018 at 7:33 AM, Chia-Ping Tsai <ch...@apache.org> wrote:

> > LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> > dubious on the use of LL given that we support a replaceAtIndex which
> will
> > be much faster in an array.
> >
> > Can we improve by using an ArrayDeque?
> Does ArrayDeque support to change element by arbitrary index? If not, it
> not easy to change the impl by one line since the replaceAtIndex() need to
> change the element in pipeline.
>
> BTW, could we change the impl of "readOnlyCopy" from LinkedList to
> ArrayList? Most ops to  readOnlyCopy are iteration, and the getLast can be
> replaced by size() and get(index).
>
> On 2018/03/10 20:12:01, Mike Drob <ma...@cloudera.com> wrote:
> > Hi devs,
> >
> > I was reading through HBASE-17434 trying to understand why we have two
> > linked lists in compaction pipeline and I'm having trouble following the
> > conversation there, especially since it seems intertwined with
> HBASE-17379
> > and jumps back and forth a few times.
> >
> > It looks like we are implementing our own copy-on-write list, and there
> is
> > a claim that addFirst is faster on a LinkedList than an array based
> list. I
> > am concerned about the LL copy in pushHead - even if addFirst is faster,
> a
> > LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> > dubious on the use of LL given that we support a replaceAtIndex which
> will
> > be much faster in an array.
> >
> > Can we improve by using an ArrayDeque?
> >
> > Eschar, Anastasia, WDYT?
> >
> > Thanks,
> > Mike
> >
> > Some observations about performance -
> > https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/
> >
>

Re: Questions about synchronization in compaction pipeline

Posted by Chia-Ping Tsai <ch...@apache.org>.

> LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> dubious on the use of LL given that we support a replaceAtIndex which will
> be much faster in an array.
> 
> Can we improve by using an ArrayDeque?
Does ArrayDeque support to change element by arbitrary index? If not, it not easy to change the impl by one line since the replaceAtIndex() need to change the element in pipeline.

BTW, could we change the impl of "readOnlyCopy" from LinkedList to ArrayList? Most ops to  readOnlyCopy are iteration, and the getLast can be replaced by size() and get(index). 

On 2018/03/10 20:12:01, Mike Drob <ma...@cloudera.com> wrote: 
> Hi devs,
> 
> I was reading through HBASE-17434 trying to understand why we have two
> linked lists in compaction pipeline and I'm having trouble following the
> conversation there, especially since it seems intertwined with HBASE-17379
> and jumps back and forth a few times.
> 
> It looks like we are implementing our own copy-on-write list, and there is
> a claim that addFirst is faster on a LinkedList than an array based list. I
> am concerned about the LL copy in pushHead - even if addFirst is faster, a
> LL copy is fairly slow and likely loses us any gains. Also, I'm a little
> dubious on the use of LL given that we support a replaceAtIndex which will
> be much faster in an array.
> 
> Can we improve by using an ArrayDeque?
> 
> Eschar, Anastasia, WDYT?
> 
> Thanks,
> Mike
> 
> Some observations about performance -
> https://stuartmarks.wordpress.com/2015/12/18/some-java-list-benchmarks/
>