You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Jay Kreps <ja...@gmail.com> on 2015/02/04 06:15:44 UTC

kafka.utils.DoublyLinkedList

How come we added this? Is it better than the linked list in java and
scala? Also it was added with only one test which is actually put inside
UtilsTest.scala which is meant to house tests for Utils.scala.

-Jay

Re: kafka.utils.DoublyLinkedList

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
Hi Jay,

The approach you suggested is exactly what we did at beginning. But Jun
mentioned a case in RB that could cause problem.

"Suppose there are 2 target partitions. Offset 1,2,4,5 can be routed to
partition 1 and offset 3 can be routed to partition 2. When all messages
are sent, you will see partition 1 with offset 5 and partition 2 with
offset 3. Since offset 3 and 5 are not consecutive, are you just going to
commit offset 3? If so, you may be stuck in offset 3 forever if no new
messages  are sent to partition 2."


I’ll update the Mirror Maker document later today.

BTW, actually even remove(ind index) in Java LinkedList still takes O(n)
to remove a node.

public E remove(int index) {
  checkElementIndex(index);
  return unlink(node(index));
    }

Node<E> node(int index) {
  // assert isElementIndex(index);

  if (index < (size >> 1)) {
    Node<E> x = first;
    for (int i = 0; i < index; i++)
      x = x.next;
    return x;
  } else {
    Node<E> x = last;
    for (int i = size - 1; i > index; i--)
      x = x.prev;
    return x;
  }
    }




Thanks.

Jiangjie (Becket) Qin


On 2/4/15, 9:00 AM, "Jay Kreps" <ja...@gmail.com> wrote:

>Yeah I think the Java LL is worse than that, you actually need to point at
>it with an iterator and call remove(). I didn't realize the scala ll was
>deprecated. Okay, makes sense. Would be good to move that test out of
>UtilsTest.scala, though.
>
>But actually, stepping back, can you guys document/explain what all that
>mm
>code is doing now? It is pretty tricky which makes me worry. Are we
>keeping
>track of each individual message until it is acknowledged in that list?
>Can't we just keep the largest input/output offset pair and commit that
>periodically and on rebalance?
>
>-Jay
>
>On Wed, Feb 4, 2015 at 8:39 AM, Joel Koshy <jj...@gmail.com> wrote:
>
>> Jay,
>>
>> My understanding is that java linked lists (even though they are backed
>>by
>> a doubly linked list) support O(1) deletion only if you have the index
>>of
>> the object. If you do a remove(object) it becomes O(n). As for scala
>>lists
>> - the scaladocs seem to indicate that the doubly linked lists are
>> deprecated as of 2.11
>>
>> Joel
>>
>> On Wednesday, February 4, 2015, Jay Kreps <ja...@gmail.com> wrote:
>>
>> > Hey Jiangjie,
>> >
>> > Both the scala and java doubly linked lists allow O(1) deletion--it
>>isn't
>> > obvious because the api is slightly different from the CS101 linked
>>list
>> > but they do. Are you sure those wouldn't work? Can we remove this
>>class
>> and
>> > use those?
>> >
>> > Also the convention for unit tests is if we have a class Foo.scala
>>then
>> we
>> > would have a test FooTest.scala. So you can definitely write a unit
>>test
>> > for classes used by MM without adding it to kafka.utils. The
>> > UtilsTest.scala is really for Utils.scala we shouldn't be adding other
>> > tests there.
>> >
>> > -Jay
>> >
>> > On Tue, Feb 3, 2015 at 11:46 PM, Jiangjie Qin
>><jqin@linkedin.com.invalid
>> >
>> > wrote:
>> >
>> > > We added this because in redesigned mirror maker we need a raw
>>linked
>> > list
>> > > so we can removed an acked offset from a linked list in O(1). Java
>>and
>> > > Scala linked list won¹t work in that case. It was initially put in
>>side
>> > > mirror maker as a private class, but later on we also want to have
>>unit
>> > > test for it, that¹s why it becomes a separate util class and we have
>> unit
>> > > test in UtilsTest.scala.
>> > >
>> > > -Jiangjie (Becket) Qin
>> > >
>> > > On 2/3/15, 9:15 PM, "Jay Kreps" <jay.kreps@gmail.com <javascript:;>>
>> > wrote:
>> > >
>> > > >How come we added this? Is it better than the linked list in java
>>and
>> > > >scala? Also it was added with only one test which is actually put
>> inside
>> > > >UtilsTest.scala which is meant to house tests for Utils.scala.
>> > > >
>> > > >-Jay
>> > >
>> > >
>> >
>>
>>
>> --
>> Sent from Gmail Mobile
>>


Re: kafka.utils.DoublyLinkedList

Posted by Jay Kreps <ja...@gmail.com>.
Yeah I think the Java LL is worse than that, you actually need to point at
it with an iterator and call remove(). I didn't realize the scala ll was
deprecated. Okay, makes sense. Would be good to move that test out of
UtilsTest.scala, though.

But actually, stepping back, can you guys document/explain what all that mm
code is doing now? It is pretty tricky which makes me worry. Are we keeping
track of each individual message until it is acknowledged in that list?
Can't we just keep the largest input/output offset pair and commit that
periodically and on rebalance?

-Jay

On Wed, Feb 4, 2015 at 8:39 AM, Joel Koshy <jj...@gmail.com> wrote:

> Jay,
>
> My understanding is that java linked lists (even though they are backed by
> a doubly linked list) support O(1) deletion only if you have the index of
> the object. If you do a remove(object) it becomes O(n). As for scala lists
> - the scaladocs seem to indicate that the doubly linked lists are
> deprecated as of 2.11
>
> Joel
>
> On Wednesday, February 4, 2015, Jay Kreps <ja...@gmail.com> wrote:
>
> > Hey Jiangjie,
> >
> > Both the scala and java doubly linked lists allow O(1) deletion--it isn't
> > obvious because the api is slightly different from the CS101 linked list
> > but they do. Are you sure those wouldn't work? Can we remove this class
> and
> > use those?
> >
> > Also the convention for unit tests is if we have a class Foo.scala then
> we
> > would have a test FooTest.scala. So you can definitely write a unit test
> > for classes used by MM without adding it to kafka.utils. The
> > UtilsTest.scala is really for Utils.scala we shouldn't be adding other
> > tests there.
> >
> > -Jay
> >
> > On Tue, Feb 3, 2015 at 11:46 PM, Jiangjie Qin <jqin@linkedin.com.invalid
> >
> > wrote:
> >
> > > We added this because in redesigned mirror maker we need a raw linked
> > list
> > > so we can removed an acked offset from a linked list in O(1). Java and
> > > Scala linked list won¹t work in that case. It was initially put in side
> > > mirror maker as a private class, but later on we also want to have unit
> > > test for it, that¹s why it becomes a separate util class and we have
> unit
> > > test in UtilsTest.scala.
> > >
> > > -Jiangjie (Becket) Qin
> > >
> > > On 2/3/15, 9:15 PM, "Jay Kreps" <jay.kreps@gmail.com <javascript:;>>
> > wrote:
> > >
> > > >How come we added this? Is it better than the linked list in java and
> > > >scala? Also it was added with only one test which is actually put
> inside
> > > >UtilsTest.scala which is meant to house tests for Utils.scala.
> > > >
> > > >-Jay
> > >
> > >
> >
>
>
> --
> Sent from Gmail Mobile
>

Re: kafka.utils.DoublyLinkedList

Posted by Joel Koshy <jj...@gmail.com>.
Jay,

My understanding is that java linked lists (even though they are backed by
a doubly linked list) support O(1) deletion only if you have the index of
the object. If you do a remove(object) it becomes O(n). As for scala lists
- the scaladocs seem to indicate that the doubly linked lists are
deprecated as of 2.11

Joel

On Wednesday, February 4, 2015, Jay Kreps <ja...@gmail.com> wrote:

> Hey Jiangjie,
>
> Both the scala and java doubly linked lists allow O(1) deletion--it isn't
> obvious because the api is slightly different from the CS101 linked list
> but they do. Are you sure those wouldn't work? Can we remove this class and
> use those?
>
> Also the convention for unit tests is if we have a class Foo.scala then we
> would have a test FooTest.scala. So you can definitely write a unit test
> for classes used by MM without adding it to kafka.utils. The
> UtilsTest.scala is really for Utils.scala we shouldn't be adding other
> tests there.
>
> -Jay
>
> On Tue, Feb 3, 2015 at 11:46 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
> wrote:
>
> > We added this because in redesigned mirror maker we need a raw linked
> list
> > so we can removed an acked offset from a linked list in O(1). Java and
> > Scala linked list won¹t work in that case. It was initially put in side
> > mirror maker as a private class, but later on we also want to have unit
> > test for it, that¹s why it becomes a separate util class and we have unit
> > test in UtilsTest.scala.
> >
> > -Jiangjie (Becket) Qin
> >
> > On 2/3/15, 9:15 PM, "Jay Kreps" <jay.kreps@gmail.com <javascript:;>>
> wrote:
> >
> > >How come we added this? Is it better than the linked list in java and
> > >scala? Also it was added with only one test which is actually put inside
> > >UtilsTest.scala which is meant to house tests for Utils.scala.
> > >
> > >-Jay
> >
> >
>


-- 
Sent from Gmail Mobile

Re: kafka.utils.DoublyLinkedList

Posted by Jay Kreps <ja...@gmail.com>.
Hey Jiangjie,

Both the scala and java doubly linked lists allow O(1) deletion--it isn't
obvious because the api is slightly different from the CS101 linked list
but they do. Are you sure those wouldn't work? Can we remove this class and
use those?

Also the convention for unit tests is if we have a class Foo.scala then we
would have a test FooTest.scala. So you can definitely write a unit test
for classes used by MM without adding it to kafka.utils. The
UtilsTest.scala is really for Utils.scala we shouldn't be adding other
tests there.

-Jay

On Tue, Feb 3, 2015 at 11:46 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> We added this because in redesigned mirror maker we need a raw linked list
> so we can removed an acked offset from a linked list in O(1). Java and
> Scala linked list won¹t work in that case. It was initially put in side
> mirror maker as a private class, but later on we also want to have unit
> test for it, that¹s why it becomes a separate util class and we have unit
> test in UtilsTest.scala.
>
> -Jiangjie (Becket) Qin
>
> On 2/3/15, 9:15 PM, "Jay Kreps" <ja...@gmail.com> wrote:
>
> >How come we added this? Is it better than the linked list in java and
> >scala? Also it was added with only one test which is actually put inside
> >UtilsTest.scala which is meant to house tests for Utils.scala.
> >
> >-Jay
>
>

Re: kafka.utils.DoublyLinkedList

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
We added this because in redesigned mirror maker we need a raw linked list
so we can removed an acked offset from a linked list in O(1). Java and
Scala linked list won¹t work in that case. It was initially put in side
mirror maker as a private class, but later on we also want to have unit
test for it, that¹s why it becomes a separate util class and we have unit
test in UtilsTest.scala.

-Jiangjie (Becket) Qin

On 2/3/15, 9:15 PM, "Jay Kreps" <ja...@gmail.com> wrote:

>How come we added this? Is it better than the linked list in java and
>scala? Also it was added with only one test which is actually put inside
>UtilsTest.scala which is meant to house tests for Utils.scala.
>
>-Jay