Posted to dev@datafu.apache.org by Ido Hadanny <id...@gmail.com> on 2015/04/27 15:26:35 UTC

Re: why is data.fu implementing HyperLogLog as an accumulator and not as algebraic?

Hey guys,
The patch is attached and passes the unit tests. We're also testing it on a
real 1000-node Hadoop cluster as we speak.
Do you want us to create a JIRA issue for this, or is this good enough?
Thanks, Ilia and Ido

On 7 March 2015 at 23:09, Matthew Hayes <ma...@gmail.com>
wrote:

> I don't remember if there was a particular reason I didn't implement this
> as AlgebraicEvalFunc. It seems like it could be. I believe the Java
> MapReduce version leverages the combiner. If you want to try making this
> Algebraic we would be happy to accept a patch :)
>
> -Matt
>
> > On Mar 7, 2015, at 12:11 PM, Ido Hadanny <id...@gmail.com> wrote:
> >
> > data.fu has a nice implementation of HyperLogLog for estimating
> > cardinality here
> > <https://github.com/apache/incubator-datafu/blob/master/datafu-pig/src/main/java/datafu/pig/stats/HyperLogLogPlusPlus.java>
> >
> > However, it's implemented as Accumulator which means it will run only at
> > the reducer and not in the combiner (but it will never load the entire
> > set into memory as in normal EvalFunc). Why couldn't data.fu implement it
> > as Algebraic - and fill the registers at every combiner, then merge and
> > reduce the result? Am I missing something here?
> > also available here:
> > http://stackoverflow.com/questions/28908217/why-is-data-fu-implementing-hyperloglog-as-an-accumulator-and-not-as-algebraic
> >
> > thanks!
> >
> >
> > --
> > Sent from my androido
>



-- 
Sent from my androido
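
Editor's note: the question quoted above rests on the idea that HyperLogLog
registers can be filled per combiner and then merged. Here is a minimal sketch
of that property, assuming a simplified register array. It is not the DataFu
patch and not the library's implementation. The class name HllMergeSketch, the
precision P = 10, and the SplitMix64-style hash are hypothetical choices made
only for illustration. Merging two partial sketches with an element-wise max
gives the same registers as a single pass over all the data, which is what
allows the work to be split into initial, intermediate and final steps.

```java
import java.util.Arrays;

// Simplified HyperLogLog register array (illustrative only, not DataFu code).
class HllMergeSketch {
    static final int P = 10;      // number of index bits (hypothetical choice)
    static final int M = 1 << P;  // number of registers

    // SplitMix64-style finalizer, used here as a stand-in 64-bit hash.
    static long mix(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    // Map-side step: fold one value into the registers.
    static void add(byte[] regs, long value) {
        long h = mix(value);
        int idx = (int) (h >>> (64 - P));  // top P bits select the register
        long rest = h << P;                // remaining bits determine the rank
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - P) + 1;
        if (rank > regs[idx]) {
            regs[idx] = (byte) rank;
        }
    }

    // Combiner/reducer step: merging is an element-wise max, which is
    // associative and commutative, so partial sketches merge in any order.
    static byte[] merge(byte[] a, byte[] b) {
        byte[] out = new byte[M];
        for (int i = 0; i < M; i++) {
            out[i] = (byte) Math.max(a[i], b[i]);
        }
        return out;
    }

    // Builds a sketch over the values in [from, to).
    static byte[] sketch(long from, long to) {
        byte[] regs = new byte[M];
        for (long v = from; v < to; v++) {
            add(regs, v);
        }
        return regs;
    }

    public static void main(String[] args) {
        byte[] whole = sketch(0, 100_000);
        byte[] merged = merge(sketch(0, 40_000), sketch(40_000, 100_000));
        System.out.println(Arrays.equals(whole, merged)); // prints "true"
    }
}
```

The cardinality estimate itself would be computed once, from the merged
registers, in the final step. That part is left out here because it does not
affect whether the function can run in the combiner.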

Re: why is data.fu implementing HyperLogLog as an accumulator and not as algebraic?

Posted by Matthew Hayes <ma...@gmail.com>.
Hey, sorry for the delay. I took a look at the diff and replied with some
comments in the JIRA. Please take a look, thanks.

-Matt

On Sat, May 9, 2015 at 10:53 PM, Ido Hadanny <id...@gmail.com> wrote:

> Hey, I see that this is still open and unassigned - can you assign it
> to me so I can mark it as "patch available"? Or do you want me to just mark
> it as "fixed"?

Re: why is data.fu implementing HyperLogLog as an accumulator and not as algebraic?

Posted by Ido Hadanny <id...@gmail.com>.
Hey, I see that this is still open and unassigned - can you assign it
to me so I can mark it as "patch available"? Or do you want me to just mark
it as "fixed"?

On 28 April 2015 at 08:37, Ido Hadanny <id...@gmail.com> wrote:

> https://issues.apache.org/jira/browse/DATAFU-91



-- 
Sent from my androido

Re: why is data.fu implementing HyperLogLog as an accumulator and not as algebraic?

Posted by Ido Hadanny <id...@gmail.com>.
https://issues.apache.org/jira/browse/DATAFU-91


On 27 April 2015 at 18:02, Matthew Hayes <ma...@gmail.com>
wrote:

> Great, thanks :) Please file a JIRA and attach the patch there.
>
> -Matt


-- 
Sent from my androido

Re: why is data.fu implementing HyperLogLog as an accumulator and not as algebraic?

Posted by Matthew Hayes <ma...@gmail.com>.
Great, thanks :) Please file a JIRA and attach the patch there.

-Matt

> On Apr 27, 2015, at 6:26 AM, Ido Hadanny <id...@gmail.com> wrote:
> 
> Hey guys, 
> The patch is attached and passes the unit tests. We're also testing it on a real 1000-node Hadoop cluster as we speak.
> Do you want us to create a JIRA issue for this, or is this good enough?
> Thanks, Ilia and Ido
> <hyper-log-log-algebraic.diff>