You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@datafu.apache.org by Eyal Allweil <ey...@yahoo.com> on 2016/04/27 09:44:43 UTC

Review Request 46701: DATAFU-117 - New UDF - CountDistinctUpTo

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46701/
-----------------------------------------------------------

Review request for DataFu.


Repository: datafu


Description
-------

DATAFU-117 - New UDF - CountDistinctUpTo


Diffs
-----

  datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java PRE-CREATION 
  datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java 28292db0c01a1967ea53d9cc3d316e9906d942a8 

Diff: https://reviews.apache.org/r/46701/diff/


Testing
-------


Thanks,

Eyal Allweil


Re: Review Request 46701: DATAFU-117 - New UDF - CountDistinctUpTo

Posted by Matthew Hayes <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46701/#review139809
-----------------------------------------------------------


Ship it!




Ship It!

- Matthew Hayes


On June 8, 2016, 1:49 p.m., Eyal Allweil wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46701/
> -----------------------------------------------------------
> 
> (Updated June 8, 2016, 1:49 p.m.)
> 
> 
> Review request for DataFu.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> DATAFU-117 - New UDF - CountDistinctUpTo
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java 28292db 
> 
> Diff: https://reviews.apache.org/r/46701/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Eyal Allweil
> 
>


Re: Review Request 46701: DATAFU-117 - New UDF - CountDistinctUpTo

Posted by Eyal Allweil <ey...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46701/
-----------------------------------------------------------

(Updated June 8, 2016, 1:49 p.m.)


Review request for DataFu.


Changes
-------

Incorporated last remaining comment in new diff


Repository: datafu


Description
-------

DATAFU-117 - New UDF - CountDistinctUpTo


Diffs (updated)
-----

  datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java PRE-CREATION 
  datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java 28292db 

Diff: https://reviews.apache.org/r/46701/diff/


Testing
-------


Thanks,

Eyal Allweil


Re: Review Request 46701: DATAFU-117 - New UDF - CountDistinctUpTo

Posted by Eyal Allweil <ey...@yahoo.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46701/
-----------------------------------------------------------

(Updated June 5, 2016, 9:21 a.m.)


Review request for DataFu.


Repository: datafu


Description
-------

DATAFU-117 - New UDF - CountDistinctUpTo


Diffs (updated)
-----

  datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java PRE-CREATION 
  datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java 28292db 

Diff: https://reviews.apache.org/r/46701/diff/


Testing
-------


Thanks,

Eyal Allweil


Re: Review Request 46701: DATAFU-117 - New UDF - CountDistinctUpTo

Posted by Matthew Hayes <ma...@gmail.com>.

> On May 18, 2016, 10:19 p.m., Matthew Hayes wrote:
> > datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java, line 147
> > <https://reviews.apache.org/r/46701/diff/1/?file=1361707#file1361707line147>
> >
> >     What about clearing the set so we don't have to garbage collect?
> 
> Eyal Allweil wrote:
>     I just reassigned it because the clear() method in HashSet uses Array.fill and I thought it would be more expensive than just letting it be garbage collected and making a new one.

I would think GCing the hashset would be more expensive than clearing.  I did a quick benchmark and it seems that clear is significantly faster for large and small hashsets.


- Matthew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46701/#review133803
-----------------------------------------------------------


On April 27, 2016, 7:44 a.m., Eyal Allweil wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46701/
> -----------------------------------------------------------
> 
> (Updated April 27, 2016, 7:44 a.m.)
> 
> 
> Review request for DataFu.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> DATAFU-117 - New UDF - CountDistinctUpTo
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java 28292db0c01a1967ea53d9cc3d316e9906d942a8 
> 
> Diff: https://reviews.apache.org/r/46701/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Eyal Allweil
> 
>


Re: Review Request 46701: DATAFU-117 - New UDF - CountDistinctUpTo

Posted by Eyal Allweil <ey...@yahoo.com>.

> On May 18, 2016, 10:19 p.m., Matthew Hayes wrote:
> > datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java, line 147
> > <https://reviews.apache.org/r/46701/diff/1/?file=1361707#file1361707line147>
> >
> >     What about clearing the set so we don't have to garbage collect?

I just reassigned it because the clear() method in HashSet uses Array.fill and I thought it would be more expensive than just letting it be garbage collected and making a new one.


> On May 18, 2016, 10:19 p.m., Matthew Hayes wrote:
> > datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java, line 22
> > <https://reviews.apache.org/r/46701/diff/1/?file=1361708#file1361708line22>
> >
> >     We already have the testng assertEquals and this gives me a build error.  Can you confirm this command passes?
> >     
> >     ./gradlew :datafu-pig:test -Dtest.single=BagTests
> >     
> >     I'm getting an error with this.

I removed this line ... don't know why it worked for me (although I think it did)


- Eyal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46701/#review133803
-----------------------------------------------------------


On April 27, 2016, 7:44 a.m., Eyal Allweil wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46701/
> -----------------------------------------------------------
> 
> (Updated April 27, 2016, 7:44 a.m.)
> 
> 
> Review request for DataFu.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> DATAFU-117 - New UDF - CountDistinctUpTo
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java 28292db0c01a1967ea53d9cc3d316e9906d942a8 
> 
> Diff: https://reviews.apache.org/r/46701/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Eyal Allweil
> 
>


Re: Review Request 46701: DATAFU-117 - New UDF - CountDistinctUpTo

Posted by Matthew Hayes <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46701/#review133803
-----------------------------------------------------------



Mostly looks good to me.  I've flagged a few issues that should be fixed.  Otherwise it looks good.  Tests passed after some code edits.


datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java (line 41)
<https://reviews.apache.org/r/46701/#comment198464>

    It would be good to add a comment advising when this UDF is useful compared to using the builtin DISTINCT and COUNT based on your comments in JIRA.



datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java (line 109)
<https://reviews.apache.org/r/46701/#comment198426>

    Don't think we should be suppressing exceptions like this because it means an error can lead to incorrect results without the user knowing.



datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java (line 147)
<https://reviews.apache.org/r/46701/#comment198429>

    What about clearing the set so we don't have to garbage collect?



datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java (line 152)
<https://reviews.apache.org/r/46701/#comment198462>

    nit: I don't think there's a case where set it null



datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java (line 22)
<https://reviews.apache.org/r/46701/#comment198463>

    We already have the testng assertEquals and this gives me a build error.  Can you confirm this command passes?
    
    ./gradlew :datafu-pig:test -Dtest.single=BagTests
    
    I'm getting an error with this.



datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java (line 1080)
<https://reviews.apache.org/r/46701/#comment198465>

    Why are you using the Assert.assertEquals instead of the testng assert?


- Matthew Hayes


On April 27, 2016, 7:44 a.m., Eyal Allweil wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46701/
> -----------------------------------------------------------
> 
> (Updated April 27, 2016, 7:44 a.m.)
> 
> 
> Review request for DataFu.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> DATAFU-117 - New UDF - CountDistinctUpTo
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java 28292db0c01a1967ea53d9cc3d316e9906d942a8 
> 
> Diff: https://reviews.apache.org/r/46701/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Eyal Allweil
> 
>


Re: Review Request 46701: DATAFU-117 - New UDF - CountDistinctUpTo

Posted by Eyal Allweil <ey...@yahoo.com>.

> On May 20, 2016, 5:35 p.m., Matthew Hayes wrote:
> > I reviewed the updated patch on JIRA (RB was not updated).  The changes look good to me aside from the clear vs gc issue.  If you're good with going with clear over new then I can go ahead and make the change and commit (without you having to upload a new patch).

Sure, I'm fine with you making the change to clear.


- Eyal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46701/#review134168
-----------------------------------------------------------


On April 27, 2016, 7:44 a.m., Eyal Allweil wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46701/
> -----------------------------------------------------------
> 
> (Updated April 27, 2016, 7:44 a.m.)
> 
> 
> Review request for DataFu.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> DATAFU-117 - New UDF - CountDistinctUpTo
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java 28292db0c01a1967ea53d9cc3d316e9906d942a8 
> 
> Diff: https://reviews.apache.org/r/46701/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Eyal Allweil
> 
>


Re: Review Request 46701: DATAFU-117 - New UDF - CountDistinctUpTo

Posted by Matthew Hayes <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46701/#review134168
-----------------------------------------------------------



I reviewed the updated patch on JIRA (RB was not updated).  The changes look good to me aside from the clear vs gc issue.  If you're good with going with clear over new then I can go ahead and make the change and commit (without you having to upload a new patch).

- Matthew Hayes


On April 27, 2016, 7:44 a.m., Eyal Allweil wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/46701/
> -----------------------------------------------------------
> 
> (Updated April 27, 2016, 7:44 a.m.)
> 
> 
> Review request for DataFu.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> DATAFU-117 - New UDF - CountDistinctUpTo
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/java/datafu/pig/bags/CountDistinctUpTo.java PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/bags/BagTests.java 28292db0c01a1967ea53d9cc3d316e9906d942a8 
> 
> Diff: https://reviews.apache.org/r/46701/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Eyal Allweil
> 
>