Posted to dev@datafu.apache.org by Matthew Hayes <ma...@gmail.com> on 2014/01/17 19:56:55 UTC

Review Request 17058: DATAFU-2 UDFs for entropy and weighted sampling algorithms

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17058/
-----------------------------------------------------------

Review request for DataFu.


Repository: datafu


Description
-------

This is Jian Wang's implementation of the entropy and weighted sampling algorithms in DATAFU-2.  I generated the diff by applying the patches in a branch and squashing into one commit.


Diffs
-----

  src/java/datafu/pig/sampling/ReservoirSample.java 403bfaf06402c8dac2a70c0133ee955c625cb06e 
  src/java/datafu/pig/sampling/ScoredSampleReservoir.java PRE-CREATION 
  src/java/datafu/pig/sampling/ScoredTuple.java 293cf3d6fcfa041287258b94a1f686a2a77ab4ce 
  src/java/datafu/pig/sampling/WeightedReservoirSample.java PRE-CREATION 
  src/java/datafu/pig/sampling/WeightedReservoirSampleWithExpJump.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/Entropy.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/EntropyUtil.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/stream/ChaoShenEntropyEstimator.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/stream/EmpiricalEntropyEstimator.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/stream/EntropyEstimator.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/stream/StreamingCondEntropy.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/stream/StreamingEntropy.java PRE-CREATION 
  test/pig/datafu/test/pig/sampling/WeightedReservoirSampleWithExpJumpTests.java PRE-CREATION 
  test/pig/datafu/test/pig/sampling/WeightedReservoirSamplingTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/AbstractEntropyTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/EntropyTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/StreamingChaoShenEntropyTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalCondEntropyTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalEntropyTests.java PRE-CREATION 

Diff: https://reviews.apache.org/r/17058/diff/


Testing
-------


Thanks,

Matthew Hayes


Re: Review Request 17058: DATAFU-2 UDFs for entropy and weighted sampling algorithms

Posted by Matthew Hayes <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17058/#review32202
-----------------------------------------------------------


Here are some initial comments on the changes made since I last reviewed this patch.


src/java/datafu/pig/sampling/WeightedReservoirSampleWithExpJump.java
<https://reviews.apache.org/r/17058/#comment60929>

    You also need to implement new versions of Intermediate and Final that set the reservoir to InverseWeightJumpSampleReservoir. Currently you are using the default Reservoir, which does not perform the exponential jumps, so the algebraic version doesn't behave as intended. I confirmed this by stepping through the code in the debugger.
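A minimal sketch of the issue described above, using hypothetical stand-in classes rather than the actual DataFu or Pig classes. It assumes each stage of an algebraic aggregation constructs its own reservoir through an overridable factory method, and that only the first stage was overridden in the patch; all names below (`Stage`, `createReservoir`, `ExpJumpReservoir`, and so on) are illustrative assumptions, not the real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the reservoir types; not the DataFu classes.
interface Reservoir {
    String name();
}

class DefaultReservoir implements Reservoir {
    public String name() { return "default"; }
}

class ExpJumpReservoir implements Reservoir {
    public String name() { return "exp-jump"; }
}

// One stage of an algebraic aggregation. Each stage builds its own reservoir,
// so a subclass that does not override the factory falls back to the default.
class Stage {
    protected Reservoir createReservoir() { return new DefaultReservoir(); }

    String reservoirUsed() { return createReservoir().name(); }
}

class ExpJumpInitial extends Stage {
    @Override
    protected Reservoir createReservoir() { return new ExpJumpReservoir(); }
}

// The fix: dedicated Intermediate and Final stages that also override the factory.
class ExpJumpIntermediate extends Stage {
    @Override
    protected Reservoir createReservoir() { return new ExpJumpReservoir(); }
}

class ExpJumpFinal extends Stage {
    @Override
    protected Reservoir createReservoir() { return new ExpJumpReservoir(); }
}

class AlgebraicStageSketch {
    // Collects the name of the reservoir each stage would actually use.
    static List<String> reservoirsUsed(Stage... stages) {
        List<String> names = new ArrayList<String>();
        for (Stage stage : stages) {
            names.add(stage.reservoirUsed());
        }
        return names;
    }

    public static void main(String[] args) {
        // Only the first stage overridden: later stages silently use the default.
        System.out.println(reservoirsUsed(new ExpJumpInitial(), new Stage(), new Stage()));
        // All three stages overridden: the jump reservoir is used throughout.
        System.out.println(reservoirsUsed(
            new ExpJumpInitial(), new ExpJumpIntermediate(), new ExpJumpFinal()));
    }
}
```

In this sketch the non-algebraic path can look correct while the combiner and reducer stages quietly use the default reservoir, which is why stepping through with a debugger is a practical way to catch it.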



src/java/datafu/pig/sampling/WeightedReservoirSampleWithExpJump.java
<https://reviews.apache.org/r/17058/#comment60918>

    Minor comment: declaring 'r' appears to be unnecessary. Math.random() could be called directly on line 146.



src/java/datafu/pig/stats/entropy/stream/ChaoShenEntropyEstimator.java
<https://reviews.apache.org/r/17058/#comment60916>

    I think everything in the datafu.pig.stats.entropy.stream package should be moved to datafu.pig.stats.entropy. 


- Matthew Hayes




Re: Review Request 17058: DATAFU-2 UDFs for entropy and weighted sampling algorithms

Posted by Matthew Hayes <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17058/#review32405
-----------------------------------------------------------

Ship it!

- Matthew Hayes




Re: Review Request 17058: DATAFU-2 UDFs for entropy and weighted sampling algorithms

Posted by Matthew Hayes <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17058/
-----------------------------------------------------------

(Updated Jan. 21, 2014, 4 a.m.)


Review request for DataFu.


Repository: datafu


Description
-------

This is Jian Wang's implementation of the entropy and weighted sampling algorithms in DATAFU-2.  I generated the diff by applying the patches in a branch and squashing into one commit.


Diffs (updated)
-----

  src/java/datafu/pig/sampling/ReservoirSample.java 403bfaf06402c8dac2a70c0133ee955c625cb06e 
  src/java/datafu/pig/sampling/ScoredTuple.java 293cf3d6fcfa041287258b94a1f686a2a77ab4ce 
  src/java/datafu/pig/sampling/WeightedReservoirSample.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/ChaoShenEntropyEstimator.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/EmpiricalEntropyEstimator.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/Entropy.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/EntropyEstimator.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/EntropyUtil.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/StreamingCondEntropy.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/StreamingEntropy.java PRE-CREATION 
  test/pig/datafu/test/pig/sampling/WeightedReservoirSamplingTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/AbstractEntropyTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/EntropyTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/StreamingChaoShenEntropyTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalCondEntropyTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalEntropyTests.java PRE-CREATION 

Diff: https://reviews.apache.org/r/17058/diff/


Testing
-------


Thanks,

Matthew Hayes


Re: Review Request 17058: DATAFU-2 UDFs for entropy and weighted sampling algorithms

Posted by Matthew Hayes <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17058/
-----------------------------------------------------------

(Updated Jan. 17, 2014, 6:57 p.m.)


Review request for DataFu.


Repository: datafu


Description
-------

This is Jian Wang's implementation of the entropy and weighted sampling algorithms in DATAFU-2.  I generated the diff by applying the patches in a branch and squashing into one commit.


Diffs
-----

  src/java/datafu/pig/sampling/ReservoirSample.java 403bfaf06402c8dac2a70c0133ee955c625cb06e 
  src/java/datafu/pig/sampling/ScoredSampleReservoir.java PRE-CREATION 
  src/java/datafu/pig/sampling/ScoredTuple.java 293cf3d6fcfa041287258b94a1f686a2a77ab4ce 
  src/java/datafu/pig/sampling/WeightedReservoirSample.java PRE-CREATION 
  src/java/datafu/pig/sampling/WeightedReservoirSampleWithExpJump.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/Entropy.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/EntropyUtil.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/stream/ChaoShenEntropyEstimator.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/stream/EmpiricalEntropyEstimator.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/stream/EntropyEstimator.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/stream/StreamingCondEntropy.java PRE-CREATION 
  src/java/datafu/pig/stats/entropy/stream/StreamingEntropy.java PRE-CREATION 
  test/pig/datafu/test/pig/sampling/WeightedReservoirSampleWithExpJumpTests.java PRE-CREATION 
  test/pig/datafu/test/pig/sampling/WeightedReservoirSamplingTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/AbstractEntropyTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/EntropyTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/StreamingChaoShenEntropyTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalCondEntropyTests.java PRE-CREATION 
  test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalEntropyTests.java PRE-CREATION 

Diff: https://reviews.apache.org/r/17058/diff/


Testing
-------


Thanks,

Matthew Hayes