Posted to dev@datafu.apache.org by Matthew Hayes <ma...@gmail.com> on 2014/01/17 19:56:55 UTC
Review Request 17058: DATAFU-2 UDFs for entropy and weighted sampling algorithms
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17058/
-----------------------------------------------------------
Review request for DataFu.
Repository: datafu
Description
-------
This is Jian Wang's implementation of the entropy and weighted sampling algorithms in DATAFU-2. I generated the diff by applying the patches in a branch and squashing into one commit.
Diffs
-----
src/java/datafu/pig/sampling/ReservoirSample.java 403bfaf06402c8dac2a70c0133ee955c625cb06e
src/java/datafu/pig/sampling/ScoredSampleReservoir.java PRE-CREATION
src/java/datafu/pig/sampling/ScoredTuple.java 293cf3d6fcfa041287258b94a1f686a2a77ab4ce
src/java/datafu/pig/sampling/WeightedReservoirSample.java PRE-CREATION
src/java/datafu/pig/sampling/WeightedReservoirSampleWithExpJump.java PRE-CREATION
src/java/datafu/pig/stats/entropy/Entropy.java PRE-CREATION
src/java/datafu/pig/stats/entropy/EntropyUtil.java PRE-CREATION
src/java/datafu/pig/stats/entropy/stream/ChaoShenEntropyEstimator.java PRE-CREATION
src/java/datafu/pig/stats/entropy/stream/EmpiricalEntropyEstimator.java PRE-CREATION
src/java/datafu/pig/stats/entropy/stream/EntropyEstimator.java PRE-CREATION
src/java/datafu/pig/stats/entropy/stream/StreamingCondEntropy.java PRE-CREATION
src/java/datafu/pig/stats/entropy/stream/StreamingEntropy.java PRE-CREATION
test/pig/datafu/test/pig/sampling/WeightedReservoirSampleWithExpJumpTests.java PRE-CREATION
test/pig/datafu/test/pig/sampling/WeightedReservoirSamplingTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/AbstractEntropyTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/EntropyTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/StreamingChaoShenEntropyTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalCondEntropyTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalEntropyTests.java PRE-CREATION
Diff: https://reviews.apache.org/r/17058/diff/
Testing
-------
Thanks,
Matthew Hayes
Re: Review Request 17058: DATAFU-2 UDFs for entropy and weighted sampling algorithms
Posted by Matthew Hayes <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17058/#review32202
-----------------------------------------------------------
Here are some initial comments on the changes made since my last review.
src/java/datafu/pig/sampling/WeightedReservoirSampleWithExpJump.java
<https://reviews.apache.org/r/17058/#comment60929>
You also need to implement new versions of Intermediate and Final that set the reservoir to InverseWeightJumpSampleReservoir. Currently those stages use the default Reservoir, which does not perform the exponential jumps, so the algebraic version doesn't behave as intended. I confirmed this by stepping through the code in the debugger.
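To illustrate the point, here is a minimal sketch in plain Java, not the actual patch. Every class and method name in it (StageReservoirSketch, BaseStage, newReservoir, JumpReservoir, and so on) is a hypothetical stand-in and not part of DataFu or Pig. It only assumes that each stage builds its own reservoir, so a stage that is not overridden silently falls back to the default one.

```java
// Hypothetical stand-ins for illustration only; not the DataFu or Pig classes.
public class StageReservoirSketch {

    /** Minimal reservoir contract used by this sketch. */
    interface Reservoir {
        boolean usesExponentialJumps();
    }

    /** Plays the role of the default reservoir (no exponential jumps). */
    static class DefaultReservoir implements Reservoir {
        public boolean usesExponentialJumps() { return false; }
    }

    /** Plays the role of the jump-enabled reservoir. */
    static class JumpReservoir implements Reservoir {
        public boolean usesExponentialJumps() { return true; }
    }

    /** Base stage: creates the default reservoir unless a subclass overrides newReservoir(). */
    static class BaseStage {
        protected Reservoir newReservoir() { return new DefaultReservoir(); }

        public final boolean jumpsEnabled() { return newReservoir().usesExponentialJumps(); }
    }

    /** Initial stage that was overridden to use the jump reservoir. */
    static class JumpInitial extends BaseStage {
        @Override protected Reservoir newReservoir() { return new JumpReservoir(); }
    }

    /** The problem case: an inherited stage keeps the default reservoir. */
    static class InheritedIntermediate extends BaseStage { }

    /** The fix: Intermediate and Final override the factory method as well. */
    static class JumpIntermediate extends BaseStage {
        @Override protected Reservoir newReservoir() { return new JumpReservoir(); }
    }

    static class JumpFinal extends BaseStage {
        @Override protected Reservoir newReservoir() { return new JumpReservoir(); }
    }

    public static void main(String[] args) {
        System.out.println("initial:                " + new JumpInitial().jumpsEnabled());
        System.out.println("inherited intermediate: " + new InheritedIntermediate().jumpsEnabled());
        System.out.println("overridden intermediate: " + new JumpIntermediate().jumpsEnabled());
        System.out.println("overridden final:        " + new JumpFinal().jumpsEnabled());
    }
}
```

In this sketch only the stages that override the factory method report jumps as enabled, which mirrors why all three stages need their own versions.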
src/java/datafu/pig/sampling/WeightedReservoirSampleWithExpJump.java
<https://reviews.apache.org/r/17058/#comment60918>
Minor comment: declaring 'r' appears to be unnecessary. Math.random() could be called directly on line 146.
src/java/datafu/pig/stats/entropy/stream/ChaoShenEntropyEstimator.java
<https://reviews.apache.org/r/17058/#comment60916>
I think everything in the datafu.pig.stats.entropy.stream package should be moved to datafu.pig.stats.entropy.
- Matthew Hayes
Re: Review Request 17058: DATAFU-2 UDFs for entropy and weighted sampling algorithms
Posted by Matthew Hayes <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17058/#review32405
-----------------------------------------------------------
Ship it!
- Matthew Hayes
Re: Review Request 17058: DATAFU-2 UDFs for entropy and weighted sampling algorithms
Posted by Matthew Hayes <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17058/
-----------------------------------------------------------
(Updated Jan. 21, 2014, 4 a.m.)
Review request for DataFu.
Repository: datafu
Description
-------
This is Jian Wang's implementation of the entropy and weighted sampling algorithms in DATAFU-2. I generated the diff by applying the patches in a branch and squashing into one commit.
Diffs (updated)
-----
src/java/datafu/pig/sampling/ReservoirSample.java 403bfaf06402c8dac2a70c0133ee955c625cb06e
src/java/datafu/pig/sampling/ScoredTuple.java 293cf3d6fcfa041287258b94a1f686a2a77ab4ce
src/java/datafu/pig/sampling/WeightedReservoirSample.java PRE-CREATION
src/java/datafu/pig/stats/entropy/ChaoShenEntropyEstimator.java PRE-CREATION
src/java/datafu/pig/stats/entropy/EmpiricalEntropyEstimator.java PRE-CREATION
src/java/datafu/pig/stats/entropy/Entropy.java PRE-CREATION
src/java/datafu/pig/stats/entropy/EntropyEstimator.java PRE-CREATION
src/java/datafu/pig/stats/entropy/EntropyUtil.java PRE-CREATION
src/java/datafu/pig/stats/entropy/StreamingCondEntropy.java PRE-CREATION
src/java/datafu/pig/stats/entropy/StreamingEntropy.java PRE-CREATION
test/pig/datafu/test/pig/sampling/WeightedReservoirSamplingTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/AbstractEntropyTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/EntropyTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/StreamingChaoShenEntropyTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalCondEntropyTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalEntropyTests.java PRE-CREATION
Diff: https://reviews.apache.org/r/17058/diff/
Testing
-------
Thanks,
Matthew Hayes
Re: Review Request 17058: DATAFU-2 UDFs for entropy and weighted sampling algorithms
Posted by Matthew Hayes <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17058/
-----------------------------------------------------------
(Updated Jan. 17, 2014, 6:57 p.m.)
Review request for DataFu.
Repository: datafu
Description
-------
This is Jian Wang's implementation of the entropy and weighted sampling algorithms in DATAFU-2. I generated the diff by applying the patches in a branch and squashing into one commit.
Diffs
-----
src/java/datafu/pig/sampling/ReservoirSample.java 403bfaf06402c8dac2a70c0133ee955c625cb06e
src/java/datafu/pig/sampling/ScoredSampleReservoir.java PRE-CREATION
src/java/datafu/pig/sampling/ScoredTuple.java 293cf3d6fcfa041287258b94a1f686a2a77ab4ce
src/java/datafu/pig/sampling/WeightedReservoirSample.java PRE-CREATION
src/java/datafu/pig/sampling/WeightedReservoirSampleWithExpJump.java PRE-CREATION
src/java/datafu/pig/stats/entropy/Entropy.java PRE-CREATION
src/java/datafu/pig/stats/entropy/EntropyUtil.java PRE-CREATION
src/java/datafu/pig/stats/entropy/stream/ChaoShenEntropyEstimator.java PRE-CREATION
src/java/datafu/pig/stats/entropy/stream/EmpiricalEntropyEstimator.java PRE-CREATION
src/java/datafu/pig/stats/entropy/stream/EntropyEstimator.java PRE-CREATION
src/java/datafu/pig/stats/entropy/stream/StreamingCondEntropy.java PRE-CREATION
src/java/datafu/pig/stats/entropy/stream/StreamingEntropy.java PRE-CREATION
test/pig/datafu/test/pig/sampling/WeightedReservoirSampleWithExpJumpTests.java PRE-CREATION
test/pig/datafu/test/pig/sampling/WeightedReservoirSamplingTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/AbstractEntropyTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/EntropyTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/StreamingChaoShenEntropyTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalCondEntropyTests.java PRE-CREATION
test/pig/datafu/test/pig/stats/entropy/StreamingEmpiricalEntropyTests.java PRE-CREATION
Diff: https://reviews.apache.org/r/17058/diff/
Testing
-------
Thanks,
Matthew Hayes