Posted to dev@pig.apache.org by 李运田 <cu...@163.com> on 2015/05/15 11:27:44 UTC

is there a random sample function with seed

I use:
table1 = load 'school' using org.apache.hcatalog.pig.HCatLoader();
table2 = sample table1 0.01;
Every time I dump table2, I get a different result. Is there a sample function that takes a seed, so that the result is the same every time?
Thank you.

Re: is there a random sample function with seed

Posted by Daniel Dai <da...@hortonworks.com>.
RANDOM takes a seed. You can do a filter with RANDOM:


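-- fixed seed '100': RANDOM produces the same sequence of values on each run
define rand100 RANDOM('100');

table1 = load 'school' using org.apache.hcatalog.pig.HCatLoader();
-- keep roughly 1% of the rows
table2 = filter table1 by rand100() < 0.01;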
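A minimal Python sketch of the idea behind the seeded filter (an illustration, not Pig code): one generator is seeded once and each row gets one draw, so the same input in the same order yields the same sample. The helper `seeded_sample` is hypothetical, and the sketch uses Python's `random.Random`, not Pig's generator, so the particular rows chosen will differ from Pig's; only the reproducibility property carries over. It also assumes rows arrive in the same order on every run; in a distributed Pig job this presumably also requires the input to be split the same way, since each task evaluates its own copy of the UDF.

```python
import random


def seeded_sample(rows, fraction, seed):
    """Keep each row with probability `fraction`, drawing from a seeded generator.

    One generator, seeded once, one draw per row in input order: the same
    idea as `filter table1 by rand100() < 0.01` with a fixed seed.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    return [row for row in rows if rng.random() < fraction]


rows = list(range(10_000))  # hypothetical stand-in for the 'school' table

first = seeded_sample(rows, 0.01, seed=100)
second = seeded_sample(rows, 0.01, seed=100)
print(first == second)  # True: same seed and same input order give the same sample
```

Changing the seed, or feeding the rows in a different order, will generally select a different subset.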



Daniel


Re: is there a random sample function with seed

Posted by Tomas Hudik <xh...@gmail.com>.
Not sure I understand what you want, but sample is supposed to be random.
Even if there were a seed, you should still expect different results on
each run.

If you want the same results every time, don't use sample; instead, select
the rows you want explicitly (based on some ID?).
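One way to make the ID-based suggestion concrete is hash-based selection, a technique swapped in here for illustration (the thread does not name one): keep a row when a stable hash of its ID falls into a chosen bucket. The `id` field, the `in_sample` helper and the 100-bucket split are hypothetical. Because the decision depends only on the ID, it is the same on every run, regardless of input order or how the data is split.

```python
import random
import zlib


def in_sample(record_id, buckets=100, keep=0):
    """Return True if record_id falls in the kept bucket (about 1 in `buckets`)."""
    # crc32 gives the same value in every process; Python's str hash() is salted per run
    h = zlib.crc32(str(record_id).encode("utf-8"))
    return h % buckets == keep


# Hypothetical table: the "id" field stands in for whatever key 'school' has.
records = [{"id": i, "name": f"student{i}"} for i in range(10_000)]
sample_ids = {r["id"] for r in records if in_sample(r["id"])}

# Shuffling the input does not change which IDs are selected.
shuffled = records[:]
random.shuffle(shuffled)
shuffled_ids = {r["id"] for r in shuffled if in_sample(r["id"])}
print(sample_ids == shuffled_ids)  # True
```

`zlib.crc32` is used rather than the built-in `hash()` because string hashing in Python is randomized per process, which would defeat the purpose.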

regards, Tomas


