Posted to user@hive.apache.org by zuohua zhang <zu...@gmail.com> on 2012/08/15 00:18:50 UTC

how to do random sampling in hive?

I would like to extract a uniform random sample from a Hive table. How should
I write the query?
Thanks!

Re: how to do random sampling in hive?

Posted by Bejoy KS <be...@yahoo.com>.
Hi,

To get more accurate sampling, bucket your table on the columns you want to sample on, then use the TABLESAMPLE clause in your queries to pull a sample of the required size.

http://hive.apache.org/docs/r0.9.0/language_manual/working_with_bucketed_tables.html

https://cwiki.apache.org/Hive/languagemanual-sampling.html 
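
A minimal sketch of the bucketing plus TABLESAMPLE approach described above. The table names (events, events_bucketed), the columns (user_id, payload) and the bucket count of 32 are hypothetical and chosen only for illustration. Note that sampling on a bucketing column keeps or drops all rows that hash to the same bucket together, so for a row-level random sample the second form (ON rand()) is probably closer to what was asked:

```sql
-- Hypothetical table and column names; adjust to your schema.
-- Older Hive releases (such as 0.9) need this setting so that inserts
-- actually produce one file per bucket.
SET hive.enforce.bucketing = true;

CREATE TABLE events_bucketed (user_id BIGINT, payload STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS;

INSERT OVERWRITE TABLE events_bucketed
SELECT user_id, payload FROM events;

-- Read 1 of the 32 buckets. Because the sample column matches the
-- CLUSTERED BY column, Hive can read just that bucket instead of
-- scanning the whole table.
SELECT * FROM events_bucketed TABLESAMPLE(BUCKET 1 OUT OF 32 ON user_id) s;

-- For a table that is not bucketed, sampling ON rand() assigns every row
-- to a random bucket. This gives a row-level sample of roughly 1/32 of
-- the table, at the cost of a full table scan.
SELECT * FROM events TABLESAMPLE(BUCKET 1 OUT OF 32 ON rand()) s;
```

Changing the numbers in BUCKET x OUT OF y changes the sampled fraction; the result size is approximate, not exact.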


Regards
Bejoy KS




Re: how to do random sampling in hive?

Posted by Roberto Sanabria <ro...@stumbleupon.com>.
Try this:

select * from table_name order by rand() limit 5;
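
One caveat with the query above: ORDER BY in Hive produces a total order, so every row of the table is sorted through a single reducer just to keep 5 of them. On a large table, a cheaper variant is to filter on rand() first. The following is a sketch only; the 0.001 fraction and the 1000-row limit are assumed values for illustration, and table_name is the same placeholder as above:

```sql
-- Keep roughly 0.1% of rows, each row independently of the others.
-- The filter is applied while scanning, so no global sort is needed,
-- but the number of rows returned varies from run to run.
SELECT * FROM table_name WHERE rand() <= 0.001;

-- Fixed-size variant: pre-filter to a small random subset, then sort
-- only that subset randomly and keep the first 1000 rows.
-- The fraction must be large enough that the filter keeps at least
-- 1000 rows, otherwise fewer rows come back.
SELECT * FROM table_name
WHERE rand() <= 0.001
ORDER BY rand()
LIMIT 1000;
```

The pre-filter only reduces how much data reaches the final sort; pick the fraction from the table's approximate row count and the sample size you need.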

Cheers,
R


Re: how to do random sampling in hive?

Posted by Raihan Jamal <ja...@gmail.com>.
I think you can use LIMIT here, with one caveat.

LIMIT sets the number of rows to be returned. Without an ORDER BY clause, the
rows returned are arbitrary (typically the first rows read), not a uniform
random sample. The following query returns 5 arbitrary rows from t1.



SELECT * FROM t1 LIMIT 5

http://karmasphere.com/hive-queries-on-table-data



*Raihan Jamal*


