Posted to user@hive.apache.org by "j.barrett Strausser" <j....@gmail.com> on 2013/07/30 03:51:35 UTC

Tablesample doubling

Hello All,

Why does TABLESAMPLE(N ROWS) produce output with 2*N rows?


I have the following script:

DROP TABLE IF EXISTS sparse_features_small;

CREATE TABLE sparse_features_small ROW FORMAT DELIMITED FIELDS TERMINATED
BY ',' LINES TERMINATED BY '\n' AS

SELECT
        *
FROM
        sparse_features
TABLESAMPLE(50000 ROWS);


After I execute this by sourcing the file, I can then execute:

-- 


https://github.com/bearrito
@deepbearrito

Re: Tablesample doubling

Posted by Stephen Sprague <sp...@gmail.com>.
+1 for documentation. Sometimes it surprises you. :)



Re: Tablesample doubling

Posted by "j.barrett Strausser" <j....@gmail.com>.
Never mind, I see in the docs that it is rows PER SPLIT.
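
For anyone who hits the same thing: since the row count applies to each input split, the total is roughly N times the number of splits, which would explain 100000 rows if sparse_features was read as two splits (an assumption, I did not check the split count). A minimal sketch of one way to cap the result, assuming a global LIMIT on top of the sample is acceptable for your use case; the table names are the ones from my script, and adding the LIMIT is my own suggestion, not something taken from the docs:

```sql
-- Sketch only: TABLESAMPLE(50000 ROWS) takes up to 50000 rows from EACH
-- input split, so the LIMIT is added here to cap the overall row count.
DROP TABLE IF EXISTS sparse_features_small;

CREATE TABLE sparse_features_small
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
AS
SELECT
        *
FROM
        sparse_features TABLESAMPLE(50000 ROWS)
LIMIT 50000;

-- Check the resulting size; it should not exceed the LIMIT value.
SELECT COUNT(*) FROM sparse_features_small;
```

Note that the LIMIT only truncates the combined per-split samples, so the rows kept are not guaranteed to be spread evenly across splits.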

-b



Re: Tablesample doubling

Posted by "j.barrett Strausser" <j....@gmail.com>.
SELECT COUNT(*) FROM sparse_features_small;

And I receive back:

Total MapReduce CPU Time Spent: 3 seconds 330 msec
OK
100000

rather than the expected 50000.

I am running Hive 11.2.



