You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Furcy Pin <pi...@gmail.com> on 2018/05/01 14:34:27 UTC

Re: insert overwrite to hive orc table in aws

I suggest that you do a

DESCRIBE FORMATTED orc

and to check directly on s3 the location of your data.

Did you also try a SELECT * FROM orc LIMIT 10; ? Did it return something?



On Fri, 27 Apr 2018, 21:30 Sowjanya Kakarala, <so...@agrible.com> wrote:

>
>
>
>
>> Hi Guys,
>>
>> I am new to aws, I am not sure how the hive works and store the data in
>> aws.
>>
>> I have created a text table and orc table. where text table have data and
>> when I do insert overwrite from text table the qury runs fine and when I do
>> a count on orc table its zero.
>>
>> I have no clue where the data is going, any suggestions or ideas?
>>
>> Here is my query:
>>
>> hive> insert into table orc select * from text;
>
> Query ID = hadoop_20180427190946_03edcce9-7a48-4876-aadb-6a7a1a862e50
>
> Total jobs = 1
>
> Launching Job 1 out of 1
>
> Tez session was closed. Reopening...
>
> Session re-established.
>
> Status: Running (Executing on YARN cluster with App id
> application_1524689752360_0058)
>
>
>
> ----------------------------------------------------------------------------------------------
>
> *        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING
> PENDING  FAILED  KILLED  *
>
>
> ----------------------------------------------------------------------------------------------
>
> Map 1 .......... container     SUCCEEDED      7          7        0
>   0       0       0
>
> Reducer 2 ...... container     SUCCEEDED     10         10        0
>   0       0       0
>
>
> ----------------------------------------------------------------------------------------------
>
> *VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 23.02
> s    *
>
>
> ----------------------------------------------------------------------------------------------
>
> Loading data to table db.orc
>
> OK
>
> Time taken: 29.982 seconds
>
> hive> select count(*) from orc;
>
> Query ID = hadoop_20180427191026_faa4fa26-df05-4b52-aa31-f0c878b9c0bd
>
> Total jobs = 1
>
> Launching Job 1 out of 1
>
> Status: Running (Executing on YARN cluster with App id
> application_1524689752360_0058)
>
>
>
> ----------------------------------------------------------------------------------------------
>
> *        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING
> PENDING  FAILED  KILLED  *
>
>
> ----------------------------------------------------------------------------------------------
>
> Map 1            container     SUCCEEDED      0          0        0
>   0       0       0
>
> Reducer 2 ...... container     SUCCEEDED      1          1        0
>   0       0       0
>
>
> ----------------------------------------------------------------------------------------------
>
> *VERTICES: 01/02  [==========================>>] 100%  ELAPSED TIME: 3.49
> s     *
>
>
> ----------------------------------------------------------------------------------------------
>
> OK
>
> 0
>
> Time taken: 4.425 seconds, Fetched: 1 row(s)
>
> hive> select count(*) from text;
>
> Query ID = hadoop_20180427192401_fc0c4e26-0bb7-4d56-a7b9-17493cd5c88d
>
> Total jobs = 1
>
> Launching Job 1 out of 1
>
> Tez session was closed. Reopening...
>
> Session re-established.
>
> Status: Running (Executing on YARN cluster with App id
> application_1524689752360_0059)
>
>
>
> ----------------------------------------------------------------------------------------------
>
> *        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING
> PENDING  FAILED  KILLED  *
>
>
> ----------------------------------------------------------------------------------------------
>
> Map 1 .......... container     SUCCEEDED      7          7        0
>   0       0       0
>
> Reducer 2 ...... container     SUCCEEDED      1          1        0
>   0       0       0
>
>
> ----------------------------------------------------------------------------------------------
>
> *VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 10.19
> s    *
>
>
> ----------------------------------------------------------------------------------------------
>
> OK
>
> 37504
>
> Time taken: 15.506 seconds, Fetched: 1 row(s)
>
>
> Thanks in advance.
>
>

Re: insert overwrite to hive orc table in aws

Posted by Gopal Vijayaraghavan <go...@apache.org>.
> delta_0000000_0000000
...
> I am using Glue data catalog as metastore, so should there be any link up to these tables from hive?

That would be why transactions are returning as 0 (there is never a transaction 0), because it is not using a Hive standard metastore.

You might not be able to use transactional tables with a non-standard metastore, because DbTxnManager & DbLockManager might not be working without a DB.

Looking at that, it looks like you will not be able to use transactional tables without using a proper Hive metastore backed by RDS.

Cheers,
Gopal



Re: insert overwrite to hive orc table in aws

Posted by Sowjanya Kakarala <so...@agrible.com>.
Ya I did try the select statement but it did not return anything. In S3
location I saw that it created buckets path as "delta_0000000_0000000" but
there is nothing in it.
I am using Glue data catalog as metastore, so should there be any link up
to these tables from hive?
or to see if data catalog has data how should I retrieve it?

On Tue, May 1, 2018 at 9:34 AM, Furcy Pin <pi...@gmail.com> wrote:

> I suggest that you do a
>
> DESCRIBE FORMATTED orc
>
> and to check directly on s3 the location of your data.
>
> Did you also try a SELECT * FROM orc LIMIT 10; ? Did it return something?
>
>
>
> On Fri, 27 Apr 2018, 21:30 Sowjanya Kakarala, <so...@agrible.com>
> wrote:
>
>>
>>
>>
>>
>>> Hi Guys,
>>>
>>> I am new to aws, I am not sure how the hive works and store the data in
>>> aws.
>>>
>>> I have created a text table and orc table. where text table have data
>>> and when I do insert overwrite from text table the qury runs fine and when
>>> I do a count on orc table its zero.
>>>
>>> I have no clue where the data is going, any suggestions or ideas?
>>>
>>> Here is my query:
>>>
>>> hive> insert into table orc select * from text;
>>
>> Query ID = hadoop_20180427190946_03edcce9-7a48-4876-aadb-6a7a1a862e50
>>
>> Total jobs = 1
>>
>> Launching Job 1 out of 1
>>
>> Tez session was closed. Reopening...
>>
>> Session re-established.
>>
>> Status: Running (Executing on YARN cluster with App id
>> application_1524689752360_0058)
>>
>>
>> ------------------------------------------------------------
>> ----------------------------------
>>
>> *        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING
>> PENDING  FAILED  KILLED  *
>>
>> ------------------------------------------------------------
>> ----------------------------------
>>
>> Map 1 .......... container     SUCCEEDED      7          7        0
>>   0       0       0
>>
>> Reducer 2 ...... container     SUCCEEDED     10         10        0
>>   0       0       0
>>
>> ------------------------------------------------------------
>> ----------------------------------
>>
>> *VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME:
>> 23.02 s    *
>>
>> ------------------------------------------------------------
>> ----------------------------------
>>
>> Loading data to table db.orc
>>
>> OK
>>
>> Time taken: 29.982 seconds
>>
>> hive> select count(*) from orc;
>>
>> Query ID = hadoop_20180427191026_faa4fa26-df05-4b52-aa31-f0c878b9c0bd
>>
>> Total jobs = 1
>>
>> Launching Job 1 out of 1
>>
>> Status: Running (Executing on YARN cluster with App id
>> application_1524689752360_0058)
>>
>>
>> ------------------------------------------------------------
>> ----------------------------------
>>
>> *        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING
>> PENDING  FAILED  KILLED  *
>>
>> ------------------------------------------------------------
>> ----------------------------------
>>
>> Map 1            container     SUCCEEDED      0          0        0
>>   0       0       0
>>
>> Reducer 2 ...... container     SUCCEEDED      1          1        0
>>   0       0       0
>>
>> ------------------------------------------------------------
>> ----------------------------------
>>
>> *VERTICES: 01/02  [==========================>>] 100%  ELAPSED TIME: 3.49
>> s     *
>>
>> ------------------------------------------------------------
>> ----------------------------------
>>
>> OK
>>
>> 0
>>
>> Time taken: 4.425 seconds, Fetched: 1 row(s)
>>
>> hive> select count(*) from text;
>>
>> Query ID = hadoop_20180427192401_fc0c4e26-0bb7-4d56-a7b9-17493cd5c88d
>>
>> Total jobs = 1
>>
>> Launching Job 1 out of 1
>>
>> Tez session was closed. Reopening...
>>
>> Session re-established.
>>
>> Status: Running (Executing on YARN cluster with App id
>> application_1524689752360_0059)
>>
>>
>> ------------------------------------------------------------
>> ----------------------------------
>>
>> *        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING
>> PENDING  FAILED  KILLED  *
>>
>> ------------------------------------------------------------
>> ----------------------------------
>>
>> Map 1 .......... container     SUCCEEDED      7          7        0
>>   0       0       0
>>
>> Reducer 2 ...... container     SUCCEEDED      1          1        0
>>   0       0       0
>>
>> ------------------------------------------------------------
>> ----------------------------------
>>
>> *VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME:
>> 10.19 s    *
>>
>> ------------------------------------------------------------
>> ----------------------------------
>>
>> OK
>>
>> 37504
>>
>> Time taken: 15.506 seconds, Fetched: 1 row(s)
>>
>>
>> Thanks in advance.
>>
>>


--