Posted to user@hive.apache.org by unmesha sreeveni <un...@gmail.com> on 2014/12/01 06:00:38 UTC

How to retrieve data from a specific bucket in Hive?

I created a table in Hive:
create table HiveMB
  (EmployeeID Int, FirstName String, Designation String, Salary Int, Department String)
   clustered by (Department) into 3 buckets
   stored as orc TBLPROPERTIES ('transactional'='true');

My input file looks like this:
1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B
3,Janet,Sales,60000,A
4,Hari,Admin,50000,C
5,Sanker,Admin,50000,C

The data was distributed into 3 buckets by Department.

When I examined the warehouse directory, there were 3 bucket files:
Found 3 items
-rwxr-xr-x   3 aibladmin hadoop     252330 2014-11-28 14:46 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000
-rwxr-xr-x   3 aibladmin hadoop     100421 2014-11-28 14:45 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00001
-rwxr-xr-x   3 aibladmin hadoop     313047 2014-11-28 14:46 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00002

How can I retrieve the data from one such bucket?

When I ran -cat on a bucket file, the output was not in a human-readable format.
How can I see the data stored in each bucket?


-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/

Re: How to retrieve data from a specific bucket in Hive?

Posted by unmesha sreeveni <un...@gmail.com>.
Any pointers would be appreciated.


Re: How to retrieve data from a specific bucket in Hive?

Posted by unmesha sreeveni <un...@gmail.com>.
On Mon, Dec 1, 2014 at 11:15 AM, yogendra reddy <yo...@gmail.com>
wrote:

> hive --orcfiledump


Hi Yogendra,

Running it shows:
Exception in thread "main" java.io.IOException: Malformed ORC file /employeeData/empLargenew.txt. Invalid postscript.
But my input file is not in ORC format; it is in .csv format:

1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B

So, as a workaround, I loaded the data into a staging table:

create external table stagingMB
  (EmployeeID Int, FirstName String, Designation String, Salary Int, Department String)
  row format delimited fields terminated by ","
  location '/employeeData';

and from that table I loaded the data into the ORC table:

create table HiveMB
  (EmployeeID Int, FirstName String, Designation String, Salary Int, Department String)
  clustered by (Department) into 3 buckets
  stored as orc TBLPROPERTIES ('transactional'='true');

from stagingMB insert into table HiveMB
  select employeeid, firstname, designation, salary, department;
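
To check where the rows ended up after an insert like the one above, one option is Hive's INPUT__FILE__NAME virtual column, which reports the file each row was read from. The following is only a minimal sketch, not something posted in the thread: the QUERY variable and the dry-run fallback are illustrative, and it assumes the hive CLI is on the PATH and that the virtual column is populated for transactional tables in the same way as for ordinary ones.

```shell
#!/bin/sh
# Table and column names are the ones used in the thread.
# QUERY is a hypothetical helper variable for this sketch.
QUERY="SELECT INPUT__FILE__NAME, employeeid, department FROM HiveMB"

# Run the query when the hive CLI is available; otherwise just print it,
# so the sketch can be dry-run on a machine without Hive.
if command -v hive >/dev/null 2>&1; then
  hive -e "$QUERY"
else
  echo "$QUERY"
fi
```

Each output row would then carry the path of a bucket file next to the department value, which makes it possible to see which departments share a bucket.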



Re: How to retrieve data from a specific bucket in Hive?

Posted by yogendra reddy <yo...@gmail.com>.
Hi,

"When I did a -cat, It is not in human readable format"
This is because the file format is specified as ORC in your table definition, and ORC is a binary format.

Let me know if the ORC file dump utility works for you:
hive --orcfiledump <hdfs-location-of-orc-file>

Thanks,
Yogendra
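
The utility expects an ORC file, so it needs to be pointed at one of the bucket files under the table's warehouse directory rather than at the original .csv input. A minimal sketch using a path from the listing in the original question follows. The run helper is hypothetical and only exists so the commands can be dry-run without Hive installed. The -d option (dump row data) is an assumption about the installed version: to my knowledge it is only available in Hive releases newer than the one this thread dates from.

```shell
#!/bin/sh
# Bucket file path copied from the directory listing in the original question.
BUCKET_FILE="/user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000"

# Hypothetical helper: execute the command if hive is installed,
# otherwise print what would have been executed.
run() {
  if command -v hive >/dev/null 2>&1; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# File metadata: schema, stripe statistics, row counts.
run hive --orcfiledump "$BUCKET_FILE"

# Row data as well; assumes a Hive version whose orcfiledump supports -d.
run hive --orcfiledump -d "$BUCKET_FILE"
```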


Re: How to retrieve data from a specific bucket in Hive?

Posted by unmesha sreeveni <un...@gmail.com>.
On Mon, Dec 1, 2014 at 10:49 AM, Somnath Pandeya <
Somnath_Pandeya@infosys.com> wrote:

> -cat /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000/


Hi Somnath,

That is not working for me. It shows unreadable binary output:
[binary output omitted]




RE: How to retrieve data from a specific bucket in Hive?

Posted by Somnath Pandeya <So...@infosys.com>.
Hi Unmesha,

You can simply run:
hdfs dfs -cat /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000/*

and check the content of the file.

Thanks
Somnath
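
Because the table is stored as ORC, -cat on a bucket file prints binary data. An alternative, assuming the goal is to see the rows that hash to one bucket, is to query the table through Hive with TABLESAMPLE. This is a sketch under stated assumptions rather than a confirmed answer from the thread: the variable names are illustrative, TABLESAMPLE numbers buckets from 1, and BUCKET 1 OUT OF 3 should correspond to bucket_00000 only if the table was populated with bucketing enforced.

```shell
#!/bin/sh
# Table name, bucket count, and bucketing column are taken from the thread;
# the variable names themselves are hypothetical.
TABLE="HiveMB"
NUM_BUCKETS=3
BUCKET=1   # TABLESAMPLE counts buckets from 1

QUERY="SELECT * FROM ${TABLE} TABLESAMPLE(BUCKET ${BUCKET} OUT OF ${NUM_BUCKETS} ON Department)"

# Run the query when the hive CLI is available; otherwise just print it.
if command -v hive >/dev/null 2>&1; then
  hive -e "$QUERY"
else
  echo "$QUERY"
fi
```

Changing BUCKET to 2 or 3 would select the rows for the other two buckets in the same way.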