Posted to user@hive.apache.org by unmesha sreeveni <un...@gmail.com> on 2014/12/01 06:00:38 UTC
How to retrieve data from a specific bucket in Hive?
I created a table in Hive:
create table HiveMB
(EmployeeID Int, FirstName String, Designation String, Salary Int, Department String)
clustered by (Department) into 3 buckets
stored as orc TBLPROPERTIES ('transactional'='true');
My input file looks like this:
1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B
3,Janet,Sales,60000,A
4,Hari,Admin,50000,C
5,Sanker,Admin,50000,C
The data went into 3 buckets on Department.
When I examined the warehouse directory, there were 3 bucket files:
Found 3 items
-rwxr-xr-x 3 aibladmin hadoop 252330 2014-11-28 14:46 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000
-rwxr-xr-x 3 aibladmin hadoop 100421 2014-11-28 14:45 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00001
-rwxr-xr-x 3 aibladmin hadoop 313047 2014-11-28 14:46 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00002
How can I retrieve the data from one such bucket?
When I did a -cat, the output was not in a human-readable format.
How can I see the data stored in each bucket?
--
*Thanks & Regards *
*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/
Re: How to retrieve data from a specific bucket in Hive?
Posted by unmesha sreeveni <un...@gmail.com>.
Any pointers would be appreciated.
Re: How to retrieve data from a specific bucket in Hive?
Posted by unmesha sreeveni <un...@gmail.com>.
On Mon, Dec 1, 2014 at 11:15 AM, yogendra reddy <yo...@gmail.com>
wrote:
> hive --orcfiledump
Hi Yogendra,
It shows:
Exception in thread "main" java.io.IOException: Malformed ORC file
/employeeData/empLargenew.txt. Invalid postscript.
But my file is not in ORC format; it is in CSV format:
1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B
So, as a workaround, I loaded the data into a staging table:
create external table stagingMB
(EmployeeID Int, FirstName String, Designation String, Salary Int, Department String)
row format delimited fields terminated by ","
location '/employeeData';
and from that table I loaded the data into the ORC table:
create table HiveMB
(EmployeeID Int, FirstName String, Designation String, Salary Int, Department String)
clustered by (Department) into 3 buckets
stored as orc TBLPROPERTIES ('transactional'='true');
from stagingMB insert into table HiveMB
select employeeid, firstname, designation, salary, department;
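After a load like this, one way to check how the rows were spread across the buckets is to recompute the bucket id in a query. The following is only a sketch: it assumes the HiveMB table above and the classic bucketing rule (hash of the bucketing column, masked to a non-negative value, modulo the bucket count), which may not hold for newer Hive releases that use a different bucketing hash. The aliases bucket_id and row_count are illustrative names.

```sql
-- Count rows per computed bucket id (0, 1 or 2) for the bucketing column.
-- Assumption: bucket = (hash(col) & Integer.MAX_VALUE) % number_of_buckets.
SELECT (hash(Department) & 2147483647) % 3 AS bucket_id,
       count(*) AS row_count
FROM HiveMB
GROUP BY (hash(Department) & 2147483647) % 3;
```

If the assumption holds, bucket_id 0 should correspond to the file bucket_00000, and so on.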
Re: How to retrieve data from a specific bucket in Hive?
Posted by yogendra reddy <yo...@gmail.com>.
Hi,
"When I did a -cat, It is not in human readable format"
- This is because the file format is specified as ORC in your table definition.
Let me know if this works for you (the ORC file dump utility):
hive --orcfiledump <hdfs-location-of-orc-file>
Thanks,
Yogendra
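Besides dumping the ORC file, the rows of a single bucket can also be read through Hive itself with TABLESAMPLE. This is a minimal sketch, assuming the HiveMB table from the original post. Bucket numbers in TABLESAMPLE start at 1, so bucket 1 should correspond to the file bucket_00000; that mapping is an assumption and has not been verified in this thread for transactional tables. The alias s is illustrative.

```sql
-- Return only the rows whose Department value hashes into
-- the first of the 3 buckets.
SELECT *
FROM HiveMB TABLESAMPLE (BUCKET 1 OUT OF 3 ON Department) s;
```

Changing the bucket number to 2 or 3 selects the rows of the other two buckets.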
Re: How to retrieve data from a specific bucket in Hive?
Posted by unmesha sreeveni <un...@gmail.com>.
On Mon, Dec 1, 2014 at 10:49 AM, Somnath Pandeya <
Somnath_Pandeya@infosys.com> wrote:
> -cat /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000/
Hi Somnath,
That is not working for me.
It shows unreadable output like this:
[unreadable binary ORC file contents omitted]
RE: How to retrieve data from a specific bucket in Hive?
Posted by Somnath Pandeya <So...@infosys.com>.
Hi Unmesha,
You can simply run
hdfs dfs -cat /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000/*
and check the contents of the file.
Thanks
Somnath