Posted to user@hive.apache.org by anusha Mangina <an...@gmail.com> on 2014/09/18 21:36:10 UTC

Split the output file and name them

My output file part-r-0000 at
hive/warehouse/path/to/output_table_name/part-r-0000 has the following
content:


emp1_id emp1_name emp1_salary emp1_address emp1_dob  emp1_joiningdate
emp2_id emp2_name emp2_salary emp2_address emp2_dob  emp2_joiningdate
emp3_id emp3_name emp3_salary emp3_address emp3_dob  emp3_joiningdate
emp4_id emp4_name emp4_salary emp4_address emp4_dob  emp4_joiningdate
emp5_id emp5_name emp5_salary emp5_address emp5_dob  emp5_joiningdate

Basically, the output table will have n distinct employee records.
How can I split the output file into n separate files inside
output_table_name and name each one with its emp_id? (I don't want partitioning.)

So my output table folder should have n separate files with emp_id as the name:


C:\hadoop\bin>hadoop fs -ls hive/warehouse/path/to/output_table_name


emp1_id
emp2_id
emp3_id
emp4_id
emp5_id

Re: Split the output file and name them

Posted by Anusha <an...@gmail.com>.
Thanks, Karthik.

I am using MultipleOutputs, and still my output file name remains the same.

Does it have any constraints with HCatRecord?

Kindest Regards,
Anusha 

> On Sep 18, 2014, at 22:01, Karthiksrivasthava <ka...@gmail.com> wrote:
> 
> Anusha ,
> 
> I think you have to write a MapReduce job and use MultipleOutputs to split your output.
> 
> Thanks 
> Karthik
>> On Sep 18, 2014, at 15:36, anusha Mangina <an...@gmail.com> wrote:
>> 
>> My output file part-r-0000 at hive/warehouse/path/to/output_table_name/part-r-0000 has the following content:
>> 
>> 
>> emp1_id emp1_name emp1_salary emp1_address emp1_dob  emp1_joiningdate
>> emp2_id emp2_name emp2_salary emp2_address emp2_dob  emp2_joiningdate
>> emp3_id emp3_name emp3_salary emp3_address emp3_dob  emp3_joiningdate
>> emp4_id emp4_name emp4_salary emp4_address emp4_dob  emp4_joiningdate
>> emp5_id emp5_name emp5_salary emp5_address emp5_dob  emp5_joiningdate
>> 
>> Basically, the output table will have n distinct employee records.
>> How can I split the output file into n separate files inside output_table_name and name each one with its emp_id? (I don't want partitioning.)
>> 
>> So my output table folder should have n separate files with emp_id as the name:
>> 
>> 
>> C:\hadoop\bin>hadoop fs -ls hive/warehouse/path/to/output_table_name
>> 
>> 
>> emp1_id
>> emp2_id
>> emp3_id
>> emp4_id
>> emp5_id
>> 

Re: Split the output file and name them

Posted by Karthiksrivasthava <ka...@gmail.com>.
Anusha ,

I think you have to write a MapReduce job and use MultipleOutputs to split your output.

Thanks 
Karthik
> On Sep 18, 2014, at 15:36, anusha Mangina <an...@gmail.com> wrote:
> 
> My output file part-r-0000 at hive/warehouse/path/to/output_table_name/part-r-0000 has the following content:
> 
> 
> emp1_id emp1_name emp1_salary emp1_address emp1_dob  emp1_joiningdate
> emp2_id emp2_name emp2_salary emp2_address emp2_dob  emp2_joiningdate
> emp3_id emp3_name emp3_salary emp3_address emp3_dob  emp3_joiningdate
> emp4_id emp4_name emp4_salary emp4_address emp4_dob  emp4_joiningdate
> emp5_id emp5_name emp5_salary emp5_address emp5_dob  emp5_joiningdate
> 
> Basically, the output table will have n distinct employee records.
> How can I split the output file into n separate files inside output_table_name and name each one with its emp_id? (I don't want partitioning.)
> 
> So my output table folder should have n separate files with emp_id as the name:
> 
> 
> C:hadoop\bin>hadoop fs -ls  hive/warehouse/path/to/output_table_name
> 
> 
> emp1_id
> emp2_id
> emp3_id
> emp4_id
> emp5_id
>
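[Editor's note] For reference, the per-employee split this thread is after can also be sketched outside MapReduce. The snippet below is a minimal illustration (file and directory names are hypothetical, and it assumes the part file is small enough to be copied locally first, e.g. with `hadoop fs -get`): it writes one local file per emp_id, which is the same layout a MultipleOutputs reducer would produce. On the follow-up question about the file name staying the same: the file name only changes when the `MultipleOutputs.write(key, value, baseOutputPath)` overload is used; the two-argument `write(key, value)` keeps the default part-r-NNNNN name.

```python
import os

def split_part_file(part_path, out_dir):
    """Split a whitespace-delimited part file into one file per
    first column (emp_id), named after that emp_id."""
    os.makedirs(out_dir, exist_ok=True)
    with open(part_path) as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            emp_id = line.split()[0]  # first column is the employee id
            # Append rather than overwrite, in case an emp_id
            # ever appears on more than one line.
            with open(os.path.join(out_dir, emp_id), "a") as dst:
                dst.write(line)

if __name__ == "__main__" and os.path.exists("part-r-0000"):
    # Hypothetical local copy of the part file, e.g. after:
    #   hadoop fs -get hive/warehouse/path/to/output_table_name/part-r-0000 .
    split_part_file("part-r-0000", "output_table_name")
```

After copying the per-emp_id files back into HDFS with `hadoop fs -put`, the table directory would list emp1_id, emp2_id, and so on, as in the question. This does not scale to large tables, though, so for anything sizable the MapReduce + MultipleOutputs route Karthik suggests is the right one.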