Posted to user@hive.apache.org by anusha Mangina <an...@gmail.com> on 2014/09/18 21:36:10 UTC
Split the output file and name them
My output file part-r-0000 at
hive/warehouse/path/to/output_table_name/part-r-0000 has the following
content:
emp1_id emp1_name emp1_salary emp1_address emp1_dob emp1_joiningdate
emp2_id emp2_name emp2_salary emp2_address emp2_dob emp2_joiningdate
emp3_id emp3_name emp3_salary emp3_address emp3_dob emp3_joiningdate
emp4_id emp4_name emp4_salary emp4_address emp4_dob emp4_joiningdate
emp5_id emp5_name emp5_salary emp5_address emp5_dob emp5_joiningdate
Basically the output table will have n distinct employee records.
How can I split the output file into n separate files inside
output_table_name, named by emp_id? (I don't want partitioning.)
So my output table folder should have n separate files with emp_id as the name:
C:\hadoop\bin>hadoop fs -ls hive/warehouse/path/to/output_table_name
emp1_id
emp2_id
emp3_id
emp4_id
emp5_id
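Outside of MapReduce, the desired layout can be illustrated with a small local program (a sketch, not from the thread: the class name, the whitespace-separated record format, and the local paths are all assumptions; on HDFS the equivalent would go through the FileSystem API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;

// Local sketch of the desired result: split a part-r-0000 style file into
// one file per record, named by its first field (the emp_id column).
// Class name, paths, and the record format are illustrative assumptions.
public class SplitByEmpId {

    public static void split(Path partFile, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        for (String line : Files.readAllLines(partFile)) {
            if (line.trim().isEmpty()) continue;
            String empId = line.trim().split("\\s+")[0]; // first column is emp_id
            // one file per employee, named by its emp_id, holding that record
            Files.write(outDir.resolve(empId), Collections.singletonList(line));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("split_demo");
        Path part = dir.resolve("part-r-0000");
        Files.write(part, Arrays.asList(
                "emp1_id emp1_name emp1_salary",
                "emp2_id emp2_name emp2_salary"));
        split(part, dir.resolve("output_table_name"));
        // the directory now holds one file per emp_id
        System.out.println(Files.exists(
                dir.resolve("output_table_name").resolve("emp1_id"))); // prints "true"
    }
}
```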
Re: Split the output file and name them
Posted by Anusha <an...@gmail.com>.
Thanks Karthik,
I am using MultipleOutputs and still my output file name remains the same.
Does it have any constraints with HCatRecord?
Kindest Regards,
Anusha
> On Sep 18, 2014, at 22:01, Karthiksrivasthava <ka...@gmail.com> wrote:
>
> Anusha ,
>
> I think you have to write a MapReduce job and use MultipleOutputs to split your output.
>
> Thanks
> Karthik
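For what it's worth, the file name staying part-r-0000 usually means the records are still going through the default output (e.g. via context.write()). The usual pattern is to pass a base output path as the third argument of MultipleOutputs.write() and to close the MultipleOutputs instance in cleanup(). A sketch, assuming the reduce key is the emp_id; the key/value types and class name are illustrative, and Hadoop still appends an -r-00000 suffix to each file:

```java
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Sketch of a reducer that writes one file per reduce key (the emp_id).
// Key/value types and the class name are illustrative assumptions.
public class EmpSplitReducer extends Reducer<Text, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text empId, Iterable<Text> records, Context context)
            throws IOException, InterruptedException {
        for (Text record : records) {
            // The third argument is the base output path: the file comes out
            // as <emp_id>-r-00000 instead of part-r-00000. Writing through
            // context.write() instead keeps the default part-r-* name.
            mos.write(NullWritable.get(), record, empId.toString());
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close(); // without this the files can come out empty or missing
    }
}
```

In the driver, LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) can replace the usual job.setOutputFormatClass() call so that no empty part-r-* files are created alongside the named ones.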
Re: Split the output file and name them
Posted by Karthiksrivasthava <ka...@gmail.com>.
Anusha ,
I think you have to write a MapReduce job and use MultipleOutputs to split your output.
Thanks
Karthik