Posted to user@hive.apache.org by Tim Robertson <ti...@gmail.com> on 2011/02/21 05:48:04 UTC
UDFRowSequence called in Map() ?
Hi all,
I am using UDFRowSequence as follows:
CREATE TEMPORARY FUNCTION rowSequence AS
'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';
set mapred.reduce.tasks=1;
CREATE TABLE temp_tc1_test
as
SELECT
rowSequence() AS id,
data_resource_id,
local_id,
local_parent_id,
name,
author
FROM normalized;
I see 2 jobs, the first of which runs with 2 map() tasks and 0 reduce() tasks
on my small test data. I believe rowSequence() is being called in
the map and not the reduce, as the results have duplicate IDs:
select * from temp_tc1_test where id=8915;
8915 167 1148 1113 Cytospora elaeagni Allesch.
8915 168 7 6 Achromadora inflata Abebe & Coomans, 1996
Is there any way to ensure the UDF is called in the reduce?
Thanks,
Tim
Re: UDFRowSequence called in Map() ?
Posted by John Sichi <js...@fb.com>.
There's no explicit way to enforce it, but in practice you can get it to work by invoking the UDF in an outer select, typically with an ORDER BY or SORT BY on the inner select, as in this example:
http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad#Prepare_Range_Partitioning
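As a rough sketch of that pattern applied to the query from the original post (untested here): the sort columns in the inner ORDER BY are an assumption chosen only for illustration, and the alias t is arbitrary. ORDER BY makes Hive run the inner select through a single reducer, and in practice the outer rowSequence() call then gets evaluated in that reducer, though nothing guarantees it.

```sql
CREATE TEMPORARY FUNCTION rowSequence AS
  'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';

CREATE TABLE temp_tc1_test AS
SELECT
  rowSequence() AS id,      -- invoked in the outer select, after the sort
  t.data_resource_id,
  t.local_id,
  t.local_parent_id,
  t.name,
  t.author
FROM (
  SELECT data_resource_id, local_id, local_parent_id, name, author
  FROM normalized
  -- Assumed sort key; any ORDER BY forces a single reducer here.
  ORDER BY data_resource_id, local_id
) t;
```

If hive.mapred.mode is set to strict, Hive rejects an ORDER BY without a LIMIT, so the inner select would also need a LIMIT in that case.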
Note also this semi-related JIRA issue, which I'm currently working on:
https://issues.apache.org/jira/browse/HIVE-1994
JVS