Posted to user@hive.apache.org by Weiwei Hsieh <wh...@slingmedia.com> on 2010/02/25 03:47:38 UTC

How to generate Row Id in Hive?

All,

Could anyone tell me how to generate a row id for a new record in Hive?

Many thanks.

weiwei

Re: How to generate Row Id in Hive?

Posted by Zheng Shao <zs...@gmail.com>.

Yes, usually there is a single map-reduce job.

Hive reports 2 map-reduce jobs because there is a conditional task that
merges many small output files into a smaller number of larger files.
Whether that task runs depends on the sizes of the output files, and it
can also be disabled.
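To make the "may or may not run" part concrete, here is a hedged Java sketch of the kind of check a conditional merge step could perform. It assumes a simple rule (merge only when merging is enabled and the average output file size falls below a threshold); the class and parameter names are hypothetical, not Hive's, and the real criteria and settings vary by Hive version.

```java
import java.util.List;

// Hypothetical sketch of a conditional "merge small files" decision.
// The average-size rule and all names are illustrative assumptions.
final class MergeDecision {
    private MergeDecision() {}

    /**
     * Returns true if merging is enabled, there is more than one output
     * file, and the average file size is below avgSizeThreshold bytes.
     */
    static boolean shouldMerge(List<Long> fileSizes, long avgSizeThreshold, boolean mergeEnabled) {
        if (!mergeEnabled || fileSizes.size() <= 1) {
            return false; // merging switched off, or nothing to merge
        }
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        long average = total / fileSizes.size();
        return average < avgSizeThreshold; // many tiny files: worth a merge job
    }
}
```

With this rule, a job that writes a few large files skips the second job entirely, which matches the observation that usually only one map-reduce job actually runs.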

Zheng

On Mon, Mar 1, 2010 at 10:34 AM, Weiwei Hsieh <wh...@slingmedia.com> wrote:
> Thank you all for your help! Please bear with me; I have one more
> question here:
>
> If I have tables “t1 (id string, c1 string)” and “t2 (c1 string)”, and I
> run the statement “insert overwrite table t1 (c1) select c1 from t2”, will
> this be one task? I need an id for each record in t1.

-- 
Yours,
Zheng

RE: How to generate Row Id in Hive?

Posted by Weiwei Hsieh <wh...@slingmedia.com>.

Thank you all for your help! Please bear with me; I have one more question here:

If I have tables "t1 (id string, c1 string)" and "t2 (c1 string)", and I run the statement "insert overwrite table t1 (c1) select c1 from t2", will this be one task? I need an id for each record in t1.

From: Carl Steinbach [mailto:carl@cloudera.com]
Sent: Thursday, February 25, 2010 9:11 PM
To: hive-user@hadoop.apache.org
Subject: Re: How to generate Row Id in Hive?

Making JobConf accessible to UDFs is part of the plan behind HIVE-1016
(Distributed Cache access for UDFs). I'll file a JIRA for a rowid() UDF and
link it to this.

Carl

Re: How to generate Row Id in Hive?

Posted by Carl Steinbach <ca...@cloudera.com>.

Making JobConf accessible to UDFs is part of the plan behind HIVE-1016
(Distributed Cache access for UDFs). I'll file a JIRA for a rowid() UDF and
link it to this.

Carl

On Thu, Feb 25, 2010 at 9:00 PM, Zheng Shao <zs...@gmail.com> wrote:

> Not right now. It should be pretty simple to do though. We can expose
> the current JobConf via a static method in ExecMapper.
>
> Zheng

Re: How to generate Row Id in Hive?

Posted by Zheng Shao <zs...@gmail.com>.

Not right now. It should be pretty simple to add, though: we can expose
the current JobConf via a static method in ExecMapper.
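A minimal sketch of what "expose the current JobConf via a static method" could look like, assuming the task runtime stores its configuration in a static field once at startup and UDF code reads it back. A plain Map<String, String> stands in for Hadoop's JobConf so the example compiles on its own; CurrentTaskConf and its methods are hypothetical names, not an existing ExecMapper API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical static holder for the current task's configuration.
// A Map stands in for Hadoop's JobConf; names are illustrative only.
final class CurrentTaskConf {
    // volatile so a value published by task setup code is visible to
    // whichever thread later evaluates the UDF
    private static volatile Map<String, String> conf = Collections.emptyMap();

    private CurrentTaskConf() {}

    /** Called once by the task runtime, e.g. when the mapper is configured. */
    static void set(Map<String, String> taskConf) {
        conf = Collections.unmodifiableMap(new HashMap<>(taskConf));
    }

    /** Called from UDF code; returns null if the key is absent. */
    static String get(String key) {
        return conf.get(key);
    }
}
```

This relies on each task having its own copy of the static state (for example, one JVM per task); if JVMs were reused across tasks, set() would have to be called again at the start of every task.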

Zheng

On Thu, Feb 25, 2010 at 7:52 AM, Todd Lipcon <to...@cloudera.com> wrote:
> Zheng: is there a way to get at the hadoop conf variables from within a
> query? If so, you could use mapred.task.id to get a unique string.
> -Todd

-- 
Yours,
Zheng

Re: How to generate Row Id in Hive?

Posted by Todd Lipcon <to...@cloudera.com>.

Zheng: is there a way to get at the Hadoop conf variables from within a
query? If so, you could use mapred.task.id (the ID of the current task
attempt) to get a unique string.
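Todd's idea can be sketched as follows, assuming the value of mapred.task.id is somehow made available to the generator (here it is simply passed to the constructor). Concatenating that task-unique string with a per-task row counter yields ids that cannot collide across parallel tasks, though they are not globally increasing. TaskScopedRowIds is a hypothetical name.

```java
// Hypothetical per-task row id generator: task-unique prefix + local counter.
// How the task id reaches this class is an assumption (constructor argument).
final class TaskScopedRowIds {
    private final String taskId;
    private long nextRow = 0;

    TaskScopedRowIds(String taskId) {
        if (taskId == null || taskId.isEmpty()) {
            throw new IllegalArgumentException("task id is required");
        }
        this.taskId = taskId;
    }

    /** Returns "<taskId>_0", "<taskId>_1", ... for rows seen by this task. */
    String next() {
        return taskId + "_" + nextRow++;
    }
}
```

Because the counter part never contains an underscore, the task id can always be recovered by splitting at the last underscore, which is what keeps ids from different tasks distinct. The class is not thread-safe; it assumes one instance per task, used from a single thread.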

-Todd

On Thu, Feb 25, 2010 at 12:42 AM, Zheng Shao <zs...@gmail.com> wrote:

> Since Hive runs many mappers/reducers in parallel, there is no way to
> generate a globally unique increasing row id.
> If you are OK with that, you can easily write a "non-deterministic"
> UDF. See rand() (or UDFRand.java) for example.
>
> Please open a JIRA if you plan to work on that.
>
> Zheng

Re: How to generate Row Id in Hive?

Posted by Zheng Shao <zs...@gmail.com>.

Since Hive runs many mappers/reducers in parallel, there is no way to
generate a globally unique, increasing row id.
If ids that are unique but not increasing are OK for you, you can easily
write a "non-deterministic" UDF. See rand() (UDFRand.java) for an example.
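As a hedged sketch of the kind of non-deterministic UDF described here, the class below avoids the job configuration entirely: each instance draws a random UUID as its prefix and appends a local counter. It assumes one instance per mapper/reducer; the class name is hypothetical, and a real Hive UDF would wrap next() in its evaluate() method.

```java
import java.util.UUID;

// Hypothetical non-deterministic row id source: random per-instance prefix
// plus a local counter. Not thread-safe; assumes one instance per task.
final class RandomPrefixRowIds {
    private final String prefix = UUID.randomUUID().toString();
    private long counter = 0;

    /** Returns "<uuid>-0", "<uuid>-1", ... for this instance. */
    String next() {
        return prefix + "-" + counter++;
    }
}
```

Ids are unique with overwhelming probability rather than by construction, and they are neither increasing nor reproducible across runs, which is the trade-off described above.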

Please open a JIRA if you plan to work on that.

Zheng

On Wed, Feb 24, 2010 at 6:47 PM, Weiwei Hsieh <wh...@slingmedia.com> wrote:
> All,
>
> Could anyone tell me how to generate a row id for a new record in Hive?
>
> Many thanks.
>
> weiwei

-- 
Yours,
Zheng