
Posted to user@hive.apache.org by Cappa Roberto <ro...@guest.telecomitalia.it> on 2010/08/06 08:46:14 UTC

How HIVE manages a join

Hi,

I cannot find any documentation about which algorithm HIVE uses to translate JOIN clauses into Map-Reduce tasks.

In particular, suppose I have two tables A and B, each written to a separate file, and each file is split across Hadoop nodes. When I perform a JOIN on A.column = B.column, the framework has to compare the full data of the first file with the full data of the second file. How can Hadoop perform a full scan of all possible combinations of values? If each node contains only a portion of each file, a complete comparison does not seem possible. Is one of the two files entirely replicated on each node? Or does HIVE use another kind of strategy/optimization?

Thanks.
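A rough answer to the question above, as I understand Hive's default ("common") join: neither file has to be replicated. Each mapper reads only its own split, tags every row with the table it came from, and emits it keyed by the join column; the shuffle then routes all rows that share a key value to the same reducer, which pairs up the A rows and B rows for that key. The following is a minimal single-process Python simulation of that repartition (shuffle) join under those assumptions. The function names are hypothetical, the shuffle is modeled as `hash(key) % num_reducers`, and Hive-specific details (such as which table is buffered and which is streamed in the reducer, or joins of more than two tables) are not modeled.

```python
from collections import defaultdict
from itertools import product


def map_phase(tag, split, key_col):
    """Mapper: emit (join_key, (tag, row)) for every row of one input split."""
    for row in split:
        yield row[key_col], (tag, row)


def shuffle(pairs, num_reducers):
    """Group map output by reducer, choosing the reducer from the join key only.

    All pairs with the same key land in the same partition, no matter
    which split (or node) produced them.
    """
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, tagged_row in pairs:
        partitions[hash(key) % num_reducers][key].append(tagged_row)
    return partitions


def reduce_phase(partition):
    """Reducer: for each key, combine every A row with every B row."""
    for tagged_rows in partition.values():
        a_rows = [row for tag, row in tagged_rows if tag == "A"]
        b_rows = [row for tag, row in tagged_rows if tag == "B"]
        for a_row, b_row in product(a_rows, b_rows):
            joined = {f"a.{col}": val for col, val in a_row.items()}
            joined.update({f"b.{col}": val for col, val in b_row.items()})
            yield joined


def repartition_join(a_splits, b_splits, key_col, num_reducers=2):
    """Inner equi-join of tables A and B on key_col, simulated in one process."""
    pairs = []
    for split in a_splits:
        pairs.extend(map_phase("A", split, key_col))
    for split in b_splits:
        pairs.extend(map_phase("B", split, key_col))
    results = []
    for partition in shuffle(pairs, num_reducers):
        results.extend(reduce_phase(partition))
    return results


if __name__ == "__main__":
    # Each inner list stands for one split read by a separate mapper.
    a_splits = [[{"id": 1, "x": "a1"}], [{"id": 2, "x": "a2"}]]
    b_splits = [[{"id": 2, "y": "b2"}], [{"id": 1, "y": "b1"}, {"id": 3, "y": "b3"}]]
    for joined_row in repartition_join(a_splits, b_splits, "id"):
        print(joined_row)
```

The key property is that partitioning depends only on the join key: changing `num_reducers` changes which reducer handles a given key, but never splits one key's rows across reducers, so each reducer's output is complete for the keys it owns.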

Re: How HIVE manages a join

Posted by akshaya iyengar <ak...@gmail.com>.

Thank you very much for the links.

Akshaya
On Thu, Aug 12, 2010 at 5:16 PM, Raghu Murthy <rm...@facebook.com> wrote:

> The hive.pdf link in the Design page is this one:
> http://www.slideshare.net/namit_jain/hive-demo-paper-at-vldb-2009
>
> A later paper in ICDE'10 is available here:
> http://i.stanford.edu/~ragho/hive-icde2010.pdf
>
> Both of these papers and others are linked from:
> http://wiki.apache.org/hadoop/Hive/Presentations
>
> Hope this helps.
>
>
> On Aug 12, 2010, at 2:05 PM, akshaya iyengar wrote:
>
> Hello,
> I apologize if this is out of context for the current thread. I was looking
> for the Hive architecture diagram on this page
> http://wiki.apache.org/hadoop/Hive/Design . The PDF link doesn't seem to
> work for me either.
>
> It would be a great help if someone could direct me to this information.
>
> Thanks,
> Akshaya
>
> On Thu, Aug 12, 2010 at 4:37 PM, Edward Capriolo <ed...@gmail.com> wrote:
>
>> Joydeep,
>>
>> I am sorry. I added that notice when I thought we were going to actively
>> move to xdocs. You can remove it if you like.
>>
>> As I said in an earlier thread, the problem with the wiki is that no one
>> actively updates it. Example:
>>
>> http://wiki.apache.org/hadoop/Hive/LanguageManual/Select
>> Oops: really, what about "in support"...
>> https://issues.apache.org/jira/browse/HIVE-801
>>
>> This is why I hold the opinion that all patches except bug fixes
>> should probably come with xdocs. People are free to disagree.
>>
>> Edward
>>
>> On Thu, Aug 12, 2010 at 3:16 AM, Joydeep Sen Sarma <js...@facebook.com>
>> wrote:
>> > i hate this message: 'THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT
>> > EDIT!Join Syntax'
>> >
>> > why must edits to the wiki be banned if there are xdocs? hadoop has
>> > both.
>> >
>> > there will always be things that are not captured in xdocs. it's pretty
>> > sad to discourage free form edits by people who want to contribute without
>> > checking out source. (what is this - the 80s?)
>> > ________________________________________
>> > From: Edward Capriolo [edlinuxguru@gmail.com]
>> > Sent: Tuesday, August 10, 2010 2:57 PM
>> > To: hive-user@hadoop.apache.org
>> > Cc: hive-dev@hadoop.apache.org
>> > Subject: Re: How HIVE manages a join
>> >
>> > Sorry.
>> > $hive_root/docs/xdocs/language_manual/joins.xml
>> >
>> > On Tue, Aug 10, 2010 at 5:57 PM, Edward Capriolo <ed...@gmail.com>
>> > wrote:
>> >> This page is already in version control.
>> >>
>> >> /home/edward/cassandra-handler/docs/xdocs/language_manual/joins.xml
>> >>
>> >> Edward
>> >>
>> >> On Tue, Aug 10, 2010 at 5:15 PM, Carl Steinbach <ca...@cloudera.com>
>> >> wrote:
>> >>> Hi Yongqiang,
>> >>> Please go ahead and update the wiki page. I will copy it over to
>> >>> version control when you are done.
>> >>> Thanks.
>> >>> Carl
>> >>>
>> >>> On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he <heyongqiangict@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> In the Hive Join wiki page, it says
>> >>>> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
>> >>>>
>> >>>> Where should I do the update?
>> >>>>
>> >>>> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <heyongqiangict@gmail.com>
>> >>>> wrote:
>> >>>> > Yeah. The sort merge bucket mapjoin has been finished for some time,
>> >>>> > and seems stable now. I did one skew join but haven't had a chance to
>> >>>> > look at another skew join Namit mentioned to me. But I definitely
>> >>>> > should have updated the wiki earlier. My bad.
>> >>>> >
>> >>>> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <hammer@cloudera.com>
>> >>>> > wrote:
>> >>>> >> Yongqiang mentioned he was going to update the wiki with this
>> >>>> >> information in the thread at
>> >>>> >> http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>> >>>> >>
>> >>>> >> Yongqiang, have you gotten a chance to complete the sort merge
>> >>>> >> bucket map join and the other skew join you mention in the above
>> >>>> >> thread?
>> >>>> >>
>> >>>> >> Thanks,
>> >>>> >> Jeff
>> >>>> >>
>> >>>> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
>> >>>> >> <bh...@students.iiit.ac.in> wrote:
>> >>>> >>>
>> >>>> >>> Roberto,
>> >>>> >>>
>> >>>> >>> You may find these links useful:
>> >>>> >>>
>> >>>> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>> >>>> >>> - Simple joins and optimizations
>> >>>> >>>
>> >>>> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team
>> >>>> >>> - New kinds of joins / features of Hive
>> >>>> >>>
>> >>>> >>> Thanks
>> >>>> >>>
>> >>>> >>> Bharath.V
>> >>>> >>> 4th year Undergraduate
>> >>>> >>> IIIT Hyderabad
>> >>>> >>>
>
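The "mapjoin" variants yongqiang mentions above address the other half of the original question, whether one table is replicated on each node. As I understand the Hive join documentation of this period, a map join (requested with a hint such as `SELECT /*+ MAPJOIN(b) */ ...`) loads the smaller table into an in-memory hash table that is made available to every mapper, and each mapper joins its own split of the larger table without any shuffle or reduce phase. The sort merge bucket variant instead assumes both tables are bucketed and sorted on the join key, so mappers can merge matching buckets rather than hold a whole table in memory; that variant is not modeled here. A minimal single-process Python sketch of the broadcast (replicated) hash join, with hypothetical function names:

```python
from collections import defaultdict


def build_hash_table(small_rows, key_col):
    """Index the small table by join key (key -> list of rows).

    In a real map-side join this structure is built once and made
    available to every mapper, so the small table is effectively
    replicated on each node.
    """
    table = defaultdict(list)
    for row in small_rows:
        table[row[key_col]].append(row)
    return dict(table)


def map_side_join(big_split, hash_table, key_col):
    """Mapper: probe the hash table with each row of one big-table split.

    No shuffle or reduce phase is needed; each mapper emits its share of
    the join output from its own split alone.
    """
    for big_row in big_split:
        for small_row in hash_table.get(big_row[key_col], []):
            joined = {f"big.{col}": val for col, val in big_row.items()}
            joined.update({f"small.{col}": val for col, val in small_row.items()})
            yield joined


if __name__ == "__main__":
    small = [{"id": 1, "name": "one"}, {"id": 2, "name": "two"}]
    # Each inner list stands for one split of the big table read by one mapper.
    big_splits = [
        [{"id": 1, "v": 10}, {"id": 3, "v": 30}],
        [{"id": 2, "v": 20}, {"id": 1, "v": 11}],
    ]
    hash_table = build_hash_table(small, "id")
    for mapper_id, split in enumerate(big_splits):
        for joined_row in map_side_join(split, hash_table, "id"):
            print(mapper_id, joined_row)
```

The trade-off is memory: this only works when the replicated table fits in each mapper's memory, which is why it is meant for joins where one side is small.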

Re: How HIVE manages a join

Posted by Raghu Murthy <rm...@facebook.com>.

The hive.pdf link in the Design page is this one:
http://www.slideshare.net/namit_jain/hive-demo-paper-at-vldb-2009

A later paper in ICDE'10 is available here:
http://i.stanford.edu/~ragho/hive-icde2010.pdf

Both of these papers and others are linked from:
http://wiki.apache.org/hadoop/Hive/Presentations

Hope this helps.


Re: How HIVE manages a join

Posted by Edward Capriolo <ed...@gmail.com>.

On Thu, Aug 12, 2010 at 5:05 PM, akshaya iyengar
<ak...@gmail.com> wrote:
> Hello,
> I apologize if this is out of context for the current thread. I was looking
> for the Hive architecture diagram on this page
> http://wiki.apache.org/hadoop/Hive/Design . The PDF link doesn't seem to work
> for me either.

The wiki had all attachments removed because crackers were using it to
post STUFF (use your imagination).

Edward

Re: How HIVE manages a join

Posted by John Sichi <js...@facebook.com>.

All the attachments on the wiki disappeared a while ago. I noticed it recently on my design doc for views. I suspect somebody disabled attachment uploads due to spam, and this made the existing attachments inaccessible as well. We probably need to open an INFRA ticket in Apache JIRA.

JVS

Re: How HIVE manages a join

Posted by akshaya iyengar <ak...@gmail.com>.

Hello,
I apologize if this is out of context for the current thread. I was looking
for the Hive architecture diagram on this page
http://wiki.apache.org/hadoop/Hive/Design . The PDF link doesn't seem to work
for me either.

It would be a great help if someone could direct me to this information.

Thanks,
Akshaya

Re: How HIVE manages a join

Posted by Edward Capriolo <ed...@gmail.com>.

Joydeep,

I am sorry. I added that notice when I thought we were going to actively
move to xdocs. You can remove it if you like.

As I said in an earlier thread, the problem with the wiki is that no one
actively updates it. Example:

http://wiki.apache.org/hadoop/Hive/LanguageManual/Select
Oops: really, what about "in support"...
https://issues.apache.org/jira/browse/HIVE-801

This is why I hold the opinion that all patches except bug fixes
should probably come with xdocs. People are free to disagree.

Edward

On Thu, Aug 12, 2010 at 3:16 AM, Joydeep Sen Sarma <js...@facebook.com> wrote:
> i hate this message: 'THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax'
>
> why must edits to the wiki be banned if there are xdocs? hadoop has both.
>
> there will always be things that are not captured in xdocs. it's pretty sad to discourage free form edits by people who want to contribute without checking out source. (what is this - the 80s?)
> ________________________________________
> From: Edward Capriolo [edlinuxguru@gmail.com]
> Sent: Tuesday, August 10, 2010 2:57 PM
> To: hive-user@hadoop.apache.org
> Cc: hive-dev@hadoop.apache.org
> Subject: Re: How HIVE manages a join
>
> Sorry.
> $hive_root/docs/xdocs/language_manual/joins.xml
>
> On Tue, Aug 10, 2010 at 5:57 PM, Edward Capriolo <ed...@gmail.com> wrote:
>> This page is is already in version control..
>>
>> /home/edward/cassandra-handler/docs/xdocs/language_manual/joins.xml
>>
>> Edward
>>
>> On Tue, Aug 10, 2010 at 5:15 PM, Carl Steinbach <ca...@cloudera.com> wrote:
>>> Hi Yongqiang,
>>> Please go ahead and update the wiki page. I will copy it over to version
>>> control when you are done.
>>> Thanks.
>>> Carl
>>>
>>> On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he <he...@gmail.com>
>>> wrote:
>>>>
>>>> In the Hive Join wiki page, it says
>>>> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
>>>>
>>>> Where should i do the update?
>>>>
>>>> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <he...@gmail.com>
>>>> wrote:
>>>> > Yeah. The sort merge bucket mapjoin has been finished for sometime,
>>>> > and seems stable now. I did one skew join but haven't get a chance to
>>>> > look at another skew join Namit mentioned to me. But definitely should
>>>> > update the wiki earlier. My bad.
>>>> >
>>>> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <ha...@cloudera.com>
>>>> > wrote:
>>>> >> Yongqiang mentioned he was going to update the wiki with this
>>>> >> information in
>>>> >> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>>>> >>
>>>> >> Yongqiang, have you gotten a chance to complete the sort merge bucket
>>>> >> map
>>>> >> join and the other skew join you mention in the above thread?
>>>> >>
>>>> >> Thanks,
>>>> >> Jeff
>>>> >>
>>>> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
>>>> >> <bh...@students.iiit.ac.in> wrote:
>>>> >>>
>>>> >>> Roberto ..
>>>> >>>
>>>> >>> You can find these links useful ..
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>>>> >>> - Simple joins and optimizations..
>>>> >>>
>>>> >>>
>>>> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  -
>>>> >>> New kind of joins / features of hive ..
>>>> >>>
>>>> >>> Thanks
>>>> >>>
>>>> >>> Bharath.V
>>>> >>> 4th year Undergraduate..
>>>> >>> IIIT Hyderabad
>>>> >>>
>>>> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>>>> >>> <ro...@guest.telecomitalia.it> wrote:
>>>> >>>>
>>>> >>>> Hi,
>>>> >>>>
>>>> >>>> I cannot find any documentation about what algorithm HIVE uses to
>>>> >>>> translate JOIN clauses into Map-Reduce tasks.
>>>> >>>>
>>>> >>>> In particular, if I have two tables A and B, each table is written to a
>>>> >>>> separate file and each file is split across Hadoop nodes. When I perform a
>>>> >>>> JOIN with A.column = B.column, the framework has to compare the full data
>>>> >>>> from the first file with the full data from the second file. In order to
>>>> >>>> perform a full scan of all possible combinations of values, how can Hadoop
>>>> >>>> do it? If each node contains only a portion of each file, a complete
>>>> >>>> comparison does not seem possible. Is one of the two files entirely
>>>> >>>> replicated on each node? Or does HIVE use another kind of strategy/optimization?
>>>> >>>>
>>>> >>>> Thanks.
>>>> >>
>>>> >>
>>>> >
>>>
>>>
>>
>
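
The question quoted throughout this thread asks how a join can work when every node holds only part of each table. A minimal sketch of the usual answer, the reduce-side ("common") join, assuming both inputs fit in Python lists: each map task tags every row with its source table and emits it under the join key, the shuffle sends all rows with the same key to the same reducer, and the reducer pairs the A rows with the B rows for that key. The function names are hypothetical and Python's hash() stands in for Hadoop's partitioner; Hive's real operators are more involved (as we understand it, Hive buffers the rows of all but the last table in a join at the reducer and streams the last one through).

```python
from collections import defaultdict
from itertools import product

def map_phase(tag, rows, key_index):
    """Map side: emit (join_key, (table_tag, row)) for every input row."""
    for row in rows:
        yield row[key_index], (tag, row)

def shuffle(pairs, num_reducers):
    """Send each pair to a reducer chosen by hashing the join key, grouped by key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, tagged_row in pairs:
        reducers[hash(key) % num_reducers][key].append(tagged_row)
    return reducers

def reduce_phase(groups):
    """Reduce side: for each key, combine every A row with every B row (inner join)."""
    for key, tagged_rows in groups.items():
        a_rows = [row for tag, row in tagged_rows if tag == "A"]  # buffered side
        b_rows = [row for tag, row in tagged_rows if tag == "B"]  # streamed side
        for a_row, b_row in product(a_rows, b_rows):
            yield a_row + b_row

def common_join(a_rows, b_rows, a_key, b_key, num_reducers=3):
    pairs = list(map_phase("A", a_rows, a_key)) + list(map_phase("B", b_rows, b_key))
    result = []
    for groups in shuffle(pairs, num_reducers):
        result.extend(reduce_phase(groups))
    return result

A = [(1, "alice"), (2, "bob")]
B = [(1, "order-1"), (1, "order-2"), (3, "order-3")]
print(sorted(common_join(A, B, 0, 0)))
# prints [(1, 'alice', 1, 'order-1'), (1, 'alice', 1, 'order-2')]
```

No node ever needs the whole of either file: correctness only requires that rows with matching keys meet at the same reducer, which the hash partitioning guarantees.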

RE: How HIVE manages a join

Posted by Joydeep Sen Sarma <js...@facebook.com>.

i hate this message: 'THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax'

why must edits to the wiki be banned if there are xdocs? hadoop has both.

there will always be things that are not captured in xdocs. it's pretty sad to discourage free form edits by people who want to contribute without checking out source. (what is this - the 80s?)
________________________________________
From: Edward Capriolo [edlinuxguru@gmail.com]
Sent: Tuesday, August 10, 2010 2:57 PM
To: hive-user@hadoop.apache.org
Cc: hive-dev@hadoop.apache.org
Subject: Re: How HIVE manages a join

Sorry.
$hive_root/docs/xdocs/language_manual/joins.xml

On Tue, Aug 10, 2010 at 5:57 PM, Edward Capriolo <ed...@gmail.com> wrote:
> This page is already in version control.
>
> /home/edward/cassandra-handler/docs/xdocs/language_manual/joins.xml
>
> Edward
>
> On Tue, Aug 10, 2010 at 5:15 PM, Carl Steinbach <ca...@cloudera.com> wrote:
>> Hi Yongqiang,
>> Please go ahead and update the wiki page. I will copy it over to version
>> control when you are done.
>> Thanks.
>> Carl
>>
>> On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he <he...@gmail.com>
>> wrote:
>>>
>>> In the Hive Join wiki page, it says
>>> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
>>>
>>> Where should I make the update?
>>>
>>> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <he...@gmail.com>
>>> wrote:
>>> > Yeah. The sort merge bucket mapjoin has been finished for some time
>>> > and seems stable now. I did one skew join but haven't had a chance to
>>> > look at the other skew join Namit mentioned to me. But I definitely should
>>> > have updated the wiki earlier. My bad.
>>> >
>>> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <ha...@cloudera.com>
>>> > wrote:
>>> >> Yongqiang mentioned he was going to update the wiki with this information in
>>> >> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>>> >>
>>> >> Yongqiang, have you gotten a chance to complete the sort merge bucket map
>>> >> join and the other skew join you mention in the above thread?
>>> >>
>>> >> Thanks,
>>> >> Jeff
>>> >>
>>> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
>>> >> <bh...@students.iiit.ac.in> wrote:
>>> >>>
>>> >>> Roberto ..
>>> >>>
>>> >>> You can find these links useful ..
>>> >>>
>>> >>>
>>> >>>
>>> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>>> >>> - Simple joins and optimizations..
>>> >>>
>>> >>>
>>> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  -
>>> >>> New kind of joins / features of hive ..
>>> >>>
>>> >>> Thanks
>>> >>>
>>> >>> Bharath.V
>>> >>> 4th year Undergraduate..
>>> >>> IIIT Hyderabad
>>> >>>
>>> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>>> >>> <ro...@guest.telecomitalia.it> wrote:
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> I cannot find any documentation about what algorithm HIVE uses to
>>> >>>> translate JOIN clauses into Map-Reduce tasks.
>>> >>>>
>>> >>>> In particular, if I have two tables A and B, each table is written to a
>>> >>>> separate file and each file is split across Hadoop nodes. When I perform a
>>> >>>> JOIN with A.column = B.column, the framework has to compare the full data
>>> >>>> from the first file with the full data from the second file. In order to
>>> >>>> perform a full scan of all possible combinations of values, how can Hadoop
>>> >>>> do it? If each node contains only a portion of each file, a complete
>>> >>>> comparison does not seem possible. Is one of the two files entirely
>>> >>>> replicated on each node? Or does HIVE use another kind of strategy/optimization?
>>> >>>>
>>> >>>> Thanks.
>>> >>
>>> >>
>>> >
>>
>>
>

Re: How HIVE manages a join

Posted by Edward Capriolo <ed...@gmail.com>.

Sorry.
$hive_root/docs/xdocs/language_manual/joins.xml

On Tue, Aug 10, 2010 at 5:57 PM, Edward Capriolo <ed...@gmail.com> wrote:
> This page is already in version control.
>
> /home/edward/cassandra-handler/docs/xdocs/language_manual/joins.xml
>
> Edward
>
> On Tue, Aug 10, 2010 at 5:15 PM, Carl Steinbach <ca...@cloudera.com> wrote:
>> Hi Yongqiang,
>> Please go ahead and update the wiki page. I will copy it over to version
>> control when you are done.
>> Thanks.
>> Carl
>>
>> On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he <he...@gmail.com>
>> wrote:
>>>
>>> In the Hive Join wiki page, it says
>>> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
>>>
>>> Where should I make the update?
>>>
>>> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <he...@gmail.com>
>>> wrote:
>>> > Yeah. The sort merge bucket mapjoin has been finished for some time
>>> > and seems stable now. I did one skew join but haven't had a chance to
>>> > look at the other skew join Namit mentioned to me. But I definitely should
>>> > have updated the wiki earlier. My bad.
>>> >
>>> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <ha...@cloudera.com>
>>> > wrote:
>>> >> Yongqiang mentioned he was going to update the wiki with this information in
>>> >> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>>> >>
>>> >> Yongqiang, have you gotten a chance to complete the sort merge bucket map
>>> >> join and the other skew join you mention in the above thread?
>>> >>
>>> >> Thanks,
>>> >> Jeff
>>> >>
>>> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
>>> >> <bh...@students.iiit.ac.in> wrote:
>>> >>>
>>> >>> Roberto ..
>>> >>>
>>> >>> You can find these links useful ..
>>> >>>
>>> >>>
>>> >>>
>>> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>>> >>> - Simple joins and optimizations..
>>> >>>
>>> >>>
>>> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  -
>>> >>> New kind of joins / features of hive ..
>>> >>>
>>> >>> Thanks
>>> >>>
>>> >>> Bharath.V
>>> >>> 4th year Undergraduate..
>>> >>> IIIT Hyderabad
>>> >>>
>>> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>>> >>> <ro...@guest.telecomitalia.it> wrote:
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> I cannot find any documentation about what algorithm HIVE uses to
>>> >>>> translate JOIN clauses into Map-Reduce tasks.
>>> >>>>
>>> >>>> In particular, if I have two tables A and B, each table is written to a
>>> >>>> separate file and each file is split across Hadoop nodes. When I perform a
>>> >>>> JOIN with A.column = B.column, the framework has to compare the full data
>>> >>>> from the first file with the full data from the second file. In order to
>>> >>>> perform a full scan of all possible combinations of values, how can Hadoop
>>> >>>> do it? If each node contains only a portion of each file, a complete
>>> >>>> comparison does not seem possible. Is one of the two files entirely
>>> >>>> replicated on each node? Or does HIVE use another kind of strategy/optimization?
>>> >>>>
>>> >>>> Thanks.
>>> >>
>>> >>
>>> >
>>
>>
>
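
The question quoted at the bottom of these messages also asks whether one of the two files is replicated on every node. That is essentially what a map join does when one table is small: the small table is loaded into an in-memory hash table that every map task receives, and each mapper scans its own split of the large table and probes that hash table, so no shuffle or reduce phase is needed. In HiveQL the strategy can be requested with a hint such as /*+ MAPJOIN(b) */. The sketch below is a simplified illustration with hypothetical function names; it models each split as a Python list and each map task as one loop iteration.

```python
from collections import defaultdict

def build_hash_table(small_rows, key_index):
    """Load the small table into memory, keyed by the join column."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key_index]].append(row)
    return dict(table)

def map_task(big_split, big_key, hash_table):
    """One mapper: stream its split of the big table and probe the hash table."""
    for row in big_split:
        for match in hash_table.get(row[big_key], []):
            yield row + match

def map_join(big_splits, small_rows, big_key, small_key):
    # In Hadoop the hash table would be shipped to every node; here it is simply shared.
    hash_table = build_hash_table(small_rows, small_key)
    result = []
    for split in big_splits:  # each split would be processed by a separate map task
        result.extend(map_task(split, big_key, hash_table))
    return result

splits = [[(1, "click-a"), (2, "click-b")], [(1, "click-c"), (4, "click-d")]]
users = [(1, "alice"), (2, "bob")]
print(map_join(splits, users, 0, 0))
# prints [(1, 'click-a', 1, 'alice'), (2, 'click-b', 2, 'bob'), (1, 'click-c', 1, 'alice')]
```

The trade-off is memory: this only works while the replicated table fits comfortably in each mapper's heap.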

Re: How HIVE manages a join

Posted by Edward Capriolo <ed...@gmail.com>.

This page is already in version control.

/home/edward/cassandra-handler/docs/xdocs/language_manual/joins.xml

Edward

On Tue, Aug 10, 2010 at 5:15 PM, Carl Steinbach <ca...@cloudera.com> wrote:
> Hi Yongqiang,
> Please go ahead and update the wiki page. I will copy it over to version
> control when you are done.
> Thanks.
> Carl
>
> On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he <he...@gmail.com>
> wrote:
>>
>> In the Hive Join wiki page, it says
>> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
>>
>> Where should I make the update?
>>
>> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <he...@gmail.com>
>> wrote:
>> > Yeah. The sort merge bucket mapjoin has been finished for some time
>> > and seems stable now. I did one skew join but haven't had a chance to
>> > look at the other skew join Namit mentioned to me. But I definitely should
>> > have updated the wiki earlier. My bad.
>> >
>> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <ha...@cloudera.com>
>> > wrote:
>> >> Yongqiang mentioned he was going to update the wiki with this information in
>> >> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>> >>
>> >> Yongqiang, have you gotten a chance to complete the sort merge bucket map
>> >> join and the other skew join you mention in the above thread?
>> >>
>> >> Thanks,
>> >> Jeff
>> >>
>> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
>> >> <bh...@students.iiit.ac.in> wrote:
>> >>>
>> >>> Roberto ..
>> >>>
>> >>> You can find these links useful ..
>> >>>
>> >>>
>> >>>
>> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>> >>> - Simple joins and optimizations..
>> >>>
>> >>>
>> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  -
>> >>> New kind of joins / features of hive ..
>> >>>
>> >>> Thanks
>> >>>
>> >>> Bharath.V
>> >>> 4th year Undergraduate..
>> >>> IIIT Hyderabad
>> >>>
>> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>> >>> <ro...@guest.telecomitalia.it> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I cannot find any documentation about what algorithm HIVE uses to
>> >>>> translate JOIN clauses into Map-Reduce tasks.
>> >>>>
>> >>>> In particular, if I have two tables A and B, each table is written to a
>> >>>> separate file and each file is split across Hadoop nodes. When I perform a
>> >>>> JOIN with A.column = B.column, the framework has to compare the full data
>> >>>> from the first file with the full data from the second file. In order to
>> >>>> perform a full scan of all possible combinations of values, how can Hadoop
>> >>>> do it? If each node contains only a portion of each file, a complete
>> >>>> comparison does not seem possible. Is one of the two files entirely
>> >>>> replicated on each node? Or does HIVE use another kind of strategy/optimization?
>> >>>>
>> >>>> Thanks.
>> >>
>> >>
>> >
>
>

Re: How HIVE manages a join

Posted by Carl Steinbach <ca...@cloudera.com>.

Hi Yongqiang,

Please go ahead and update the wiki page. I will copy it over to version
control when you are done.

Thanks.

Carl

On Tue, Aug 10, 2010 at 2:11 PM, yongqiang he <he...@gmail.com> wrote:

> In the Hive Join wiki page, it says
> "THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"
>
> Where should I make the update?
>
> On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <he...@gmail.com>
> wrote:
> > Yeah. The sort merge bucket mapjoin has been finished for some time
> > and seems stable now. I did one skew join but haven't had a chance to
> > look at the other skew join Namit mentioned to me. But I definitely should
> > have updated the wiki earlier. My bad.
> >
> > On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <ha...@cloudera.com> wrote:
> >> Yongqiang mentioned he was going to update the wiki with this information in
> >> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
> >>
> >> Yongqiang, have you gotten a chance to complete the sort merge bucket map
> >> join and the other skew join you mention in the above thread?
> >>
> >> Thanks,
> >> Jeff
> >>
> >> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
> >> <bh...@students.iiit.ac.in> wrote:
> >>>
> >>> Roberto ..
> >>>
> >>> You can find these links useful ..
> >>>
> >>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
> >>> - Simple joins and optimizations..
> >>>
> >>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  -
> >>> New kind of joins / features of hive ..
> >>>
> >>> Thanks
> >>>
> >>> Bharath.V
> >>> 4th year Undergraduate..
> >>> IIIT Hyderabad
> >>>
> >>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
> >>> <ro...@guest.telecomitalia.it> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I cannot find any documentation about what algorithm HIVE uses to
> >>>> translate JOIN clauses into Map-Reduce tasks.
> >>>>
> >>>> In particular, if I have two tables A and B, each table is written to a
> >>>> separate file and each file is split across Hadoop nodes. When I perform a
> >>>> JOIN with A.column = B.column, the framework has to compare the full data
> >>>> from the first file with the full data from the second file. In order to
> >>>> perform a full scan of all possible combinations of values, how can Hadoop
> >>>> do it? If each node contains only a portion of each file, a complete
> >>>> comparison does not seem possible. Is one of the two files entirely
> >>>> replicated on each node? Or does HIVE use another kind of strategy/optimization?
> >>>>
> >>>> Thanks.
> >>
> >>
> >
>

Re: How HIVE manages a join

Posted by yongqiang he <he...@gmail.com>.

In the Hive Join wiki page, it says
"THIS PAGE WAS MOVED TO HIVE XDOCS ! DO NOT EDIT!Join Syntax"

Where should I make the update?

On Fri, Aug 6, 2010 at 11:46 PM, yongqiang he <he...@gmail.com> wrote:
> Yeah. The sort merge bucket mapjoin has been finished for some time
> and seems stable now. I did one skew join but haven't had a chance to
> look at the other skew join Namit mentioned to me. But I definitely should
> have updated the wiki earlier. My bad.
>
> On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <ha...@cloudera.com> wrote:
>> Yongqiang mentioned he was going to update the wiki with this information in
>> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>>
>> Yongqiang, have you gotten a chance to complete the sort merge bucket map
>> join and the other skew join you mention in the above thread?
>>
>> Thanks,
>> Jeff
>>
>> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
>> <bh...@students.iiit.ac.in> wrote:
>>>
>>> Roberto ..
>>>
>>> You can find these links useful ..
>>>
>>>
>>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>>> - Simple joins and optimizations..
>>>
>>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  -
>>> New kind of joins / features of hive ..
>>>
>>> Thanks
>>>
>>> Bharath.V
>>> 4th year Undergraduate..
>>> IIIT Hyderabad
>>>
>>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>>> <ro...@guest.telecomitalia.it> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I cannot find any documentation about what algorithm HIVE uses to
>>>> translate JOIN clauses into Map-Reduce tasks.
>>>>
>>>> In particular, if I have two tables A and B, each table is written to a
>>>> separate file and each file is split across Hadoop nodes. When I perform a
>>>> JOIN with A.column = B.column, the framework has to compare the full data
>>>> from the first file with the full data from the second file. In order to
>>>> perform a full scan of all possible combinations of values, how can Hadoop
>>>> do it? If each node contains only a portion of each file, a complete
>>>> comparison does not seem possible. Is one of the two files entirely
>>>> replicated on each node? Or does HIVE use another kind of strategy/optimization?
>>>>
>>>> Thanks.
>>
>>
>
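
The message above quotes Yongqiang's note about skew join. In an ordinary shuffle join, every row for one key goes to a single reducer, so a key with a huge number of rows turns that reducer into the bottleneck. As we understand Hive's runtime skew join (enabled with hive.optimize.skewjoin, with the row threshold set by hive.skewjoin.key), rows for keys over the threshold are set aside and joined afterwards with a map join, while the remaining keys go through the normal shuffle. The sketch below is a simplified illustration with a hypothetical skew_join function; unlike Hive, which detects skew inside the reducer at run time, it counts key frequencies up front.

```python
from collections import Counter, defaultdict

def skew_join(a_rows, b_rows, a_key, b_key, threshold):
    """Inner join of A and B that routes keys with more than `threshold` A rows separately."""
    counts = Counter(row[a_key] for row in a_rows)
    skewed = {key for key, n in counts.items() if n > threshold}

    # Path 1: ordinary reduce-side join for non-skewed keys, grouped as a reducer would see them.
    groups = defaultdict(lambda: ([], []))
    for row in a_rows:
        if row[a_key] not in skewed:
            groups[row[a_key]][0].append(row)
    for row in b_rows:
        if row[b_key] not in skewed:
            groups[row[b_key]][1].append(row)
    result = [ar + br for a_side, b_side in groups.values() for ar in a_side for br in b_side]

    # Path 2: map join for skewed keys; B's rows for those keys are assumed to fit in memory.
    b_lookup = defaultdict(list)
    for row in b_rows:
        if row[b_key] in skewed:
            b_lookup[row[b_key]].append(row)
    for row in a_rows:
        if row[a_key] in skewed:
            result.extend(row + match for match in b_lookup.get(row[a_key], []))
    return result

A = [(1, "a1"), (1, "a2"), (1, "a3"), (2, "a4")]
B = [(1, "b1"), (2, "b2"), (3, "b3")]
print(sorted(skew_join(A, B, 0, 0, threshold=2)))
# prints [(1, 'a1', 1, 'b1'), (1, 'a2', 1, 'b1'), (1, 'a3', 1, 'b1'), (2, 'a4', 2, 'b2')]
```

Whatever the threshold, the result set is the same as a plain inner join; only the route each key takes changes.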

Re: How HIVE manages a join

Posted by yongqiang he <he...@gmail.com>.

Yeah. The sort merge bucket mapjoin has been finished for some time
and seems stable now. I did one skew join but haven't had a chance to
look at the other skew join Namit mentioned to me. But I definitely should
have updated the wiki earlier. My bad.

On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <ha...@cloudera.com> wrote:
> Yongqiang mentioned he was going to update the wiki with this information in
> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>
> Yongqiang, have you gotten a chance to complete the sort merge bucket map
> join and the other skew join you mention in the above thread?
>
> Thanks,
> Jeff
>

Re: How HIVE manages a join

Posted by Jeff Hammerbacher <ha...@cloudera.com>.

Yongqiang mentioned he was going to update the wiki with this information in
the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.

Yongqiang, have you gotten a chance to complete the sort merge bucket map
join and the other skew join you mention in the above thread?

Thanks,
Jeff
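On the skew join mentioned above: in a plain reduce-side join every row with a given key goes to one reducer, so a single very frequent key can leave one reducer doing most of the work. The general idea behind a skew join is to set the rows for such keys aside and join them in a separate map-side hash join, while the remaining keys go through the normal shuffle. The sketch below illustrates only that idea under simplifying assumptions: rows are (key, value) tuples, a key counts as skewed when it appears more than skew_threshold times in table A, and the function name skew_join is hypothetical. Hive itself detects skewed keys at run time in the join operator (see hive.optimize.skewjoin and hive.skewjoin.key in the Hive configuration documentation), which this sketch does not model.

```python
from collections import Counter, defaultdict


def skew_join(a_rows, b_rows, skew_threshold):
    """Equi-join (key, value) rows, handling skewed keys in a separate step.

    A key is treated as skewed when it appears more than skew_threshold
    times in a_rows (a simplification of Hive's run-time detection).
    """
    counts = Counter(key for key, _ in a_rows)
    skewed = {key for key, count in counts.items() if count > skew_threshold}

    # Step 1: ordinary reduce-side join for non-skewed keys.
    # groups[key] holds ([values from A], [values from B]), like one reducer call.
    groups = defaultdict(lambda: ([], []))
    for key, value in a_rows:
        if key not in skewed:
            groups[key][0].append(value)
    for key, value in b_rows:
        if key not in skewed:
            groups[key][1].append(value)
    out = [
        (key, va, vb)
        for key, (a_vals, b_vals) in groups.items()
        for va in a_vals
        for vb in b_vals
    ]

    # Step 2: follow-up map-side hash join for skewed keys only.
    # B's rows for the skewed keys go into a hash table; A's skewed rows
    # are streamed past it and could be split across many mappers.
    b_skewed = defaultdict(list)
    for key, value in b_rows:
        if key in skewed:
            b_skewed[key].append(value)
    for key, va in a_rows:
        if key in skewed:
            out.extend((key, va, vb) for vb in b_skewed.get(key, []))
    return out


if __name__ == "__main__":
    A = [(1, f"a{i}") for i in range(5)] + [(2, "x"), (3, "y")]
    B = [(1, "b1"), (1, "b1b"), (2, "b2"), (4, "b4")]
    print(len(skew_join(A, B, skew_threshold=2)))  # prints 11
```

The result is the same as an ordinary join; only the placement of the work changes, so the hot key no longer lands on a single reducer.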

On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada <
bharat_v@students.iiit.ac.in> wrote:

> Roberto ..
>
> You can find these links useful ..
>
>
> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
> - Simple joins and optimizations..
>
> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team  -
> New kind of joins / features of hive ..
>
> Thanks
>
> Bharath.V
> 4th year Undergraduate..
> IIIT Hyderabad

Re: How HIVE manages a join

Posted by bharath vissapragada <bh...@students.iiit.ac.in>.

Roberto,

You may find these links useful:

http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
- Simple joins and optimizations

http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team
- New kinds of joins / features of Hive

Thanks

Bharath.V
4th year Undergraduate..
IIIT Hyderabad
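To make Roberto's question concrete: the case where each node holds only part of each table is what Hive's default (common) join handles with a shuffle. Each mapper tags every row with the table it came from and emits it keyed by the join column; MapReduce then routes all rows sharing a key to the same reducer, whichever node they were read on, and the reducer pairs them up. Hive's join documentation notes that the reducer buffers the rows of the earlier tables and streams the last one. The Python below is a single-process simulation of that flow, assuming (key, value) rows and hypothetical function names; it shows why no full cross-comparison of the two files is needed, and is not Hive's code.

```python
from collections import defaultdict


def map_phase(table_tag, split):
    """One mapper: tag each (key, value) row with its table and emit it by key."""
    for key, value in split:
        yield key, (table_tag, value)


def shuffle(mapped, num_reducers):
    """Route every row to a reducer by hash(key), grouping rows by key.

    All rows with the same key land in the same partition, whichever
    node (split) they were read from.
    """
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, tagged in mapped:
        partitions[hash(key) % num_reducers][key].append(tagged)
    return partitions


def reduce_phase(key, tagged_values):
    """One reducer call: buffer table 0 (A) rows, stream table 1 (B) rows."""
    buffered = [value for tag, value in tagged_values if tag == 0]
    for tag, value in tagged_values:
        if tag == 1:
            for a_value in buffered:
                yield key, a_value, value


def common_join(a_splits, b_splits, num_reducers=3):
    mapped = []
    for split in a_splits:
        mapped.extend(map_phase(0, split))
    for split in b_splits:
        mapped.extend(map_phase(1, split))
    out = []
    for partition in shuffle(mapped, num_reducers):
        for key in sorted(partition):  # MapReduce sorts keys within a reducer
            out.extend(reduce_phase(key, partition[key]))
    return out


if __name__ == "__main__":
    # Matching keys are deliberately placed in splits on different "nodes".
    a_splits = [[(1, "a1"), (2, "a2")], [(3, "a3"), (2, "a2b")]]
    b_splits = [[(2, "b2"), (3, "b3")], [(1, "b1"), (4, "b4")]]
    print(common_join(a_splits, b_splits))
```

Each reducer only compares rows that already share a key, so the cost is the shuffle of both tables rather than an all-pairs scan.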

On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto <
roberto.cappa@guest.telecomitalia.it> wrote:

> Hi,
>
> I cannot find any documentation about what algorithm HIVE uses to
> translate JOIN clauses into Map-Reduce tasks.
>
> In particular, if I have two tables A and B, each table is written to a
> separate file and each file is split across Hadoop nodes. When I perform a
> JOIN with A.column = B.column, the framework has to compare the full data
> from the first file with the full data from the second file. In order to
> perform a full scan of all possible combinations of values, how can Hadoop
> do it? If each node contains only a portion of each file, a complete
> comparison seems impossible. Is one of the two files entirely replicated on
> each node? Or does HIVE use another kind of strategy/optimization?
>
> Thanks.
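The replication asked about above does exist in Hive as the map join: when one table is small enough to fit in memory, every mapper loads the whole small table into a hash table and streams its own split of the large table past it, so the join completes in the map phase with no shuffle. It is requested with a hint such as SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM a JOIN b ON a.key = b.key. The Python below sketches only the per-mapper logic, assuming (key, value) rows; the function names are hypothetical.

```python
from collections import defaultdict


def build_hash_table(small_rows):
    """Load the whole small table into memory, keyed by the join column."""
    table = defaultdict(list)
    for key, value in small_rows:
        table[key].append(value)
    return dict(table)


def map_join_mapper(big_split, small_table):
    """One mapper: stream its split of the big table past the in-memory table."""
    for key, value in big_split:
        for small_value in small_table.get(key, []):
            yield key, value, small_value


def map_join(big_splits, small_rows):
    # The same hash table is (conceptually) replicated to every mapper;
    # there is no shuffle and no reduce phase.
    small_table = build_hash_table(small_rows)
    out = []
    for split in big_splits:
        out.extend(map_join_mapper(split, small_table))
    return out


if __name__ == "__main__":
    big_splits = [[(1, "a1"), (2, "a2")], [(2, "a2b"), (9, "a9")]]
    small = [(2, "s2"), (1, "s1"), (2, "s2b")]
    print(len(map_join(big_splits, small)))  # prints 5
```

The trade-off is memory: this only pays off when the replicated table is small, which is why the reduce-side join remains the default.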