Posted to user@hive.apache.org by William Slacum <ws...@gmail.com> on 2015/08/04 22:46:15 UTC

Origin of hive.auto.convert.sortmerge.join.noconditionaltask

Hi all,

I've had some questions from users regarding setting
`hive.auto.convert.sortmerge.join.noconditionaltask`. I see, in some
documentation from users and vendors, that it is recommended to set this
parameter. In neither Hive 0.12 nor 0.14 can I find in HiveConf where this
is actually defined and used. Am I correct in thinking that this is just
some cruft that's survived without verification?

Thanks!

Re: Origin of hive.auto.convert.sortmerge.join.noconditionaltask

Posted by Bill Slacum <ws...@gmail.com>.
Makes me feel not insane :) Thanks!



> On Aug 5, 2015, at 12:30 AM, Lefty Leverenz <le...@gmail.com> wrote:
> 
> Good question.  I can't find it in any Hive releases.  There's hive.auto.convert.join.noconditionaltask (starting in 0.11.0) but not hive.auto.convert.sortmerge.join.noconditionaltask.

Re: Origin of hive.auto.convert.sortmerge.join.noconditionaltask

Posted by Lefty Leverenz <le...@gmail.com>.
Good question.  I can't find it in any Hive releases.
There's hive.auto.convert.join.noconditionaltask (starting in 0.11.0) but
not hive.auto.convert.sortmerge.join.noconditionaltask.

Several JIRA issues mention it, including the 0.13.0 release note for
HIVE-6098 <https://issues.apache.org/jira/browse/HIVE-6098> "Merge Tez
branch into trunk":

> Hive settings:
>
> // needed because SMB isn't supported on tez yet
> set hive.optimize.bucketmapjoin=false;
> set hive.optimize.bucketmapjoin.sortedmerge=false;
> set hive.auto.convert.sortmerge.join=false;
> set hive.auto.convert.sortmerge.join.noconditionaltask=false;
> set hive.auto.convert.join.noconditionaltask=true;
>

And it was in the Join Optimization wikidoc
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization>
from July 2013 until last month, when Vikram removed it (see page history
<https://cwiki.apache.org/confluence/pages/viewpreviousversions.action?pageId=33293167>):

> Auto Conversion to SMB Map Join
>
> Sort-Merge-Bucket (SMB) joins can be converted to SMB map joins as well.
> SMB joins are used wherever the tables are sorted and bucketed. The join
> boils down to just merging the already sorted tables, allowing this
> operation to be faster than an ordinary map-join. However, if the tables
> are partitioned, there could be a slow down as each mapper would need to
> get a very small chunk of a partition which has a single key.
>
> The following configuration settings enable the conversion of an SMB to a
> map-join SMB:
>
> set hive.auto.convert.sortmerge.join=true;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.auto.convert.sortmerge.join.noconditionaltask=true;
>
-- Lefty

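A rough way to repeat the check from the original question: pull the `set` statements out of a recommended settings block and flag any key that never appears as a quoted string literal in HiveConf.java. This is a minimal sketch, not a Hive tool. The helper names are hypothetical, and `STAND_IN_HIVECONF` is a made-up stand-in that lists only the property names discussed in this thread; it is not real HiveConf source. A literal-string search is only a heuristic: it would miss a key that Hive assembles at runtime or reads outside HiveConf.

```python
import re

# Matches a HiveQL "set key=value;" statement, tolerating spaces around "=".
# finditer() also catches two statements fused onto one line.
SET_STMT = re.compile(r"\bset\s+([\w.]+)\s*=\s*([^;]*);", re.IGNORECASE)


def parse_set_statements(script: str) -> dict[str, str]:
    """Collect key -> value pairs from every set statement in a HiveQL script."""
    return {m.group(1): m.group(2).strip() for m in SET_STMT.finditer(script)}


def undefined_keys(settings: dict[str, str], hiveconf_source: str) -> list[str]:
    """Return keys that never appear as a double-quoted literal in the source.

    Matching '"key"' instead of a bare substring keeps a short key from being
    counted as defined just because a longer property name starts with it.
    """
    return sorted(k for k in settings if f'"{k}"' not in hiveconf_source)


# Settings block as quoted from the HIVE-6098 release note.
RELEASE_NOTE_SETTINGS = """
// needed because SMB isn't supported on tez yet
set hive.optimize.bucketmapjoin=false;
set hive.optimize.bucketmapjoin.sortedmerge=false;
set hive.auto.convert.sortmerge.join=false;
set hive.auto.convert.sortmerge.join.noconditionaltask=false;
set hive.auto.convert.join.noconditionaltask=true;
"""

# HYPOTHETICAL stand-in for HiveConf.java: only the quoted property names.
# A real check would read the HiveConf.java source from the release in question.
STAND_IN_HIVECONF = '''
    "hive.optimize.bucketmapjoin"
    "hive.optimize.bucketmapjoin.sortedmerge"
    "hive.auto.convert.sortmerge.join"
    "hive.auto.convert.join.noconditionaltask"
'''

if __name__ == "__main__":
    settings = parse_set_statements(RELEASE_NOTE_SETTINGS)
    print(undefined_keys(settings, STAND_IN_HIVECONF))
    # → ['hive.auto.convert.sortmerge.join.noconditionaltask']
```

Against the real HiveConf.java from each release being checked, an empty result means every recommended key is at least named there; a non-empty result is a prompt to look closer, not proof that Hive ignores the setting.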