
Posted to dev@hive.apache.org by John Sichi <js...@fb.com> on 2011/06/06 20:11:42 UTC

Travel Assistance applications now open for ApacheCon NA 2011

The Apache Software Foundation (ASF)'s Travel Assistance Committee (TAC) is
now accepting applications for ApacheCon North America 2011, 7-11 November
in Vancouver BC, Canada.

The TAC is seeking individuals from the Apache community at large (users,
developers, educators, students, Committers, and Members) who would like to
attend ApacheCon but need some financial support to get there. Places are
limited, and all applicants will be scored on their individual merit.

Financial assistance is available to cover flights/trains, accommodation and
entrance fees either in part or in full, depending on circumstances.
However, the support available for those attending only the BarCamp (7-8
November) is less than that for those attending the entire event (Conference
+ BarCamp 7-11 November). The Travel Assistance Committee aims to support
all official ASF events, including cross-project activities; as such, it may
be prudent for those in Asia and Europe to wait for an event geographically
closer to them.

More information can be found at http://www.apache.org/travel/index.html
including a link to the online application and detailed instructions for
submitting.

Applications will close on 8 July 2011 at 22:00 BST (UTC/GMT +1).

We wish all applicants good luck, and thank you in advance for tweeting,
blogging, and otherwise spreading the word.

Regards,
The Travel Assistance Committee

Re: Skew Join Optimization in hive

Posted by Shantian Purkad <sh...@yahoo.com>.

We have given hints to use mapside joins on small tables.


We are planning to break this query into multiple queries, but we would prefer options that let us keep the queries as they are, with a few modifications and some tuning, rather than splitting them into multiple steps, because there is quite a bit of complicated logic and dependencies in the joins.



________________________________
From: Igor Tatarinov <ig...@decide.com>
To: user@hive.apache.org; Shantian Purkad <sh...@yahoo.com>
Sent: Tuesday, June 7, 2011 12:58 PM
Subject: Re: Skew Join Optimization in hive


Have you tried splitting the query into 2 or 3 steps and/or enabling map joins (SET hive.auto.convert.join = true;) if some of the tables are smallish?



On Tue, Jun 7, 2011 at 12:31 PM, Shantian Purkad <sh...@yahoo.com> wrote:

>Hi,
>
>I have a query that joins 12 different tables (most of them left outer joins), and it takes almost 3 hours. 90% of the time is spent in a single reducer, which gets the bulk of the data to process.
>
>How can I get around this and distribute the data fairly across all reducers? I tried enabling the skew join optimization, but I get the NPE below after the first step of the job executes.
>
>Any suggestions or ideas would be of great help.
>
>Thanks,
>Shantian
>
>2011-06-07 19:22:28,923 Stage-11 map = 100%,  reduce = 85%
>2011-06-07 19:22:30,932 Stage-11 map = 100%,  reduce = 100%
>Ended Job = job_201106071542_0010
>java.lang.NullPointerException
>    at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:97)
>    at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
>    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
>    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
>    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
>    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
>    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
>    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.ConditionalTask
>hive> 
>
>

Re: Skew Join Optimization in hive

Posted by Igor Tatarinov <ig...@decide.com>.

Have you tried splitting the query into 2 or 3 steps and/or enabling map
joins (SET hive.auto.convert.join = true;) if some of the tables are
smallish?
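A map join sidesteps the overloaded reducer because it has no shuffle: each mapper loads the small table into an in-memory hash table and streams its share of the large table past it. The sketch below models that broadcast hash join in plain Python for the left outer join case in the question. The function name and the (key, value) row shape are hypothetical, and it assumes the small table fits in memory and sits on the non-preserved (right) side of the join.

```python
def map_side_left_outer_join(big_rows, small_rows):
    """Broadcast hash LEFT OUTER JOIN of (key, value) rows on key.

    The big (streamed) side is the preserved left side; the small side is
    loaded into memory, as a map join does with its small table.
    """
    # Build phase: hash the small table once. NULL (None) keys are skipped
    # because in SQL a NULL key never matches anything.
    lookup = {}
    for key, value in small_rows:
        if key is not None:
            lookup.setdefault(key, []).append(value)

    # Probe phase: stream the big table once. Rows are never regrouped by key,
    # so no single worker has to receive every row of a hot key.
    joined = []
    for key, value in big_rows:
        matches = lookup.get(key)
        if matches:
            joined.extend((key, value, m) for m in matches)
        else:
            joined.append((key, value, None))  # unmatched left row, NULL-padded
    return joined


big = [(1, "a"), (1, "b"), (2, "c"), (None, "d")]
small = [(1, "x"), (3, "y"), (None, "z")]
print(map_side_left_outer_join(big, small))
# → [(1, 'a', 'x'), (1, 'b', 'x'), (2, 'c', None), (None, 'd', None)]
```

Because the streamed side is the preserved one, every left row is emitted exactly once whether or not it matches; broadcasting the preserved side instead would require tracking which of its rows were matched across all mappers.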


On Tue, Jun 7, 2011 at 12:31 PM, Shantian Purkad
<sh...@yahoo.com> wrote:

> Hi,
>
> I have a query that joins 12 different tables (most of them left outer
> joins), and it takes almost 3 hours. 90% of the time is spent in a
> single reducer, which gets the bulk of the data to process.
>
> How can I get around this and distribute the data fairly across all
> reducers? I tried enabling the skew join optimization, but I get the NPE
> below after the first step of the job executes.
>
> Any suggestions or ideas would be of great help.
>
> Thanks,
> Shantian
>
> 2011-06-07 19:22:28,923 Stage-11 map = 100%,  reduce = 85%
> 2011-06-07 19:22:30,932 Stage-11 map = 100%,  reduce = 100%
> Ended Job = job_201106071542_0010
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:97)
>     at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.ConditionalTask
> hive>
>
>

Skew Join Optimization in hive

Posted by Shantian Purkad <sh...@yahoo.com>.

Hi,

I have a query that joins 12 different tables (most of them left outer joins), and it takes almost 3 hours. 90% of the time is spent in a single reducer, which gets the bulk of the data to process.
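In a common (reduce-side) join, each row is sent to a reducer chosen by hashing its join key, so all rows that share a key value land on the same reducer. A single very frequent value (for example, many NULL keys in a left outer join) is one common way to end up with one reducer doing most of the work; whether that is the cause here is an assumption. The toy simulation below uses a hypothetical key column, with `zlib.crc32` standing in for Hive's real partitioning hash, and counts how many rows each reducer would receive.

```python
import zlib
from collections import Counter


def reducer_for(key: str, num_reducers: int) -> int:
    # Deterministic stand-in for the partitioner: hash the key and take it
    # modulo the reducer count, so equal keys always go to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(keys, num_reducers):
    """Return the number of rows each reducer would receive."""
    counts = Counter(reducer_for(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


# Hypothetical join-key column: 9,000 rows share one "hot" value and the
# other 1,000 rows are spread over 1,000 distinct values.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]
loads = reducer_loads(keys, num_reducers=10)
print(loads)
print(f"busiest reducer: {max(loads) / len(keys):.0%} of rows")
```

Adding reducers does not help in this situation: the 9,000 "hot" rows still map to a single reducer, so the busiest one always holds at least 90% of the input.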

How can I get around this and distribute the data fairly across all reducers? I tried enabling the skew join optimization, but I get the NPE below after the first step of the job executes.
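The skew join optimization (`hive.optimize.skewjoin`) is, conceptually, a two-phase plan: during the reduce-side join, keys with more rows than a threshold (Hive's `hive.skewjoin.key`) are set aside, and a follow-up map-join step processes them, so one hot key's rows can be spread over many tasks instead of one reducer. The sketch below models only that idea in plain Python. It is not Hive's implementation, the function and parameter names are hypothetical, it handles inner joins only, and the "tasks" are simulated sequentially with a round-robin split.

```python
from collections import defaultdict


def group(rows):
    """Group (key, value) rows into {key: [values]}."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def skew_join(big_rows, small_rows, threshold, num_tasks):
    """Conceptual two-phase inner join; returns (joined_rows, rows_per_task)."""
    big, small = group(big_rows), group(small_rows)
    joined, deferred = [], {}

    # Phase 1 (reduce side): join each key group, but set aside any key
    # whose group has more than `threshold` big-side rows.
    for key, big_vals in big.items():
        if len(big_vals) > threshold:
            deferred[key] = big_vals
            continue
        joined += [(key, b, s) for b in big_vals for s in small.get(key, [])]

    # Phase 2 (map side): deal each hot key's rows round-robin to
    # `num_tasks` simulated tasks; each probes the small side's rows for
    # that key, which are assumed to fit in memory.
    rows_per_task = [0] * num_tasks
    for key, big_vals in deferred.items():
        matches = small.get(key, [])
        for i, b in enumerate(big_vals):
            rows_per_task[i % num_tasks] += 1
            joined += [(key, b, s) for s in matches]
    return joined, rows_per_task


# Usage: 50 rows share the key "hot"; with threshold=10 they are deferred
# and split evenly over 5 simulated tasks.
big = [("hot", i) for i in range(50)] + [("a", 1), ("b", 2)]
small = [("hot", "H"), ("a", "A")]
rows, per_task = skew_join(big, small, threshold=10, num_tasks=5)
print(len(rows), per_task)  # → 51 [10, 10, 10, 10, 10]
```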

Any suggestions or ideas would be of great help.

Thanks,
Shantian

2011-06-07 19:22:28,923 Stage-11 map = 100%,  reduce = 85%
2011-06-07 19:22:30,932 Stage-11 map = 100%,  reduce = 100%
Ended Job = job_201106071542_0010
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:97)
    at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.ConditionalTask
hive>
