Posted to user@hive.apache.org by yuemeng1 <yu...@huawei.com> on 2014/12/02 03:22:27 UTC

Job aborted due to stage failure

Hi, I built a Hive on Spark package, and my Spark assembly jar is
spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar. When I run a query in the Hive
shell, I first set all the properties that Hive needs for Spark, and then I
execute a join query:

select distinct st.sno,sname from student st join score sc
on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;

but it fails, and I get the following error in the Spark web UI:

Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)

Driver stacktrace:

Can you help me figure out this problem? I believe my build succeeded.
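
For reference, the properties referred to above are listed later in this
thread. A minimal sketch of such a session setup looks like this (the master
address and memory value are placeholders taken from the reproduction steps
given further down; adjust them for your cluster):

    set hive.execution.engine=spark;
    set spark.master=spark://<master-host>:7077;
    set spark.eventLog.enabled=true;
    set spark.executor.memory=9086m;
    set spark.serializer=org.apache.spark.serializer.KryoSerializer;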

Re: Job aborted due to stage failure

Posted by yuemeng1 <yu...@huawei.com>.
Hi Lefty,
I have some other things to do at the moment; I will edit the wiki docs
tomorrow.

Thanks,
Yuemeng

On 2014/12/5 9:59, Lefty Leverenz wrote:
> Yuemeng, you can find out how to edit wikidocs here: About This Wiki 
> <https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit>.
>
> -- Lefty
>
> On Wed, Dec 3, 2014 at 10:05 PM, Xuefu Zhang <xzhang@cloudera.com 
> <ma...@cloudera.com>> wrote:
>
>     Hi Yuemeng,
>
>     I'm glad that Hive on Spark finally works for you. As you know,
>     this project is still in development and yet to be released. Thus,
>     please forgive about the lack of proper documentation. We have a
>     "Get Started" page that's linked in HIVE-7292. If you can improve
>     the document there, it would be very helpful for other Hive users.
>
>     Thanks,
>     Xuefu
>
>     On Wed, Dec 3, 2014 at 5:42 PM, yuemeng1 <yuemeng1@huawei.com
>     <ma...@huawei.com>> wrote:
>
>         hi,thanks a lot for your help,with your help ,my hive-on-spark
>         can work well now
>         it take me long time to install and deploy.here are  some
>         advice,i think we need to improve the installation
>         documentation, allowing users to use the least amount of time
>         to compile and install
>         1)add which spark version we should pick from spark github if
>         we select built spark instead of download a spark
>         pre-built,tell them the right built commad!(not include Pyarn
>         ,Phive)
>         2)if they get some error during built ,such as
>         [ERRO/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:[22,24]cannot
>         find symbol
>         [ERROR]symbol: class JobExecutionStatus,tell them what they
>         can do?
>         for our users,first to use it ,then  feel good or bad?
>         and if u need,i can add something to start document
>
>
>         thanks
>         yuemeng
>
>
>
>
>
>
>         On 2014/12/3 11:03, Xuefu Zhang wrote:
>>         When you build Spark, remove -Phive as well as -Pyarn. When
>>         you run hive queries, you may need to run "set
>>         spark.home=/path/to/spark/dir";
>>
>>         Thanks,
>>         Xuefu
>>
>>         On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yuemeng1@huawei.com
>>         <ma...@huawei.com>> wrote:
>>
>>             hi,XueFu,thanks a lot for your help,now i will provide
>>             more detail to reproduce this ssue:
>>             1),i checkout a spark branch from hive
>>             github(https://github.com/apache/hive/tree/spark on Nov
>>             29,becasue of for version now it will give something
>>             wrong about:Caused by: java.lang.RuntimeException: Unable
>>             to instantiate
>>             org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>             ),
>>             and built command:mvn clean package -DskipTests
>>             -Phadoop-2 -Pdist
>>             after built i get package from
>>             :/home/ym/hive-on-spark/hive1129/hive/packaging/target(apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
>>             2)i checkout spark from
>>             https://github.com/apache/spark/tree/v1.2.0-snapshot0,becasue
>>             of spark branch-1.2 is with spark
>>             parent(1.2.1-SNAPSHOT),so i chose v1.2.0-snapshot0 and i
>>             compare this spark's pom.xml with
>>             spark-parent-1.2.0-SNAPSHOT.pom(get from
>>             http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>>             there is only difference is spark-parent name,and built
>>             command is :
>>
>>             |mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package|
>>
>>             3)comand i execute in hive-shell:
>>             ./hive --auxpath
>>             /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar(copy
>>             this jar to hive dir lib already)
>>             create table student(sno int,sname string,sage int,ssex
>>             string) row format delimited FIELDS TERMINATED BY ',';
>>             create table score(sno int,cno int,sage int) row format
>>             delimited FIELDS TERMINATED BY ',';
>>             load data local inpath
>>             '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
>>             into table student;
>>             load data local inpath
>>             '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
>>             into table score;
>>             set hive.execution.engine=spark;
>>             set spark.master=spark://10.175.xxx.xxx:7077;
>>             set spark.eventLog.enabled=true;
>>             set spark.executor.memory=9086m;
>>             set
>>             spark.serializer=org.apache.spark.serializer.KryoSerializer;
>>             select distinct st.sno,sname from student st join score
>>             sc on(st.sno=sc.sno) where sc.cno IN(11,12,13) and
>>             st.sage > 28;(work in mr)
>>             4)
>>             studdent.txt file
>>             1,rsh,27,female
>>             2,kupo,28,male
>>             3,astin,29,female
>>             4,beike,30,male
>>             5,aili,31,famle
>>
>>             score.txt file
>>             1,10,80
>>             2,11,85
>>             3,12,90
>>             4,13,95
>>             5,14,100
>>
>>             On 2014/12/2 23:28, Xuefu Zhang wrote:
>>>             Could you provide details on how to reproduce the issue?
>>>             such as the exact spark branch, the command to build
>>>             Spark, how you build Hive, and what queries/commands you
>>>             run.
>>>
>>>             We are running Hive on Spark all the time. Our
>>>             pre-commit test runs without any issue.
>>>
>>>             Thanks,
>>>             Xuefu
>>>
>>>             On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1
>>>             <yuemeng1@huawei.com <ma...@huawei.com>> wrote:
>>>
>>>                 hi,XueFu
>>>                 i checkout a spark branch from
>>>                 sparkgithub(tags:v1.2.0-snapshot0)and i compare this
>>>                 spark's pom.xml with
>>>                 spark-parent-1.2.0-SNAPSHOT.pom(get from
>>>                 http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>>>                 there is only difference is follow:
>>>                 in spark-parent-1.2.0-SNAPSHOT.pom
>>>                 <artifactId>spark-parent</artifactId>
>>>                 <version>1.2.0-SNAPSHOT</version>
>>>                 and in v1.2.0-snapshot0
>>>                 <artifactId>spark-parent</artifactId>
>>>                 <version>1.2.0</version>
>>>                 i think there is no essence diff,and i built
>>>                 v1.2.0-snapshot0 and deploy it as my spark clusters
>>>                 when i run query about join two table ,it still give
>>>                 some error what i show u earlier
>>>
>>>                 Job aborted due to stage failure: Task 0 in stage
>>>                 1.0 failed 4 times, most recent failure: Lost task
>>>                 0.3 in stage 1.0 (TID 7, datasight18):
>>>                 java.lang.NullPointerException+details
>>>
>>>                 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>>                 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>                 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>                 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>                 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>                 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>                 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>                 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>                 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>                 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>                 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>                 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>                 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>                 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>                 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>                 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>                 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>                 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>                 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>                 	at java.lang.Thread.run(Thread.java:722)
>>>
>>>                 Driver stacktrace:
>>>
>>>
>>>
>>>                 i think my spark clusters did't had any problem,but
>>>                 why always give me such error
>>>
>>>                 On 2014/12/2 13:39, Xuefu Zhang wrote:
>>>>                 You need to build your spark assembly from spark
>>>>                 1.2 branch. this should give your both a spark
>>>>                 build as well as spark-assembly jar, which you need
>>>>                 to copy to Hive lib directory. Snapshot is fine,
>>>>                 and spark 1.2 hasn't been released yet.
>>>>
>>>>                 --Xuefu
>>>>
>>>>                 On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1
>>>>                 <yuemeng1@huawei.com <ma...@huawei.com>>
>>>>                 wrote:
>>>>
>>>>
>>>>
>>>>                     hi.XueFu,
>>>>                     thanks a lot for your inforamtion,but as far as
>>>>                     i know ,the latest spark version on github is
>>>>                     spark-snapshot-1.3,but there is no
>>>>                     spark-1.2,only have a branch-1.2 with
>>>>                     spark-snapshot-1.2,can u tell me which spark
>>>>                     version i should built,and for now,that's
>>>>                     spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar
>>>>                     produce error like that
>>>>
>>>>
>>>>                     On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>>>                     It seems that wrong class, HiveInputFormat, is
>>>>>                     loaded. The stacktrace is way off the current
>>>>>                     Hive code. You need to build Spark 1.2 and
>>>>>                     copy spark-assembly jar to Hive's lib
>>>>>                     directory and that it.
>>>>>
>>>>>                     --Xuefu
>>>>>
>>>>>                     On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1
>>>>>                     <yuemeng1@huawei.com
>>>>>                     <ma...@huawei.com>> wrote:
>>>>>
>>>>>                         hi,i built a hive on spark package and my
>>>>>                         spark assembly jar is
>>>>>                         spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when
>>>>>                         i run a query in hive shell,before execute
>>>>>                         this query,
>>>>>                         i set all the require which hive need with
>>>>>                         spark.and i execute a join query :
>>>>>                         select distinct st.sno,sname from student
>>>>>                         st join score sc on(st.sno=sc.sno) where
>>>>>                         sc.cno IN(11,12,13) and st.sage > 28;
>>>>>                         but it failed,
>>>>>                         get follow error in spark webUI:
>>>>>                         Job aborted due to stage failure: Task 0
>>>>>                         in stage 1.0 failed 4 times, most recent
>>>>>                         failure: Lost task 0.3 in stage 1.0 (TID
>>>>>                         7, datasight18):
>>>>>                         java.lang.NullPointerException+details
>>>>>
>>>>>                         Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>>>>                         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>>>                         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>>>                         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>>>                         	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>>>                         	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>>>                         	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>>>                         	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>>>                         	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>>                         	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>>                         	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>>                         	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>>                         	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>>                         	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>>>                         	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>>                         	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>>>                         	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>>>                         	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>                         	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>                         	at java.lang.Thread.run(Thread.java:722)
>>>>>
>>>>>                         Driver stacktrace:
>>>>>
>>>>>                         can u give me a help to deal this
>>>>>                         probelm,and i think my built was succussed!
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>


Re: Job aborted due to stage failure

Posted by Lefty Leverenz <le...@gmail.com>.
Yuemeng, you can find out how to edit wikidocs here: About This Wiki
<https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit>.

-- Lefty

On Wed, Dec 3, 2014 at 10:05 PM, Xuefu Zhang <xz...@cloudera.com> wrote:

> Hi Yuemeng,
>
> I'm glad that Hive on Spark finally works for you. As you know, this
> project is still in development and yet to be released. Thus, please
> forgive about the lack of proper documentation. We have a "Get Started"
> page that's linked in HIVE-7292. If you can improve the document there, it
> would be very helpful for other Hive users.
>
> Thanks,
> Xuefu
>
> On Wed, Dec 3, 2014 at 5:42 PM, yuemeng1 <yu...@huawei.com> wrote:
>
>>  hi,thanks a lot for your help,with your help ,my hive-on-spark can work
>> well now
>> it take me long time to install and deploy.here are  some advice,i think
>> we need to improve the installation documentation, allowing users to use
>> the least amount of time to compile and install
>> 1)add which spark version we should pick from spark github if we select
>> built spark instead of download a spark pre-built,tell them the right built
>> commad!(not include Pyarn ,Phive)
>> 2)if they get some error during built ,such as [ERRO
>> /hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:
>> [22,24] cannot find symbol
>> [ERROR] symbol: class JobExecutionStatus,tell them what they can do?
>> for our users,first to use it ,then  feel good or bad?
>> and if u need,i can add something to start document
>>
>>
>> thanks
>> yuemeng
>>
>>
>>
>>
>>
>>
>> On 2014/12/3 11:03, Xuefu Zhang wrote:
>>
>>  When you build Spark, remove -Phive as well as -Pyarn. When you run
>> hive queries, you may need to run "set spark.home=/path/to/spark/dir";
>>
>>  Thanks,
>>  Xuefu
>>
>> On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yu...@huawei.com> wrote:
>>
>>>  hi,XueFu,thanks a lot for your help,now i will provide more detail to
>>> reproduce this ssue:
>>> 1),i checkout a spark branch from hive github(
>>> https://github.com/apache/hive/tree/spark on Nov 29,becasue of for
>>> version now it will give something wrong about:Caused by:
>>> java.lang.RuntimeException: Unable to instantiate
>>> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ),
>>> and built command:mvn clean package -DskipTests -Phadoop-2 -Pdist
>>> after built i get package from
>>> :/home/ym/hive-on-spark/hive1129/hive/packaging/target(apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
>>> 2)i checkout spark from
>>> https://github.com/apache/spark/tree/v1.2.0-snapshot0,becasue of spark
>>> branch-1.2 is with spark parent(1.2.1-SNAPSHOT),so i chose
>>> v1.2.0-snapshot0 and i compare this spark's pom.xml with
>>> spark-parent-1.2.0-SNAPSHOT.pom(get from
>>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>>> there is only difference is spark-parent name,and built command is :
>>>
>>> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
>>>
>>> 3)comand i execute in hive-shell:
>>> ./hive --auxpath
>>> /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar(copy
>>> this jar to hive dir lib already)
>>> create table student(sno int,sname string,sage int,ssex string) row
>>> format delimited FIELDS TERMINATED BY ',';
>>> create table score(sno int,cno int,sage int) row format delimited FIELDS
>>> TERMINATED BY ',';
>>> load data local inpath
>>> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
>>> into table student;
>>> load data local inpath
>>> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
>>> into table score;
>>> set hive.execution.engine=spark;
>>> set spark.master=spark://10.175.xxx.xxx:7077;
>>> set spark.eventLog.enabled=true;
>>> set spark.executor.memory=9086m;
>>> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>>> select distinct st.sno,sname from student st join score sc
>>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;(work in mr)
>>> 4)
>>> studdent.txt file
>>> 1,rsh,27,female
>>> 2,kupo,28,male
>>> 3,astin,29,female
>>> 4,beike,30,male
>>> 5,aili,31,famle
>>>
>>> score.txt file
>>> 1,10,80
>>> 2,11,85
>>> 3,12,90
>>> 4,13,95
>>> 5,14,100
>>>
>>> On 2014/12/2 23:28, Xuefu Zhang wrote:
>>>
>>>  Could you provide details on how to reproduce the issue? such as the
>>> exact spark branch, the command to build Spark, how you build Hive, and
>>> what queries/commands you run.
>>>
>>>  We are running Hive on Spark all the time. Our pre-commit test runs
>>> without any issue.
>>>
>>>  Thanks,
>>>  Xuefu
>>>
>>> On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yu...@huawei.com> wrote:
>>>
>>>>  hi,XueFu
>>>> i checkout a spark branch from sparkgithub(tags:v1.2.0-snapshot0)and i
>>>> compare this spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom(get from
>>>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>>>> there is only difference is follow:
>>>> in spark-parent-1.2.0-SNAPSHOT.pom
>>>>   <artifactId>spark-parent</artifactId>
>>>>   <version>1.2.0-SNAPSHOT</version>
>>>> and in v1.2.0-snapshot0
>>>> <artifactId>spark-parent</artifactId>
>>>>   <version>1.2.0</version>
>>>> i think there is no essence diff,and i built v1.2.0-snapshot0 and
>>>> deploy it as my spark clusters
>>>> when i run query about join two table ,it still give some error what i
>>>> show u earlier
>>>>
>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>>> java.lang.NullPointerException+details
>>>>
>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>> 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>> 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> 	at java.lang.Thread.run(Thread.java:722)
>>>>
>>>> Driver stacktrace:
>>>>
>>>>
>>>>
>>>>  i think my spark clusters did't had any problem,but why always give me
>>>> such error
>>>>
>>>> On 2014/12/2 13:39, Xuefu Zhang wrote:
>>>>
>>>>  You need to build your spark assembly from spark 1.2 branch. this
>>>> should give your both a spark build as well as spark-assembly jar, which
>>>> you need to copy to Hive lib directory. Snapshot is fine, and spark 1.2
>>>> hasn't been released yet.
>>>>
>>>>  --Xuefu
>>>>
>>>> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yu...@huawei.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> hi.XueFu,
>>>>> thanks a lot for your inforamtion,but as far as i know ,the latest
>>>>> spark version on github is spark-snapshot-1.3,but there is no
>>>>> spark-1.2,only have a branch-1.2 with spark-snapshot-1.2,can u tell me
>>>>> which spark version i should built,and for now,that's
>>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error like that
>>>>>
>>>>>
>>>>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>>>
>>>>>  It seems that wrong class, HiveInputFormat, is loaded. The
>>>>> stacktrace is way off the current Hive code. You need to build Spark 1.2
>>>>> and copy spark-assembly jar to Hive's lib directory and that it.
>>>>>
>>>>>  --Xuefu
>>>>>
>>>>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yu...@huawei.com> wrote:
>>>>>
>>>>>>  hi,i built a hive on spark package and my spark assembly jar is
>>>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a query in hive
>>>>>> shell,before execute this query,
>>>>>> i set all the  require which hive need with  spark.and i execute a
>>>>>> join query :
>>>>>> select distinct st.sno,sname from student st join score sc
>>>>>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>>>>>> but it failed,
>>>>>> get follow error in spark webUI:
>>>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>>>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>>>>> java.lang.NullPointerException+details
>>>>>>
>>>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>>>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>>>> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>>>> 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>>>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>>>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>>>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>>> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>>>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>>> 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>>>> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>> 	at java.lang.Thread.run(Thread.java:722)
>>>>>>
>>>>>> Driver stacktrace:
>>>>>>
>>>>>>
>>>>>>  can u give me a help to deal this probelm,and i think my built was
>>>>>> succussed!
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>

Re: Job aborted due to stage failure

Posted by Xuefu Zhang <xz...@cloudera.com>.
Hi Yuemeng,

I'm glad that Hive on Spark finally works for you. As you know, this
project is still in development and yet to be released. Thus, please
forgive the lack of proper documentation. We have a "Get Started"
page that is linked from HIVE-7292. If you can improve the document there, it
would be very helpful for other Hive users.

Thanks,
Xuefu

On Wed, Dec 3, 2014 at 5:42 PM, yuemeng1 <yu...@huawei.com> wrote:

>  hi,thanks a lot for your help,with your help ,my hive-on-spark can work
> well now
> it take me long time to install and deploy.here are  some advice,i think
> we need to improve the installation documentation, allowing users to use
> the least amount of time to compile and install
> 1)add which spark version we should pick from spark github if we select
> built spark instead of download a spark pre-built,tell them the right built
> commad!(not include Pyarn ,Phive)
> 2)if they get some error during built ,such as [ERRO
> /hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:
> [22,24] cannot find symbol
> [ERROR] symbol: class JobExecutionStatus,tell them what they can do?
> for our users,first to use it ,then  feel good or bad?
> and if u need,i can add something to start document
>
>
> thanks
> yuemeng
>
>
>
>
>
>
> On 2014/12/3 11:03, Xuefu Zhang wrote:
>
>  When you build Spark, remove -Phive as well as -Pyarn. When you run hive
> queries, you may need to run "set spark.home=/path/to/spark/dir";
>
>  Thanks,
>  Xuefu
>
> On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yu...@huawei.com> wrote:
>
>>  hi,XueFu,thanks a lot for your help,now i will provide more detail to
>> reproduce this ssue:
>> 1),i checkout a spark branch from hive github(
>> https://github.com/apache/hive/tree/spark on Nov 29,becasue of for
>> version now it will give something wrong about:Caused by:
>> java.lang.RuntimeException: Unable to instantiate
>> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ),
>> and built command:mvn clean package -DskipTests -Phadoop-2 -Pdist
>> after built i get package from
>> :/home/ym/hive-on-spark/hive1129/hive/packaging/target(apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
>> 2)i checkout spark from
>> https://github.com/apache/spark/tree/v1.2.0-snapshot0,becasue of spark
>> branch-1.2 is with spark parent(1.2.1-SNAPSHOT),so i chose
>> v1.2.0-snapshot0 and i compare this spark's pom.xml with
>> spark-parent-1.2.0-SNAPSHOT.pom(get from
>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>> there is only difference is spark-parent name,and built command is :
>>
>> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
>>
>> 3)comand i execute in hive-shell:
>> ./hive --auxpath
>> /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar(copy
>> this jar to hive dir lib already)
>> create table student(sno int,sname string,sage int,ssex string) row
>> format delimited FIELDS TERMINATED BY ',';
>> create table score(sno int,cno int,sage int) row format delimited FIELDS
>> TERMINATED BY ',';
>> load data local inpath
>> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
>> into table student;
>> load data local inpath
>> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
>> into table score;
>> set hive.execution.engine=spark;
>> set spark.master=spark://10.175.xxx.xxx:7077;
>> set spark.eventLog.enabled=true;
>> set spark.executor.memory=9086m;
>> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>> select distinct st.sno,sname from student st join score sc
>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;(work in mr)
>> 4)
>> studdent.txt file
>> 1,rsh,27,female
>> 2,kupo,28,male
>> 3,astin,29,female
>> 4,beike,30,male
>> 5,aili,31,famle
>>
>> score.txt file
>> 1,10,80
>> 2,11,85
>> 3,12,90
>> 4,13,95
>> 5,14,100
>>
>> On 2014/12/2 23:28, Xuefu Zhang wrote:
>>
>>  Could you provide details on how to reproduce the issue? such as the
>> exact spark branch, the command to build Spark, how you build Hive, and
>> what queries/commands you run.
>>
>>  We are running Hive on Spark all the time. Our pre-commit test runs
>> without any issue.
>>
>>  Thanks,
>>  Xuefu
>>
>> On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yu...@huawei.com> wrote:
>>
>>>  hi,XueFu
>>> i checkout a spark branch from sparkgithub(tags:v1.2.0-snapshot0)and i
>>> compare this spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom(get from
>>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>>> there is only difference is follow:
>>> in spark-parent-1.2.0-SNAPSHOT.pom
>>>   <artifactId>spark-parent</artifactId>
>>>   <version>1.2.0-SNAPSHOT</version>
>>> and in v1.2.0-snapshot0
>>> <artifactId>spark-parent</artifactId>
>>>   <version>1.2.0</version>
>>> i think there is no essence diff,and i built v1.2.0-snapshot0 and deploy
>>> it as my spark clusters
>>> when i run query about join two table ,it still give some error what i
>>> show u earlier
>>>
>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>> java.lang.NullPointerException+details
>>>
>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>> 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>> 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> 	at java.lang.Thread.run(Thread.java:722)
>>>
>>> Driver stacktrace:
>>>
>>>
>>>
>>>  i think my spark clusters did't had any problem,but why always give me
>>> such error
>>>
>>> On 2014/12/2 13:39, Xuefu Zhang wrote:
>>>
>>>  You need to build your spark assembly from spark 1.2 branch. this
>>> should give your both a spark build as well as spark-assembly jar, which
>>> you need to copy to Hive lib directory. Snapshot is fine, and spark 1.2
>>> hasn't been released yet.
>>>
>>>  --Xuefu
>>>
>>> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yu...@huawei.com> wrote:
>>>
>>>>
>>>>
>>>> hi.XueFu,
>>>> thanks a lot for your inforamtion,but as far as i know ,the latest
>>>> spark version on github is spark-snapshot-1.3,but there is no
>>>> spark-1.2,only have a branch-1.2 with spark-snapshot-1.2,can u tell me
>>>> which spark version i should built,and for now,that's
>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error like that
>>>>
>>>>
>>>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>>
>>>>  It seems that wrong class, HiveInputFormat, is loaded. The stacktrace
>>>> is way off the current Hive code. You need to build Spark 1.2 and copy
>>>> spark-assembly jar to Hive's lib directory and that it.
>>>>
>>>>  --Xuefu
>>>>
>>>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yu...@huawei.com> wrote:
>>>>
>>>>>  hi,i built a hive on spark package and my spark assembly jar is
>>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a query in hive
>>>>> shell,before execute this query,
>>>>> i set all the  require which hive need with  spark.and i execute a
>>>>> join query :
>>>>> select distinct st.sno,sname from student st join score sc
>>>>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>>>>> but it failed,
>>>>> get follow error in spark webUI:
>>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>>>> java.lang.NullPointerException+details
>>>>>
>>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>>> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>>> 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>> 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>>> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>> 	at java.lang.Thread.run(Thread.java:722)
>>>>>
>>>>> Driver stacktrace:
>>>>>
>>>>>
>>>>>  can u give me a help to deal this probelm,and i think my built was
>>>>> succussed!
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

Re: Job aborted due to stage failure

Posted by yuemeng1 <yu...@huawei.com>.
Hi, thanks a lot for your help; with it, my Hive on Spark setup works
well now.
It took me a long time to install and deploy, so here is some advice: I think
we need to improve the installation documentation so that users can compile
and install in the least amount of time.
1) State which Spark version to pick from the Spark GitHub repository if users
build Spark themselves instead of downloading a pre-built Spark, and give them
the correct build command (not including -Pyarn or -Phive).
2) If they get an error during the build, such as
[ERROR] /hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:[22,24] cannot find symbol
[ERROR] symbol: class JobExecutionStatus
tell them what they can do (a diagnostic sketch follows below this message).
For our users, the first step is getting it to work; only then can they judge
whether it is good or bad.
If needed, I can add something to the Get Started document.


thanks
yuemeng
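
As a hedged aside on the "cannot find symbol: JobExecutionStatus" build error
mentioned above: that class is expected to come from the spark-core dependency,
so one way to check whether the Spark artifact actually resolved into the Hive
build is to inspect the ql module's dependency tree. This is only a sketch
under the assumption of a standard Maven setup; the module name and filter are
assumptions, not commands from this thread:

    # list the org.apache.spark artifacts resolved by Hive's ql module
    mvn dependency:tree -pl ql -Dincludes=org.apache.spark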





On 2014/12/3 11:03, Xuefu Zhang wrote:
> When you build Spark, remove -Phive as well as -Pyarn. When you run 
> hive queries, you may need to run "set spark.home=/path/to/spark/dir";
>
> Thanks,
> Xuefu
>
> On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yuemeng1@huawei.com 
> <ma...@huawei.com>> wrote:
>
>     hi,XueFu,thanks a lot for your help,now i will provide more detail
>     to reproduce this ssue:
>     1),i checkout a spark branch from hive
>     github(https://github.com/apache/hive/tree/spark on Nov 29,becasue
>     of for version now it will give something wrong about:Caused by:
>     java.lang.RuntimeException: Unable to instantiate
>     org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ),
>     and built command:mvn clean package -DskipTests -Phadoop-2 -Pdist
>     after built i get package from
>     :/home/ym/hive-on-spark/hive1129/hive/packaging/target(apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
>     2)i checkout spark from
>     https://github.com/apache/spark/tree/v1.2.0-snapshot0,becasue of
>     spark branch-1.2 is with spark parent(1.2.1-SNAPSHOT),so i chose
>     v1.2.0-snapshot0 and i compare this spark's pom.xml with
>     spark-parent-1.2.0-SNAPSHOT.pom(get from
>     http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>     there is only difference is spark-parent name,and built command is :
>
>     |mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package|
>
>     3)comand i execute in hive-shell:
>     ./hive --auxpath
>     /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar(copy
>     this jar to hive dir lib already)
>     create table student(sno int,sname string,sage int,ssex string)
>     row format delimited FIELDS TERMINATED BY ',';
>     create table score(sno int,cno int,sage int) row format delimited
>     FIELDS TERMINATED BY ',';
>     load data local inpath
>     '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
>     into table student;
>     load data local inpath
>     '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
>     into table score;
>     set hive.execution.engine=spark;
>     set spark.master=spark://10.175.xxx.xxx:7077;
>     set spark.eventLog.enabled=true;
>     set spark.executor.memory=9086m;
>     set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>     select distinct st.sno,sname from student st join score sc
>     on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;(work
>     in mr)
>     4)
>     studdent.txt file
>     1,rsh,27,female
>     2,kupo,28,male
>     3,astin,29,female
>     4,beike,30,male
>     5,aili,31,famle
>
>     score.txt file
>     1,10,80
>     2,11,85
>     3,12,90
>     4,13,95
>     5,14,100
>
>     On 2014/12/2 23:28, Xuefu Zhang wrote:
>>     Could you provide details on how to reproduce the issue? such as
>>     the exact spark branch, the command to build Spark, how you build
>>     Hive, and what queries/commands you run.
>>
>>     We are running Hive on Spark all the time. Our pre-commit test
>>     runs without any issue.
>>
>>     Thanks,
>>     Xuefu
>>
>>     On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yuemeng1@huawei.com
>>     <ma...@huawei.com>> wrote:
>>
>>         hi,XueFu
>>         i checkout a spark branch from
>>         sparkgithub(tags:v1.2.0-snapshot0)and i compare this spark's
>>         pom.xml with spark-parent-1.2.0-SNAPSHOT.pom(get from
>>         http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>>         there is only difference is follow:
>>         in spark-parent-1.2.0-SNAPSHOT.pom
>>         <artifactId>spark-parent</artifactId>
>>         <version>1.2.0-SNAPSHOT</version>
>>         and in v1.2.0-snapshot0
>>         <artifactId>spark-parent</artifactId>
>>           <version>1.2.0</version>
>>         i think there is no essence diff,and i built v1.2.0-snapshot0
>>         and deploy it as my spark clusters
>>         when i run query about join two table ,it still give some
>>         error what i show u earlier
>>
>>         Job aborted due to stage failure: Task 0 in stage 1.0 failed
>>         4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID
>>         7, datasight18): java.lang.NullPointerException+details
>>
>>         Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>         	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>         	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>         	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>         	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>         	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>         	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>         	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>         	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>         	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>         	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>         	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>         	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>         	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>         	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>         	at java.lang.Thread.run(Thread.java:722)
>>
>>         Driver stacktrace:
>>
>>
>>
>>         i think my spark clusters did't had any problem,but why
>>         always give me such error
>>
>>         On 2014/12/2 13:39, Xuefu Zhang wrote:
>>>         You need to build your spark assembly from spark 1.2 branch.
>>>         this should give your both a spark build as well as
>>>         spark-assembly jar, which you need to copy to Hive lib
>>>         directory. Snapshot is fine, and spark 1.2 hasn't been
>>>         released yet.
>>>
>>>         --Xuefu
>>>
>>>         On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1
>>>         <yuemeng1@huawei.com <ma...@huawei.com>> wrote:
>>>
>>>
>>>
>>>             hi.XueFu,
>>>             thanks a lot for your inforamtion,but as far as i know
>>>             ,the latest spark version on github is
>>>             spark-snapshot-1.3,but there is no spark-1.2,only have a
>>>             branch-1.2 with spark-snapshot-1.2,can u tell me which
>>>             spark version i should built,and for now,that's
>>>             spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce
>>>             error like that
>>>
>>>
>>>             On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>>             It seems that wrong class, HiveInputFormat, is loaded.
>>>>             The stacktrace is way off the current Hive code. You
>>>>             need to build Spark 1.2 and copy spark-assembly jar to
>>>>             Hive's lib directory and that it.
>>>>
>>>>             --Xuefu
>>>>
>>>>             On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1
>>>>             <yuemeng1@huawei.com <ma...@huawei.com>> wrote:
>>>>
>>>>                 hi,i built a hive on spark package and my spark
>>>>                 assembly jar is
>>>>                 spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when
>>>>                 i run a query in hive shell,before execute this query,
>>>>                 i set all the require which hive need with
>>>>                 spark.and i execute a join query :
>>>>                 select distinct st.sno,sname from student st join
>>>>                 score sc on(st.sno=sc.sno) where sc.cno
>>>>                 IN(11,12,13) and st.sage > 28;
>>>>                 but it failed,
>>>>                 get follow error in spark webUI:
>>>>                 Job aborted due to stage failure: Task 0 in stage
>>>>                 1.0 failed 4 times, most recent failure: Lost task
>>>>                 0.3 in stage 1.0 (TID 7, datasight18):
>>>>                 java.lang.NullPointerException+details
>>>>
>>>>                 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>>>                 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>>                 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>>                 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>>                 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>>                 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>>                 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>>                 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>>                 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>                 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>                 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>                 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>                 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>                 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>>                 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>                 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>>                 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>>                 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>                 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>                 	at java.lang.Thread.run(Thread.java:722)
>>>>
>>>>                 Driver stacktrace:
>>>>
>>>>                 can u give me a help to deal this probelm,and i
>>>>                 think my built was succussed!
>>>>
>>>>
>>>
>>>
>>
>>
>
>


Re: Job aborted due to stage failure

Posted by Xuefu Zhang <xz...@cloudera.com>.
When you build Spark, remove -Phive as well as -Pyarn. When you run hive
queries, you may need to run "set spark.home=/path/to/spark/dir";

Thanks,
Xuefu
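
Putting this together with the commands already shown in this thread, the
adjusted steps would look roughly like the sketch below (paths are
placeholders, and the assembly jar name may differ slightly depending on the
branch you build):

    # build Spark without the -Pyarn and -Phive profiles
    mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
    # copy the resulting assembly jar into Hive's lib directory
    cp assembly/target/scala-2.10/spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar /path/to/hive/lib/
    # then, in the Hive shell
    set spark.home=/path/to/spark/dir;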

On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yu...@huawei.com> wrote:

>  hi,XueFu,thanks a lot for your help,now i will provide more detail to
> reproduce this ssue:
> 1),i checkout a spark branch from hive github(
> https://github.com/apache/hive/tree/spark on Nov 29,becasue of for
> version now it will give something wrong about:Caused by:
> java.lang.RuntimeException: Unable to instantiate
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ),
> and built command:mvn clean package -DskipTests -Phadoop-2 -Pdist
> after built i get package from
> :/home/ym/hive-on-spark/hive1129/hive/packaging/target(apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
> 2)i checkout spark from
> https://github.com/apache/spark/tree/v1.2.0-snapshot0,becasue of spark
> branch-1.2 is with spark parent(1.2.1-SNAPSHOT),so i chose
> v1.2.0-snapshot0 and i compare this spark's pom.xml with
> spark-parent-1.2.0-SNAPSHOT.pom(get from
> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
> there is only difference is spark-parent name,and built command is :
>
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
>
> 3)comand i execute in hive-shell:
> ./hive --auxpath
> /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar(copy
> this jar to hive dir lib already)
> create table student(sno int,sname string,sage int,ssex string) row format
> delimited FIELDS TERMINATED BY ',';
> create table score(sno int,cno int,sage int) row format delimited FIELDS
> TERMINATED BY ',';
> load data local inpath
> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
> into table student;
> load data local inpath
> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
> into table score;
> set hive.execution.engine=spark;
> set spark.master=spark://10.175.xxx.xxx:7077;
> set spark.eventLog.enabled=true;
> set spark.executor.memory=9086m;
> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
> select distinct st.sno,sname from student st join score sc
> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;(work in mr)
> 4)
> studdent.txt file
> 1,rsh,27,female
> 2,kupo,28,male
> 3,astin,29,female
> 4,beike,30,male
> 5,aili,31,famle
>
> score.txt file
> 1,10,80
> 2,11,85
> 3,12,90
> 4,13,95
> 5,14,100
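
(For reference: given the sample data above, sc.cno IN (11,12,13) matches sno
2, 3 and 4, and st.sage > 28 matches sno 3, 4 and 5, so the query should return
two rows, assuming the data loads as shown:

    3	astin
    4	beike
)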
>
> On 2014/12/2 23:28, Xuefu Zhang wrote:
>
>  Could you provide details on how to reproduce the issue? such as the
> exact spark branch, the command to build Spark, how you build Hive, and
> what queries/commands you run.
>
>  We are running Hive on Spark all the time. Our pre-commit test runs
> without any issue.
>
>  Thanks,
>  Xuefu
>
> On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yu...@huawei.com> wrote:
>
>>  hi,XueFu
>> i checkout a spark branch from sparkgithub(tags:v1.2.0-snapshot0)and i
>> compare this spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom(get from
>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>> there is only difference is follow:
>> in spark-parent-1.2.0-SNAPSHOT.pom
>>   <artifactId>spark-parent</artifactId>
>>   <version>1.2.0-SNAPSHOT</version>
>> and in v1.2.0-snapshot0
>> <artifactId>spark-parent</artifactId>
>>   <version>1.2.0</version>
>> i think there is no essence diff,and i built v1.2.0-snapshot0 and deploy
>> it as my spark clusters
>> when i run query about join two table ,it still give some error what i
>> show u earlier
>>
>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>> java.lang.NullPointerException+details
>>
>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>> 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> 	at java.lang.Thread.run(Thread.java:722)
>>
>> Driver stacktrace:
>>
>>
>>
>>  i think my spark clusters did't had any problem,but why always give me
>> such error
>>
>> On 2014/12/2 13:39, Xuefu Zhang wrote:
>>
>>  You need to build your spark assembly from spark 1.2 branch. this
>> should give your both a spark build as well as spark-assembly jar, which
>> you need to copy to Hive lib directory. Snapshot is fine, and spark 1.2
>> hasn't been released yet.
>>
>>  --Xuefu
>>
>> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yu...@huawei.com> wrote:
>>
>>>
>>>
>>> hi.XueFu,
>>> thanks a lot for your inforamtion,but as far as i know ,the latest spark
>>> version on github is spark-snapshot-1.3,but there is no spark-1.2,only have
>>> a branch-1.2 with spark-snapshot-1.2,can u tell me which spark version i
>>> should built,and for now,that's
>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error like that
>>>
>>>
>>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>
>>>  It seems that wrong class, HiveInputFormat, is loaded. The stacktrace
>>> is way off the current Hive code. You need to build Spark 1.2 and copy
>>> spark-assembly jar to Hive's lib directory and that it.
>>>
>>>  --Xuefu
>>>
>>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yu...@huawei.com> wrote:
>>>
>>>>  hi,i built a hive on spark package and my spark assembly jar is
>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a query in hive
>>>> shell,before execute this query,
>>>> i set all the  require which hive need with  spark.and i execute a join
>>>> query :
>>>> select distinct st.sno,sname from student st join score sc
>>>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>>>> but it failed,
>>>> get follow error in spark webUI:
>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>>> java.lang.NullPointerException+details
>>>>
>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>> 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>> 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> 	at java.lang.Thread.run(Thread.java:722)
>>>>
>>>> Driver stacktrace:
>>>>
>>>>
>>>>  can u give me a help to deal this probelm,and i think my built was
>>>> succussed!
>>>>
>>>
>>>
>>>
>>
>>
>
>

Re: Job aborted due to stage failure

Posted by yuemeng1 <yu...@huawei.com>.
hi, Xuefu, thanks a lot for your help. Here are more details so you can 
reproduce this issue:
1) I checked out the spark branch of Hive from GitHub 
(https://github.com/apache/hive/tree/spark, on Nov 29, because the 
current head gives an error: Caused by: 
java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient),
and the build command was: mvn clean package -DskipTests -Phadoop-2 -Pdist
After the build I took the package from 
/home/ym/hive-on-spark/hive1129/hive/packaging/target (apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
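For completeness, the whole Hive-side sequence is roughly the following 
sketch (the clone path and the install location are only examples from 
my machine, adjust them to your layout):

# build the Hive spark branch and unpack the resulting package
git clone https://github.com/apache/hive.git
cd hive
git checkout spark   # the spark branch, as of Nov 29
mvn clean package -DskipTests -Phadoop-2 -Pdist
tar -xzf packaging/target/apache-hive-0.15.0-SNAPSHOT-bin.tar.gz -C /opt/hispark/   # install dir is just an example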
2) I checked out Spark from 
https://github.com/apache/spark/tree/v1.2.0-snapshot0, because Spark 
branch-1.2 now carries spark-parent version 1.2.1-SNAPSHOT, so I chose 
v1.2.0-snapshot0. I compared this Spark's pom.xml with 
spark-parent-1.2.0-SNAPSHOT.pom (taken from 
http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/), and 
the only difference is the spark-parent version. The build command was:

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
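Again only as a sketch, the Spark-side steps I run end to end look like 
this (the copy target assumes the Hive package from step 1 is unpacked 
under /opt/hispark/hive, which is just an example path):

git clone https://github.com/apache/spark.git
cd spark
git checkout v1.2.0-snapshot0
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
# copy the assembly jar into Hive's lib directory
cp assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar /opt/hispark/hive/lib/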

3) Commands I execute in the Hive shell:
./hive --auxpath 
/opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar 
(this jar has already been copied into Hive's lib directory)
create table student(sno int,sname string,sage int,ssex string) row 
format delimited FIELDS TERMINATED BY ',';
create table score(sno int,cno int,sage int) row format delimited FIELDS 
TERMINATED BY ',';
load data local inpath 
'/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt' 
into table student;
load data local inpath 
'/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt' 
into table score;
set hive.execution.engine=spark;
set spark.master=spark://10.175.xxx.xxx:7077;
set spark.eventLog.enabled=true;
set spark.executor.memory=9086m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
select distinct st.sno,sname from student st join score sc 
on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28; (this query works with the MR engine)
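In case it helps to reproduce this non-interactively, the same session 
can also be driven from a script; this is only a sketch and 
join_test.hql is just an example file name:

cat > join_test.hql <<'EOF'
set hive.execution.engine=spark;
set spark.master=spark://10.175.xxx.xxx:7077;
set spark.eventLog.enabled=true;
set spark.executor.memory=9086m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
select distinct st.sno,sname from student st join score sc on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
EOF
./hive --auxpath /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar -f join_test.hql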
4)
student.txt file
1,rsh,27,female
2,kupo,28,male
3,astin,29,female
4,beike,30,male
5,aili,31,famle

score.txt file
1,10,80
2,11,85
3,12,90
4,13,95
5,14,100
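For what it's worth, with this data the query has a small well-defined 
answer, so it is easy to check: sc.cno IN (11,12,13) keeps sno 2, 3 and 
4 from score, st.sage > 28 keeps sno 3, 4 and 5 from student, so the 
join should return exactly two rows:

3	astin
4	beike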

On 2014/12/2 23:28, Xuefu Zhang wrote:
> Could you provide details on how to reproduce the issue? such as the 
> exact spark branch, the command to build Spark, how you build Hive, 
> and what queries/commands you run.
>
> We are running Hive on Spark all the time. Our pre-commit test runs 
> without any issue.
>
> Thanks,
> Xuefu
>
> On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yuemeng1@huawei.com 
> <ma...@huawei.com>> wrote:
>
>     hi,XueFu
>     i checkout a spark branch from
>     sparkgithub(tags:v1.2.0-snapshot0)and i compare this spark's
>     pom.xml with spark-parent-1.2.0-SNAPSHOT.pom(get from
>     http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>     there is only difference is follow:
>     in spark-parent-1.2.0-SNAPSHOT.pom
>       <artifactId>spark-parent</artifactId>
>       <version>1.2.0-SNAPSHOT</version>
>     and in v1.2.0-snapshot0
>     <artifactId>spark-parent</artifactId>
>       <version>1.2.0</version>
>     i think there is no essence diff,and i built v1.2.0-snapshot0 and
>     deploy it as my spark clusters
>     when i run query about join two table ,it still give some error
>     what i show u earlier
>
>     Job aborted due to stage failure: Task 0 in stage 1.0 failed 4
>     times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7,
>     datasight18): java.lang.NullPointerException+details
>
>     Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>     	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>     	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>     	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>     	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>     	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>     	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>     	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>     	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>     	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>     	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>     	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>     	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>     	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>     	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>     	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     	at java.lang.Thread.run(Thread.java:722)
>
>     Driver stacktrace:
>
>
>
>     i think my spark clusters did't had any problem,but why always
>     give me such error
>
>     On 2014/12/2 13:39, Xuefu Zhang wrote:
>>     You need to build your spark assembly from spark 1.2 branch. this
>>     should give your both a spark build as well as spark-assembly
>>     jar, which you need to copy to Hive lib directory. Snapshot is
>>     fine, and spark 1.2 hasn't been released yet.
>>
>>     --Xuefu
>>
>>     On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yuemeng1@huawei.com
>>     <ma...@huawei.com>> wrote:
>>
>>
>>
>>         hi.XueFu,
>>         thanks a lot for your inforamtion,but as far as i know ,the
>>         latest spark version on github is spark-snapshot-1.3,but
>>         there is no spark-1.2,only have a branch-1.2 with
>>         spark-snapshot-1.2,can u tell me which spark version i should
>>         built,and for now,that's
>>         spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error
>>         like that
>>
>>
>>         On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>         It seems that wrong class, HiveInputFormat, is loaded. The
>>>         stacktrace is way off the current Hive code. You need to
>>>         build Spark 1.2 and copy spark-assembly jar to Hive's lib
>>>         directory and that it.
>>>
>>>         --Xuefu
>>>
>>>         On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1
>>>         <yuemeng1@huawei.com <ma...@huawei.com>> wrote:
>>>
>>>             hi,i built a hive on spark package and my spark assembly
>>>             jar is
>>>             spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run
>>>             a query in hive shell,before execute this query,
>>>             i set all the  require which hive need with  spark.and i
>>>             execute a join query :
>>>             select distinct st.sno,sname from student st join score
>>>             sc on(st.sno=sc.sno) where sc.cno IN(11,12,13) and
>>>             st.sage > 28;
>>>             but it failed,
>>>             get follow error in spark webUI:
>>>             Job aborted due to stage failure: Task 0 in stage 1.0
>>>             failed 4 times, most recent failure: Lost task 0.3 in
>>>             stage 1.0 (TID 7, datasight18):
>>>             java.lang.NullPointerException+details
>>>
>>>             Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>>             	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>             	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>             	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>             	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>             	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>             	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>             	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>             	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>             	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>             	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>             	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>             	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>             	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>             	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>             	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>             	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>             	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>             	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>             	at java.lang.Thread.run(Thread.java:722)
>>>
>>>             Driver stacktrace:
>>>
>>>             can u give me a help to deal this probelm,and i think my
>>>             built was succussed!
>>>
>>>
>>
>>
>
>


Re: Job aborted due to stage failure

Posted by Xuefu Zhang <xz...@cloudera.com>.
Could you provide details on how to reproduce the issue, such as the exact
Spark branch, the command used to build Spark, how you built Hive, and what
queries/commands you ran?

We are running Hive on Spark all the time. Our pre-commit test runs without
any issue.

Thanks,
Xuefu

On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yu...@huawei.com> wrote:

>  hi,XueFu
> i checkout a spark branch from sparkgithub(tags:v1.2.0-snapshot0)and i
> compare this spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom(get from
> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
> there is only difference is follow:
> in spark-parent-1.2.0-SNAPSHOT.pom
>   <artifactId>spark-parent</artifactId>
>   <version>1.2.0-SNAPSHOT</version>
> and in v1.2.0-snapshot0
> <artifactId>spark-parent</artifactId>
>   <version>1.2.0</version>
> i think there is no essence diff,and i built v1.2.0-snapshot0 and deploy
> it as my spark clusters
> when i run query about join two table ,it still give some error what i
> show u earlier
>
> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
> java.lang.NullPointerException+details
>
> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
> 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
>
> Driver stacktrace:
>
>
>
> i think my spark clusters did't had any problem,but why always give me
> such error
>
> On 2014/12/2 13:39, Xuefu Zhang wrote:
>
>  You need to build your spark assembly from spark 1.2 branch. this should
> give your both a spark build as well as spark-assembly jar, which you need
> to copy to Hive lib directory. Snapshot is fine, and spark 1.2 hasn't been
> released yet.
>
>  --Xuefu
>
> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yu...@huawei.com> wrote:
>
>>
>>
>> hi.XueFu,
>> thanks a lot for your inforamtion,but as far as i know ,the latest spark
>> version on github is spark-snapshot-1.3,but there is no spark-1.2,only have
>> a branch-1.2 with spark-snapshot-1.2,can u tell me which spark version i
>> should built,and for now,that's
>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error like that
>>
>>
>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>
>>  It seems that wrong class, HiveInputFormat, is loaded. The stacktrace
>> is way off the current Hive code. You need to build Spark 1.2 and copy
>> spark-assembly jar to Hive's lib directory and that it.
>>
>>  --Xuefu
>>
>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yu...@huawei.com> wrote:
>>
>>>  hi,i built a hive on spark package and my spark assembly jar is
>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a query in hive
>>> shell,before execute this query,
>>> i set all the  require which hive need with  spark.and i execute a join
>>> query :
>>> select distinct st.sno,sname from student st join score sc
>>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>>> but it failed,
>>> get follow error in spark webUI:
>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>> java.lang.NullPointerException+details
>>>
>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>> 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>> 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> 	at java.lang.Thread.run(Thread.java:722)
>>>
>>> Driver stacktrace:
>>>
>>>
>>>  can u give me a help to deal this probelm,and i think my built was
>>> succussed!
>>>
>>
>>
>>
>
>

RE: Job aborted due to stage failure

Posted by Mike Roberts <mi...@spyfu.com>.
unsubscribe

From: yuemeng1 [mailto:yuemeng1@huawei.com]
Sent: Tuesday, December 2, 2014 5:14 AM
To: user@hive.apache.org
Subject: Re: Job aborted due to stage failure

hi,XueFu
i checkout a spark branch from sparkgithub(tags:v1.2.0-snapshot0)and i compare this spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom(get from http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and there is only difference is follow:
in spark-parent-1.2.0-SNAPSHOT.pom
  <artifactId>spark-parent</artifactId>
  <version>1.2.0-SNAPSHOT</version>
and in v1.2.0-snapshot0
<artifactId>spark-parent</artifactId>
  <version>1.2.0</version>
i think there is no essence diff,and i built v1.2.0-snapshot0 and deploy it as my spark clusters
when i run query about join two table ,it still give some error what i show u earlier

Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException+details

Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException

          at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)

          at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)

          at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)

          at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)

          at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)

          at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)

          at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)

          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)

          at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)

          at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)

          at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)

          at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)

          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)

          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)

          at org.apache.spark.scheduler.Task.run(Task.scala:56)

          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)

          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

          at java.lang.Thread.run(Thread.java:722)



Driver stacktrace:


i think my spark clusters did't had any problem,but why always give me such error

On 2014/12/2 13:39, Xuefu Zhang wrote:
You need to build your spark assembly from spark 1.2 branch. this should give your both a spark build as well as spark-assembly jar, which you need to copy to Hive lib directory. Snapshot is fine, and spark 1.2 hasn't been released yet.
--Xuefu

On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yu...@huawei.com>> wrote:


hi.XueFu,
thanks a lot for your inforamtion,but as far as i know ,the latest spark version on github is spark-snapshot-1.3,but there is no spark-1.2,only have a branch-1.2 with spark-snapshot-1.2,can u tell me which spark version i should built,and for now,that's spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error like that


On 2014/12/2 11:03, Xuefu Zhang wrote:
It seems that wrong class, HiveInputFormat, is loaded. The stacktrace is way off the current Hive code. You need to build Spark 1.2 and copy spark-assembly jar to Hive's lib directory and that it.
--Xuefu

On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yu...@huawei.com>> wrote:
hi,i built a hive on spark package and my spark assembly jar is spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a query in hive shell,before execute this query,
i set all the  require which hive need with  spark.and i execute a join query :
select distinct st.sno,sname from student st join score sc on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
but it failed,
get follow error in spark webUI:
Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException+details

Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException

         at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)

         at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)

         at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)

         at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)

         at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)

         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)

         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)

         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)

         at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)

         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)

         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)

         at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)

         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)

         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)

         at org.apache.spark.scheduler.Task.run(Task.scala:56)

         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)

         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

         at java.lang.Thread.run(Thread.java:722)



Driver stacktrace:


can u give me a help to deal this probelm,and i think my built was succussed!





Re: Job aborted due to stage failure

Posted by yuemeng1 <yu...@huawei.com>.
hi, Xuefu,
I checked out a Spark branch from the Spark GitHub (tag v1.2.0-snapshot0) 
and compared its pom.xml with spark-parent-1.2.0-SNAPSHOT.pom (taken 
from 
http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/); 
the only difference is the following:
in spark-parent-1.2.0-SNAPSHOT.pom
   <artifactId>spark-parent</artifactId>
   <version>1.2.0-SNAPSHOT</version>
and in v1.2.0-snapshot0
<artifactId>spark-parent</artifactId>
   <version>1.2.0</version>
I think there is no essential difference, so I built v1.2.0-snapshot0 and 
deployed it as my Spark cluster.
When I run the two-table join query, it still gives the same error I 
showed you earlier:

Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, 
most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): 
java.lang.NullPointerException+details

Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)

Driver stacktrace:



I don't think my Spark cluster has any problem, so why does it always 
give me this error?

On 2014/12/2 13:39, Xuefu Zhang wrote:
> You need to build your spark assembly from spark 1.2 branch. this 
> should give your both a spark build as well as spark-assembly jar, 
> which you need to copy to Hive lib directory. Snapshot is fine, and 
> spark 1.2 hasn't been released yet.
>
> --Xuefu
>
> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yuemeng1@huawei.com 
> <ma...@huawei.com>> wrote:
>
>
>
>     hi.XueFu,
>     thanks a lot for your inforamtion,but as far as i know ,the latest
>     spark version on github is spark-snapshot-1.3,but there is no
>     spark-1.2,only have a branch-1.2 with spark-snapshot-1.2,can u
>     tell me which spark version i should built,and for now,that's
>     spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error like that
>
>
>     On 2014/12/2 11:03, Xuefu Zhang wrote:
>>     It seems that wrong class, HiveInputFormat, is loaded. The
>>     stacktrace is way off the current Hive code. You need to build
>>     Spark 1.2 and copy spark-assembly jar to Hive's lib directory and
>>     that it.
>>
>>     --Xuefu
>>
>>     On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yuemeng1@huawei.com
>>     <ma...@huawei.com>> wrote:
>>
>>         hi,i built a hive on spark package and my spark assembly jar
>>         is spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a
>>         query in hive shell,before execute this query,
>>         i set all the  require which hive need with spark.and i
>>         execute a join query :
>>         select distinct st.sno,sname from student st join score sc
>>         on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>>         but it failed,
>>         get follow error in spark webUI:
>>         Job aborted due to stage failure: Task 0 in stage 1.0 failed
>>         4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID
>>         7, datasight18): java.lang.NullPointerException+details
>>
>>         Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>         	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>         	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>         	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>         	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>         	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>         	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>         	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>         	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>         	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>         	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>         	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>         	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>         	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>         	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>         	at java.lang.Thread.run(Thread.java:722)
>>
>>         Driver stacktrace:
>>
>>         can u give me a help to deal this probelm,and i think my
>>         built was succussed!
>>
>>
>
>


Re: Job aborted due to stage failure

Posted by yuemeng1 <yu...@huawei.com>.
hi, I checked out the Spark 1.2 branch from the Spark GitHub and built it, 
then copied the spark-assembly jar into Hive's lib directory, but when I 
run this query it still gives me the same error.
I am very confused; how can I get Hive on Spark to work?

On 2014/12/2 13:39, Xuefu Zhang wrote:
> You need to build your spark assembly from spark 1.2 branch. this 
> should give your both a spark build as well as spark-assembly jar, 
> which you need to copy to Hive lib directory. Snapshot is fine, and 
> spark 1.2 hasn't been released yet.
>
> --Xuefu
>
> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yuemeng1@huawei.com 
> <ma...@huawei.com>> wrote:
>
>
>
>     hi.XueFu,
>     thanks a lot for your inforamtion,but as far as i know ,the latest
>     spark version on github is spark-snapshot-1.3,but there is no
>     spark-1.2,only have a branch-1.2 with spark-snapshot-1.2,can u
>     tell me which spark version i should built,and for now,that's
>     spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error like that
>
>
>     On 2014/12/2 11:03, Xuefu Zhang wrote:
>>     It seems that wrong class, HiveInputFormat, is loaded. The
>>     stacktrace is way off the current Hive code. You need to build
>>     Spark 1.2 and copy spark-assembly jar to Hive's lib directory and
>>     that it.
>>
>>     --Xuefu
>>
>>     On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yuemeng1@huawei.com
>>     <ma...@huawei.com>> wrote:
>>
>>         hi,i built a hive on spark package and my spark assembly jar
>>         is spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a
>>         query in hive shell,before execute this query,
>>         i set all the  require which hive need with spark.and i
>>         execute a join query :
>>         select distinct st.sno,sname from student st join score sc
>>         on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>>         but it failed,
>>         get follow error in spark webUI:
>>         Job aborted due to stage failure: Task 0 in stage 1.0 failed
>>         4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID
>>         7, datasight18): java.lang.NullPointerException+details
>>
>>         Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>         	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>         	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>         	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>         	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>         	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>         	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>         	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>         	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>         	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>         	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>         	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>         	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>         	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>         	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>         	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>         	at java.lang.Thread.run(Thread.java:722)
>>
>>         Driver stacktrace:
>>
>>         can u give me a help to deal this probelm,and i think my
>>         built was succussed!
>>
>>
>
>


Re: Job aborted due to stage failure

Posted by Xuefu Zhang <xz...@cloudera.com>.
You need to build your Spark assembly from the Spark 1.2 branch. That gives
you both a Spark build and the spark-assembly jar, which you need to copy to
Hive's lib directory. A snapshot is fine; Spark 1.2 hasn't been released yet.
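Roughly, the steps look like this (only a sketch; the Hadoop profile and
HIVE_HOME are placeholders you should adapt to your environment):

git clone https://github.com/apache/spark.git
cd spark
git checkout branch-1.2
mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
# copy the resulting assembly jar into Hive's lib directory
cp assembly/target/scala-2.10/spark-assembly-*.jar $HIVE_HOME/lib/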

--Xuefu

On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yu...@huawei.com> wrote:

>
>
> hi.XueFu,
> thanks a lot for your inforamtion,but as far as i know ,the latest spark
> version on github is spark-snapshot-1.3,but there is no spark-1.2,only have
> a branch-1.2 with spark-snapshot-1.2,can u tell me which spark version i
> should built,and for now,that's
> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error like that
>
>
> On 2014/12/2 11:03, Xuefu Zhang wrote:
>
>  It seems that wrong class, HiveInputFormat, is loaded. The stacktrace is
> way off the current Hive code. You need to build Spark 1.2 and copy
> spark-assembly jar to Hive's lib directory and that it.
>
>  --Xuefu
>
> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yu...@huawei.com> wrote:
>
>>  hi,i built a hive on spark package and my spark assembly jar is
>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a query in hive
>> shell,before execute this query,
>> i set all the  require which hive need with  spark.and i execute a join
>> query :
>> select distinct st.sno,sname from student st join score sc
>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>> but it failed,
>> get follow error in spark webUI:
>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>> java.lang.NullPointerException+details
>>
>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>> 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> 	at java.lang.Thread.run(Thread.java:722)
>>
>> Driver stacktrace:
>>
>>
>>  can u give me a help to deal this probelm,and i think my built was
>> succussed!
>>
>
>
>

Re: Job aborted due to stage failure

Posted by yuemeng1 <yu...@huawei.com>.

hi, Xuefu,
Thanks a lot for your information, but as far as I know the latest Spark 
version on GitHub is the 1.3 snapshot; there is no spark-1.2 release, 
only a branch-1.2 that is at the 1.2 snapshot. Can you tell me which 
Spark version I should build? For now, the 
spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produces the error 
described in my first message.
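For reference, one way to list the 1.2-related refs on GitHub before 
picking one (just a sketch; it only prints the commit ids of the branch 
and the tag, if they exist):

git ls-remote https://github.com/apache/spark.git refs/heads/branch-1.2 refs/tags/v1.2.0-snapshot0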

On 2014/12/2 11:03, Xuefu Zhang wrote:
> It seems that wrong class, HiveInputFormat, is loaded. The stacktrace 
> is way off the current Hive code. You need to build Spark 1.2 and copy 
> spark-assembly jar to Hive's lib directory and that it.
>
> --Xuefu
>
> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yuemeng1@huawei.com 
> <ma...@huawei.com>> wrote:
>
>     hi,i built a hive on spark package and my spark assembly jar is
>     spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a query
>     in hive shell,before execute this query,
>     i set all the  require which hive need with  spark.and i execute a
>     join query :
>     select distinct st.sno,sname from student st join score sc
>     on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>     but it failed,
>     get follow error in spark webUI:
>     Job aborted due to stage failure: Task 0 in stage 1.0 failed 4
>     times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7,
>     datasight18): java.lang.NullPointerException+details
>
>     Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>     	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>     	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>     	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>     	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>     	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>     	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>     	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>     	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>     	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>     	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>     	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>     	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>     	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     	at org.apache.spark.scheduler.Task.run(Task.scala:56)
>     	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>     	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     	at java.lang.Thread.run(Thread.java:722)
>
>     Driver stacktrace:
>
>     can u give me a help to deal this probelm,and i think my built was
>     succussed!
>
>


Re: Job aborted due to stage failure

Posted by Xuefu Zhang <xz...@cloudera.com>.
It seems that the wrong class, HiveInputFormat, is loaded; the stack trace
doesn't match the current Hive code. You need to build Spark 1.2 and copy
the spark-assembly jar to Hive's lib directory, and that's it.
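As a quick sanity check (only a sketch; the jar names and $HIVE_HOME are
placeholders for your actual paths), you can see which jars on the
classpath contain HiveInputFormat and compare them with the hive-exec jar
you expect to be used:

# does the Spark assembly bundle its own copy of the Hive classes?
unzip -l spark-assembly-*.jar | grep 'hive/ql/io/HiveInputFormat'
# which hive-exec jar is in Hive's lib directory?
ls $HIVE_HOME/lib/hive-exec-*.jar
unzip -l $HIVE_HOME/lib/hive-exec-*.jar | grep 'hive/ql/io/HiveInputFormat'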

--Xuefu

On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yu...@huawei.com> wrote:

>  hi,i built a hive on spark package and my spark assembly jar is
> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a query in hive
> shell,before execute this query,
> i set all the  require which hive need with  spark.and i execute a join
> query :
> select distinct st.sno,sname from student st join score sc
> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
> but it failed,
> get follow error in spark webUI:
> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
> java.lang.NullPointerException+details
>
> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
> 	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
> 	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:56)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
>
> Driver stacktrace:
>
>
>  can u give me a help to deal this probelm,and i think my built was
> succussed!
>