Posted to user@hive.apache.org by yuemeng1 <yu...@huawei.com> on 2014/12/02 03:22:27 UTC
Job aborted due to stage failure
hi, I built a Hive on Spark package, and my Spark assembly jar is
spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar. When I run a query in the Hive
shell (before executing it, I set everything Hive on Spark requires),
I execute this join query:
select distinct st.sno,sname from student st join score sc
on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
but it fails with the following error in the Spark web UI:
Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Driver stacktrace:
Can you help me deal with this problem? I think my build succeeded.
Re: Job aborted due to stage failure
Posted by yuemeng1 <yu...@huawei.com>.
Hi Lefty,
Currently I have some other things to do; I'm going to edit the wiki docs
tomorrow.
Thanks,
Yuemeng
On 2014/12/5 9:59, Lefty Leverenz wrote:
> Yuemeng, you can find out how to edit wikidocs here: About This Wiki
> <https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit>.
>
> -- Lefty
>
> On Wed, Dec 3, 2014 at 10:05 PM, Xuefu Zhang <xzhang@cloudera.com> wrote:
>
> Hi Yuemeng,
>
> I'm glad that Hive on Spark finally works for you. As you know,
> this project is still in development and yet to be released, so
> please forgive the lack of proper documentation. We have a
> "Get Started" page that's linked in HIVE-7292. If you can improve
> the document there, it would be very helpful for other Hive users.
>
> Thanks,
> Xuefu
>
> On Wed, Dec 3, 2014 at 5:42 PM, yuemeng1 <yuemeng1@huawei.com> wrote:
>
> hi, thanks a lot for your help; with it, my Hive on Spark
> works well now.
> It took me a long time to install and deploy, so here is some
> advice: I think we need to improve the installation
> documentation so that users can compile and install in the
> least amount of time.
> 1) Say which Spark version to pick from the Spark GitHub repo if
> building Spark instead of downloading a pre-built one, and give
> the right build command (without -Pyarn or -Phive).
> 2) If they get an error during the build, such as:
> [ERROR] /hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:[22,24] cannot
> find symbol
> [ERROR] symbol: class JobExecutionStatus
> tell them what they can do.
> Users should be able to try it first, then judge whether it is good or bad.
> If needed, I can add something to the Getting Started document.
>
> thanks
> yuemeng
>
> On 2014/12/3 11:03, Xuefu Zhang wrote:
>> When you build Spark, remove -Phive as well as -Pyarn. When
>> you run Hive queries, you may need to run "set
>> spark.home=/path/to/spark/dir";
>>
>> Thanks,
>> Xuefu
>>
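The advice above (build Spark without the Hive and YARN profiles, then point Hive at the Spark directory) can be sketched roughly as follows; the paths, jar names, and Hadoop profile/version are placeholder assumptions, not a verified recipe:

```shell
# Build Spark from the 1.2 branch WITHOUT -Phive and WITHOUT -Pyarn
# (Hadoop profile and version are examples; adjust to your cluster):
mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

# Copy the resulting assembly jar into Hive's lib directory
# (jar path and Hive directory are placeholders):
cp assembly/target/scala-2.10/spark-assembly-*.jar /path/to/hive/lib/

# Then, in the Hive shell:
#   set spark.home=/path/to/spark/dir;
#   set hive.execution.engine=spark;
```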
>> On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yuemeng1@huawei.com> wrote:
>>
>> hi XueFu, thanks a lot for your help. I will now provide
>> more detail to reproduce this issue:
>> 1) I checked out the spark branch from the Hive
>> GitHub repo (https://github.com/apache/hive/tree/spark on Nov
>> 29, because the current version gives an error:
>> Caused by: java.lang.RuntimeException: Unable
>> to instantiate
>> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>> ),
>> and the build command was: mvn clean package -DskipTests
>> -Phadoop-2 -Pdist
>> After the build I got the package from
>> /home/ym/hive-on-spark/hive1129/hive/packaging/target (apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
>> 2) I checked out Spark from
>> https://github.com/apache/spark/tree/v1.2.0-snapshot0, because
>> Spark branch-1.2 has the parent version 1.2.1-SNAPSHOT, so I
>> chose v1.2.0-snapshot0. I compared this Spark's pom.xml with
>> spark-parent-1.2.0-SNAPSHOT.pom (from
>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/);
>> the only difference is the spark-parent version. The build
>> command was:
>>
>> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
>>
>> 3) Commands I executed in the Hive shell:
>> ./hive --auxpath
>> /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar
>> (this jar was already copied to the Hive lib directory)
>> create table student(sno int,sname string,sage int,ssex
>> string) row format delimited FIELDS TERMINATED BY ',';
>> create table score(sno int,cno int,sage int) row format
>> delimited FIELDS TERMINATED BY ',';
>> load data local inpath
>> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
>> into table student;
>> load data local inpath
>> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
>> into table score;
>> set hive.execution.engine=spark;
>> set spark.master=spark://10.175.xxx.xxx:7077;
>> set spark.eventLog.enabled=true;
>> set spark.executor.memory=9086m;
>> set
>> spark.serializer=org.apache.spark.serializer.KryoSerializer;
>> select distinct st.sno,sname from student st join score
>> sc on(st.sno=sc.sno) where sc.cno IN(11,12,13) and
>> st.sage > 28; (works in MR)
>> 4)
>> student.txt file
>> 1,rsh,27,female
>> 2,kupo,28,male
>> 3,astin,29,female
>> 4,beike,30,male
>> 5,aili,31,famle
>>
>> score.txt file
>> 1,10,80
>> 2,11,85
>> 3,12,90
>> 4,13,95
>> 5,14,100
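As a quick sanity check of what the join query should return on this sample data, the same filter and join can be replayed outside Hive; this is a small sketch using awk in place of the Hive engine (it recreates the two files shown above):

```shell
# Recreate the sample data files from the post above.
cat > student.txt <<'EOF'
1,rsh,27,female
2,kupo,28,male
3,astin,29,female
4,beike,30,male
5,aili,31,famle
EOF
cat > score.txt <<'EOF'
1,10,80
2,11,85
3,12,90
4,13,95
5,14,100
EOF

# Equivalent of:
#   select distinct st.sno, sname from student st join score sc
#   on (st.sno = sc.sno) where sc.cno IN (11,12,13) and st.sage > 28;
# Pass 1 collects snos whose cno is 11, 12, or 13; pass 2 prints
# students with a matching sno and sage > 28.
awk -F, 'NR==FNR { if ($2==11 || $2==12 || $2==13) ok[$1]=1; next }
         ($1 in ok) && $3 > 28 { print $1 "," $2 }' score.txt student.txt
# prints:
#   3,astin
#   4,beike
```

So a correct run of the query on this data should return exactly two rows (snos 3 and 4), which gives a baseline for comparing the MR and Spark engines.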
>>
>> On 2014/12/2 23:28, Xuefu Zhang wrote:
>>> Could you provide details on how to reproduce the issue,
>>> such as the exact Spark branch, the command to build
>>> Spark, how you built Hive, and what queries/commands you
>>> ran.
>>>
>>> We are running Hive on Spark all the time. Our
>>> pre-commit test runs without any issue.
>>>
>>> Thanks,
>>> Xuefu
>>>
>>> On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yuemeng1@huawei.com> wrote:
>>>
>>> hi XueFu,
>>> I checked out a Spark tag from the Spark
>>> GitHub repo (tag v1.2.0-snapshot0) and compared this
>>> Spark's pom.xml with
>>> spark-parent-1.2.0-SNAPSHOT.pom (from
>>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/);
>>> the only difference is the following:
>>> in spark-parent-1.2.0-SNAPSHOT.pom
>>> <artifactId>spark-parent</artifactId>
>>> <version>1.2.0-SNAPSHOT</version>
>>> and in v1.2.0-snapshot0
>>> <artifactId>spark-parent</artifactId>
>>> <version>1.2.0</version>
>>> I think there is no essential difference, so I built
>>> v1.2.0-snapshot0 and deployed it as my Spark cluster.
>>> When I run a query joining the two tables, it still gives
>>> the error I showed you earlier.
>>>
>>>
>>> I don't think my Spark cluster has any problem, so why
>>> do I keep getting this error?
>>>
>>> On 2014/12/2 13:39, Xuefu Zhang wrote:
>>>> You need to build your Spark assembly from the Spark
>>>> 1.2 branch. This should give you both a Spark
>>>> build and the spark-assembly jar, which you need
>>>> to copy to the Hive lib directory. A snapshot is fine,
>>>> since Spark 1.2 hasn't been released yet.
>>>>
>>>> --Xuefu
>>>>
>>>> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yuemeng1@huawei.com>
>>>> wrote:
>>>>
>>>> hi XueFu,
>>>> thanks a lot for your information, but as far as
>>>> I know, the latest Spark version on GitHub is
>>>> spark-snapshot-1.3; there is no
>>>> spark-1.2, only a branch-1.2 with
>>>> spark-snapshot-1.2. Can you tell me which Spark
>>>> version I should build? And for now,
>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar
>>>> produces errors like the one above.
>>>>
>>>>
>>>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>>> It seems that the wrong class, HiveInputFormat, is
>>>>> loaded. The stack trace is way off the current
>>>>> Hive code. You need to build Spark 1.2 and
>>>>> copy the spark-assembly jar to Hive's lib
>>>>> directory, and that's it.
>>>>>
>>>>> --Xuefu
Re: Job aborted due to stage failure
Posted by Lefty Leverenz <le...@gmail.com>.
Yuemeng, you can find out how to edit wikidocs here: About This Wiki
<https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit>
.
-- Lefty
On Wed, Dec 3, 2014 at 10:05 PM, Xuefu Zhang <xz...@cloudera.com> wrote:
> Hi Yuemeng,
>
> I'm glad that Hive on Spark finally works for you. As you know, this
> project is still in development and yet to be released. Thus, please
> forgive about the lack of proper documentation. We have a "Get Started"
> page that's linked in HIVE-7292. If you can improve the document there, it
> would be very helpful for other Hive users.
>
> Thanks,
> Xuefu
>
> On Wed, Dec 3, 2014 at 5:42 PM, yuemeng1 <yu...@huawei.com> wrote:
>
>> hi,thanks a lot for your help,with your help ,my hive-on-spark can work
>> well now
>> it take me long time to install and deploy.here are some advice,i think
>> we need to improve the installation documentation, allowing users to use
>> the least amount of time to compile and install
>> 1)add which spark version we should pick from spark github if we select
>> built spark instead of download a spark pre-built,tell them the right built
>> commad!(not include Pyarn ,Phive)
>> 2)if they get some error during built ,such as [ERRO
>> /hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:
>> [22,24] cannot find symbol
>> [ERROR] symbol: class JobExecutionStatus,tell them what they can do?
>> for our users,first to use it ,then feel good or bad?
>> and if u need,i can add something to start document
>>
>>
>> thanks
>> yuemeng
>>
>>
>>
>>
>>
>>
>> On 2014/12/3 11:03, Xuefu Zhang wrote:
>>
>> When you build Spark, remove -Phive as well as -Pyarn. When you run
>> hive queries, you may need to run "set spark.home=/path/to/spark/dir";
>>
>> Thanks,
>> Xuefu
>>
>> On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yu...@huawei.com> wrote:
>>
>>> hi,XueFu,thanks a lot for your help,now i will provide more detail to
>>> reproduce this ssue:
>>> 1),i checkout a spark branch from hive github(
>>> https://github.com/apache/hive/tree/spark on Nov 29,becasue of for
>>> version now it will give something wrong about:Caused by:
>>> java.lang.RuntimeException: Unable to instantiate
>>> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ),
>>> and built command:mvn clean package -DskipTests -Phadoop-2 -Pdist
>>> after built i get package from
>>> :/home/ym/hive-on-spark/hive1129/hive/packaging/target(apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
>>> 2)i checkout spark from
>>> https://github.com/apache/spark/tree/v1.2.0-snapshot0,becasue of spark
>>> branch-1.2 is with spark parent(1.2.1-SNAPSHOT),so i chose
>>> v1.2.0-snapshot0 and i compare this spark's pom.xml with
>>> spark-parent-1.2.0-SNAPSHOT.pom(get from
>>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>>> there is only difference is spark-parent name,and built command is :
>>>
>>> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
>>>
>>> 3)comand i execute in hive-shell:
>>> ./hive --auxpath
>>> /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar(copy
>>> this jar to hive dir lib already)
>>> create table student(sno int,sname string,sage int,ssex string) row
>>> format delimited FIELDS TERMINATED BY ',';
>>> create table score(sno int,cno int,sage int) row format delimited FIELDS
>>> TERMINATED BY ',';
>>> load data local inpath
>>> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
>>> into table student;
>>> load data local inpath
>>> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
>>> into table score;
>>> set hive.execution.engine=spark;
>>> set spark.master=spark://10.175.xxx.xxx:7077;
>>> set spark.eventLog.enabled=true;
>>> set spark.executor.memory=9086m;
>>> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>>> select distinct st.sno,sname from student st join score sc
>>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;(work in mr)
>>> 4)
>>> studdent.txt file
>>> 1,rsh,27,female
>>> 2,kupo,28,male
>>> 3,astin,29,female
>>> 4,beike,30,male
>>> 5,aili,31,famle
>>>
>>> score.txt file
>>> 1,10,80
>>> 2,11,85
>>> 3,12,90
>>> 4,13,95
>>> 5,14,100
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 2014/12/2 23:28, Xuefu Zhang wrote:
>>>
>>> Could you provide details on how to reproduce the issue? such as the
>>> exact spark branch, the command to build Spark, how you build Hive, and
>>> what queries/commands you run.
>>>
>>> We are running Hive on Spark all the time. Our pre-commit test runs
>>> without any issue.
>>>
>>> Thanks,
>>> Xuefu
>>>
>>> On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yu...@huawei.com> wrote:
>>>
>>>> hi,XueFu
>>>> i checkout a spark branch from sparkgithub(tags:v1.2.0-snapshot0)and i
>>>> compare this spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom(get from
>>>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>>>> there is only difference is follow:
>>>> in spark-parent-1.2.0-SNAPSHOT.pom
>>>> <artifactId>spark-parent</artifactId>
>>>> <version>1.2.0-SNAPSHOT</version>
>>>> and in v1.2.0-snapshot0
>>>> <artifactId>spark-parent</artifactId>
>>>> <version>1.2.0</version>
>>>> i think there is no essence diff,and i built v1.2.0-snapshot0 and
>>>> deploy it as my spark clusters
>>>> when i run query about join two table ,it still give some error what i
>>>> show u earlier
>>>>
>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>>> java.lang.NullPointerException+details
>>>>
>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> at java.lang.Thread.run(Thread.java:722)
>>>>
>>>> Driver stacktrace:
>>>>
>>>>
>>>>
>>>> i think my spark clusters did't had any problem,but why always give me
>>>> such error
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 2014/12/2 13:39, Xuefu Zhang wrote:
>>>>
>>>> You need to build your spark assembly from spark 1.2 branch. this
>>>> should give your both a spark build as well as spark-assembly jar, which
>>>> you need to copy to Hive lib directory. Snapshot is fine, and spark 1.2
>>>> hasn't been released yet.
>>>>
>>>> --Xuefu
>>>>
>>>> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yu...@huawei.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> hi.XueFu,
>>>>> thanks a lot for your inforamtion,but as far as i know ,the latest
>>>>> spark version on github is spark-snapshot-1.3,but there is no
>>>>> spark-1.2,only have a branch-1.2 with spark-snapshot-1.2,can u tell me
>>>>> which spark version i should built,and for now,that's
>>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error like that
>>>>>
>>>>>
>>>>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>>>
>>>>> It seems that wrong class, HiveInputFormat, is loaded. The
>>>>> stacktrace is way off the current Hive code. You need to build Spark 1.2
>>>>> and copy spark-assembly jar to Hive's lib directory and that it.
>>>>>
>>>>> --Xuefu
>>>>>
>>>>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yu...@huawei.com> wrote:
>>>>>
>>>>>> hi,i built a hive on spark package and my spark assembly jar is
>>>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a query in hive
>>>>>> shell,before execute this query,
>>>>>> i set all the require which hive need with spark.and i execute a
>>>>>> join query :
>>>>>> select distinct st.sno,sname from student st join score sc
>>>>>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>>>>>> but it failed,
>>>>>> get follow error in spark webUI:
>>>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>>>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>>>>> java.lang.NullPointerException+details
>>>>>>
>>>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>>>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>>>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>>>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>>>>> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>>>>> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>>>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>>>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>>> at java.lang.Thread.run(Thread.java:722)
>>>>>>
>>>>>> Driver stacktrace:
>>>>>>
>>>>>>
>>>>>> can u give me a help to deal this probelm,and i think my built was
>>>>>> succussed!
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
Re: Job aborted due to stage failure
Posted by Xuefu Zhang <xz...@cloudera.com>.
Hi Yuemeng,
I'm glad that Hive on Spark finally works for you. As you know, this
project is still in development and yet to be released. Thus, please
forgive about the lack of proper documentation. We have a "Get Started"
page that's linked in HIVE-7292. If you can improve the document there, it
would be very helpful for other Hive users.
Thanks,
Xuefu
On Wed, Dec 3, 2014 at 5:42 PM, yuemeng1 <yu...@huawei.com> wrote:
> hi,thanks a lot for your help,with your help ,my hive-on-spark can work
> well now
> it take me long time to install and deploy.here are some advice,i think
> we need to improve the installation documentation, allowing users to use
> the least amount of time to compile and install
> 1)add which spark version we should pick from spark github if we select
> built spark instead of download a spark pre-built,tell them the right built
> commad!(not include Pyarn ,Phive)
> 2)if they get some error during built ,such as [ERRO
> /hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:
> [22,24] cannot find symbol
> [ERROR] symbol: class JobExecutionStatus,tell them what they can do?
> for our users,first to use it ,then feel good or bad?
> and if u need,i can add something to start document
>
>
> thanks
> yuemeng
>
>
>
>
>
>
> On 2014/12/3 11:03, Xuefu Zhang wrote:
>
> When you build Spark, remove -Phive as well as -Pyarn. When you run hive
> queries, you may need to run "set spark.home=/path/to/spark/dir";
>
> Thanks,
> Xuefu
>
> On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yu...@huawei.com> wrote:
>
>> hi,XueFu,thanks a lot for your help,now i will provide more detail to
>> reproduce this ssue:
>> 1),i checkout a spark branch from hive github(
>> https://github.com/apache/hive/tree/spark on Nov 29,becasue of for
>> version now it will give something wrong about:Caused by:
>> java.lang.RuntimeException: Unable to instantiate
>> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ),
>> and built command:mvn clean package -DskipTests -Phadoop-2 -Pdist
>> after built i get package from
>> :/home/ym/hive-on-spark/hive1129/hive/packaging/target(apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
>> 2)i checkout spark from
>> https://github.com/apache/spark/tree/v1.2.0-snapshot0,becasue of spark
>> branch-1.2 is with spark parent(1.2.1-SNAPSHOT),so i chose
>> v1.2.0-snapshot0 and i compare this spark's pom.xml with
>> spark-parent-1.2.0-SNAPSHOT.pom(get from
>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>> there is only difference is spark-parent name,and built command is :
>>
>> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
>>
>> 3)comand i execute in hive-shell:
>> ./hive --auxpath
>> /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar(copy
>> this jar to hive dir lib already)
>> create table student(sno int,sname string,sage int,ssex string) row
>> format delimited FIELDS TERMINATED BY ',';
>> create table score(sno int,cno int,sage int) row format delimited FIELDS
>> TERMINATED BY ',';
>> load data local inpath
>> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
>> into table student;
>> load data local inpath
>> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
>> into table score;
>> set hive.execution.engine=spark;
>> set spark.master=spark://10.175.xxx.xxx:7077;
>> set spark.eventLog.enabled=true;
>> set spark.executor.memory=9086m;
>> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
>> select distinct st.sno,sname from student st join score sc
>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;(work in mr)
>> 4)
>> studdent.txt file
>> 1,rsh,27,female
>> 2,kupo,28,male
>> 3,astin,29,female
>> 4,beike,30,male
>> 5,aili,31,famle
>>
>> score.txt file
>> 1,10,80
>> 2,11,85
>> 3,12,90
>> 4,13,95
>> 5,14,100
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On 2014/12/2 23:28, Xuefu Zhang wrote:
>>
>> Could you provide details on how to reproduce the issue? such as the
>> exact spark branch, the command to build Spark, how you build Hive, and
>> what queries/commands you run.
>>
>> We are running Hive on Spark all the time. Our pre-commit test runs
>> without any issue.
>>
>> Thanks,
>> Xuefu
>>
>> On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yu...@huawei.com> wrote:
>>
>>> hi,XueFu
>>> i checkout a spark branch from sparkgithub(tags:v1.2.0-snapshot0)and i
>>> compare this spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom(get from
>>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>>> there is only difference is follow:
>>> in spark-parent-1.2.0-SNAPSHOT.pom
>>> <artifactId>spark-parent</artifactId>
>>> <version>1.2.0-SNAPSHOT</version>
>>> and in v1.2.0-snapshot0
>>> <artifactId>spark-parent</artifactId>
>>> <version>1.2.0</version>
>>> i think there is no essence diff,and i built v1.2.0-snapshot0 and deploy
>>> it as my spark clusters
>>> when i run query about join two table ,it still give some error what i
>>> show u earlier
>>>
>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>> java.lang.NullPointerException+details
>>>
>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>>> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>>> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> at java.lang.Thread.run(Thread.java:722)
>>>
>>> Driver stacktrace:
>>>
>>>
>>>
>>> I think my Spark cluster doesn't have any problem, so why does it always
>>> give me this error?
>>>
>>> On 2014/12/2 13:39, Xuefu Zhang wrote:
>>>
>>> You need to build your spark assembly from the Spark 1.2 branch. This
>>> should give you both a Spark build and a spark-assembly jar, which
>>> you need to copy to the Hive lib directory. A snapshot is fine; Spark 1.2
>>> hasn't been released yet.
>>>
>>> --Xuefu
>>>
>>> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yu...@huawei.com> wrote:
>>>
>>>>
>>>>
>>>> hi, XueFu,
>>>> thanks a lot for your information, but as far as I know, the latest
>>>> Spark version on GitHub is spark-snapshot-1.3; there is no
>>>> spark-1.2, only a branch-1.2 with spark-snapshot-1.2. Can you tell me
>>>> which Spark version I should build? And for now, the
>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produces errors like that.
>>>>
>>>>
>>>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>>
>>>> It seems that the wrong class, HiveInputFormat, is loaded. The stacktrace
>>>> is way off the current Hive code. You need to build Spark 1.2 and copy the
>>>> spark-assembly jar to Hive's lib directory, and that's it.
>>>>
>>>> --Xuefu
>>>>
>>>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yu...@huawei.com> wrote:
>>>>
>>>>> hi, I built a Hive on Spark package and my spark assembly jar is
>>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar. Before running a query in
>>>>> the Hive shell, I set everything that Hive needs for Spark. Then I
>>>>> execute a join query:
>>>>> select distinct st.sno,sname from student st join score sc
>>>>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>>>>> but it fails with the following error in the Spark web UI:
>>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>>>> java.lang.NullPointerException+details
>>>>>
>>>>>
>>>>> Can you help me deal with this problem? I think my build was
>>>>> successful!
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
Re: Job aborted due to stage failure
Posted by yuemeng1 <yu...@huawei.com>.
hi, thanks a lot for your help. With your help, my Hive on Spark setup now
works well.
It took me a long time to install and deploy, so here is some advice: I think
we need to improve the installation documentation, allowing users to spend
the least amount of time compiling and installing.
1) State which Spark version to pick from the Spark GitHub repository if users
build Spark themselves instead of downloading a pre-built release, and give
them the correct build command (without -Pyarn or -Phive).
2) If they get an error during the build, such as
[ERROR] .../hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java:[22,24] cannot
find symbol
[ERROR] symbol: class JobExecutionStatus
tell them what they can do.
For our users: let them use it first, then they can judge whether it is good
or bad.
If needed, I can add something to the getting-started document.
thanks
yuemeng
On 2014/12/3 11:03, Xuefu Zhang wrote:
> When you build Spark, remove -Phive as well as -Pyarn. When you run
> hive queries, you may need to run "set spark.home=/path/to/spark/dir";
>
> Thanks,
> Xuefu
>
> On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yuemeng1@huawei.com
> <ma...@huawei.com>> wrote:
>
> hi,XueFu,thanks a lot for your help,now i will provide more detail
> to reproduce this ssue:
> 1),i checkout a spark branch from hive
> github(https://github.com/apache/hive/tree/spark on Nov 29,becasue
> of for version now it will give something wrong about:Caused by:
> java.lang.RuntimeException: Unable to instantiate
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ),
> and built command:mvn clean package -DskipTests -Phadoop-2 -Pdist
> after built i get package from
> :/home/ym/hive-on-spark/hive1129/hive/packaging/target(apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
> 2)i checkout spark from
> https://github.com/apache/spark/tree/v1.2.0-snapshot0,becasue of
> spark branch-1.2 is with spark parent(1.2.1-SNAPSHOT),so i chose
> v1.2.0-snapshot0 and i compare this spark's pom.xml with
> spark-parent-1.2.0-SNAPSHOT.pom(get from
> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
> there is only difference is spark-parent name,and built command is :
>
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
>
> 3)comand i execute in hive-shell:
> ./hive --auxpath
> /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar(copy
> this jar to hive dir lib already)
> create table student(sno int,sname string,sage int,ssex string)
> row format delimited FIELDS TERMINATED BY ',';
> create table score(sno int,cno int,sage int) row format delimited
> FIELDS TERMINATED BY ',';
> load data local inpath
> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
> into table student;
> load data local inpath
> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
> into table score;
> set hive.execution.engine=spark;
> set spark.master=spark://10.175.xxx.xxx:7077;
> set spark.eventLog.enabled=true;
> set spark.executor.memory=9086m;
> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
> select distinct st.sno,sname from student st join score sc
> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;(work
> in mr)
> 4)
> student.txt file
> 1,rsh,27,female
> 2,kupo,28,male
> 3,astin,29,female
> 4,beike,30,male
> 5,aili,31,famle
>
> score.txt file
> 1,10,80
> 2,11,85
> 3,12,90
> 4,13,95
> 5,14,100
>
> On 2014/12/2 23:28, Xuefu Zhang wrote:
>> Could you provide details on how to reproduce the issue? such as
>> the exact spark branch, the command to build Spark, how you build
>> Hive, and what queries/commands you run.
>>
>> We are running Hive on Spark all the time. Our pre-commit test
>> runs without any issue.
>>
>> Thanks,
>> Xuefu
>>
>> On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yuemeng1@huawei.com
>> <ma...@huawei.com>> wrote:
>>
>> hi,XueFu
>> i checkout a spark branch from
>> sparkgithub(tags:v1.2.0-snapshot0)and i compare this spark's
>> pom.xml with spark-parent-1.2.0-SNAPSHOT.pom(get from
>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>> there is only difference is follow:
>> in spark-parent-1.2.0-SNAPSHOT.pom
>> <artifactId>spark-parent</artifactId>
>> <version>1.2.0-SNAPSHOT</version>
>> and in v1.2.0-snapshot0
>> <artifactId>spark-parent</artifactId>
>> <version>1.2.0</version>
>> i think there is no essence diff,and i built v1.2.0-snapshot0
>> and deploy it as my spark clusters
>> when i run query about join two table ,it still give some
>> error what i show u earlier
>>
>> Job aborted due to stage failure: Task 0 in stage 1.0 failed
>> 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID
>> 7, datasight18): java.lang.NullPointerException+details
>>
>>
>>
>>
>> i think my spark clusters did't had any problem,but why
>> always give me such error
>>
>> On 2014/12/2 13:39, Xuefu Zhang wrote:
>>> You need to build your spark assembly from spark 1.2 branch.
>>> this should give your both a spark build as well as
>>> spark-assembly jar, which you need to copy to Hive lib
>>> directory. Snapshot is fine, and spark 1.2 hasn't been
>>> released yet.
>>>
>>> --Xuefu
>>>
>>> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1
>>> <yuemeng1@huawei.com <ma...@huawei.com>> wrote:
>>>
>>>
>>>
>>> hi.XueFu,
>>> thanks a lot for your inforamtion,but as far as i know
>>> ,the latest spark version on github is
>>> spark-snapshot-1.3,but there is no spark-1.2,only have a
>>> branch-1.2 with spark-snapshot-1.2,can u tell me which
>>> spark version i should built,and for now,that's
>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce
>>> error like that
>>>
>>>
>>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>> It seems that wrong class, HiveInputFormat, is loaded.
>>>> The stacktrace is way off the current Hive code. You
>>>> need to build Spark 1.2 and copy spark-assembly jar to
>>>> Hive's lib directory and that it.
>>>>
>>>> --Xuefu
>>>>
>>>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1
>>>> <yuemeng1@huawei.com <ma...@huawei.com>> wrote:
>>>>
>>>> hi,i built a hive on spark package and my spark
>>>> assembly jar is
>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when
>>>> i run a query in hive shell,before execute this query,
>>>> i set all the require which hive need with
>>>> spark.and i execute a join query :
>>>> select distinct st.sno,sname from student st join
>>>> score sc on(st.sno=sc.sno) where sc.cno
>>>> IN(11,12,13) and st.sage > 28;
>>>> but it failed,
>>>> get follow error in spark webUI:
>>>> Job aborted due to stage failure: Task 0 in stage
>>>> 1.0 failed 4 times, most recent failure: Lost task
>>>> 0.3 in stage 1.0 (TID 7, datasight18):
>>>> java.lang.NullPointerException+details
>>>>
>>>>
>>>> can u give me a help to deal this probelm,and i
>>>> think my built was succussed!
>>>>
>>>>
>>>
>>>
>>
>>
>
>
Re: Job aborted due to stage failure
Posted by Xuefu Zhang <xz...@cloudera.com>.
When you build Spark, remove -Phive as well as -Pyarn. When you run Hive
queries, you may need to run "set spark.home=/path/to/spark/dir";
Thanks,
Xuefu
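[Editorial note] The reason removing -Phive matters, which the stack trace in this thread hints at, is that an assembly built with -Phive bundles its own (older) copy of Hive's classes, which can shadow the newer Hive jars you actually installed. Here is a toy Python model of that first-match classpath resolution; the jar names are illustrative, and real JVM classloading is more involved:

```python
def resolve(classpath, class_name):
    """Return the first jar on the (ordered) classpath that provides class_name."""
    for jar, classes in classpath:
        if class_name in classes:
            return jar
    raise LookupError(class_name)

FMT = "org.apache.hadoop.hive.ql.io.HiveInputFormat"

# Assembly built WITH -Phive: it bundles a stale copy of Hive's classes,
# which wins over the newer hive-exec jar later on the path.
with_phive = [
    ("spark-assembly-1.2.0-hadoop2.4.0.jar", {FMT}),   # stale bundled copy
    ("hive-exec-0.15.0-SNAPSHOT.jar", {FMT}),          # the intended one
]
shadowed = resolve(with_phive, FMT)

# Assembly built WITHOUT -Phive: no Hive classes inside, so the
# intended hive-exec jar supplies the class.
without_phive = [
    ("spark-assembly-1.2.0-hadoop2.4.0.jar", set()),
    ("hive-exec-0.15.0-SNAPSHOT.jar", {FMT}),
]
correct = resolve(without_phive, FMT)

print(shadowed)  # spark-assembly-1.2.0-hadoop2.4.0.jar
print(correct)   # hive-exec-0.15.0-SNAPSHOT.jar
```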
On Tue, Dec 2, 2014 at 6:29 PM, yuemeng1 <yu...@huawei.com> wrote:
> hi,XueFu,thanks a lot for your help,now i will provide more detail to
> reproduce this ssue:
> 1),i checkout a spark branch from hive github(
> https://github.com/apache/hive/tree/spark on Nov 29,becasue of for
> version now it will give something wrong about:Caused by:
> java.lang.RuntimeException: Unable to instantiate
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ),
> and built command:mvn clean package -DskipTests -Phadoop-2 -Pdist
> after built i get package from
> :/home/ym/hive-on-spark/hive1129/hive/packaging/target(apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
> 2)i checkout spark from
> https://github.com/apache/spark/tree/v1.2.0-snapshot0,becasue of spark
> branch-1.2 is with spark parent(1.2.1-SNAPSHOT),so i chose
> v1.2.0-snapshot0 and i compare this spark's pom.xml with
> spark-parent-1.2.0-SNAPSHOT.pom(get from
> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
> there is only difference is spark-parent name,and built command is :
>
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
>
> 3)comand i execute in hive-shell:
> ./hive --auxpath
> /opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar(copy
> this jar to hive dir lib already)
> create table student(sno int,sname string,sage int,ssex string) row format
> delimited FIELDS TERMINATED BY ',';
> create table score(sno int,cno int,sage int) row format delimited FIELDS
> TERMINATED BY ',';
> load data local inpath
> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
> into table student;
> load data local inpath
> '/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
> into table score;
> set hive.execution.engine=spark;
> set spark.master=spark://10.175.xxx.xxx:7077;
> set spark.eventLog.enabled=true;
> set spark.executor.memory=9086m;
> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
> select distinct st.sno,sname from student st join score sc
> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;(work in mr)
> 4)
> student.txt file
> 1,rsh,27,female
> 2,kupo,28,male
> 3,astin,29,female
> 4,beike,30,male
> 5,aili,31,famle
>
> score.txt file
> 1,10,80
> 2,11,85
> 3,12,90
> 4,13,95
> 5,14,100
>
> On 2014/12/2 23:28, Xuefu Zhang wrote:
>
> Could you provide details on how to reproduce the issue? such as the
> exact spark branch, the command to build Spark, how you build Hive, and
> what queries/commands you run.
>
> We are running Hive on Spark all the time. Our pre-commit test runs
> without any issue.
>
> Thanks,
> Xuefu
>
> On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yu...@huawei.com> wrote:
>
>> hi,XueFu
>> i checkout a spark branch from sparkgithub(tags:v1.2.0-snapshot0)and i
>> compare this spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom(get from
>> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
>> there is only difference is follow:
>> in spark-parent-1.2.0-SNAPSHOT.pom
>> <artifactId>spark-parent</artifactId>
>> <version>1.2.0-SNAPSHOT</version>
>> and in v1.2.0-snapshot0
>> <artifactId>spark-parent</artifactId>
>> <version>1.2.0</version>
>> i think there is no essence diff,and i built v1.2.0-snapshot0 and deploy
>> it as my spark clusters
>> when i run query about join two table ,it still give some error what i
>> show u earlier
>>
>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>> java.lang.NullPointerException+details
>>
>>
>>
>>
>> i think my spark clusters did't had any problem,but why always give me
>> such error
>>
>> On 2014/12/2 13:39, Xuefu Zhang wrote:
>>
>> You need to build your spark assembly from spark 1.2 branch. this
>> should give your both a spark build as well as spark-assembly jar, which
>> you need to copy to Hive lib directory. Snapshot is fine, and spark 1.2
>> hasn't been released yet.
>>
>> --Xuefu
>>
>> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yu...@huawei.com> wrote:
>>
>>>
>>>
>>> hi.XueFu,
>>> thanks a lot for your inforamtion,but as far as i know ,the latest spark
>>> version on github is spark-snapshot-1.3,but there is no spark-1.2,only have
>>> a branch-1.2 with spark-snapshot-1.2,can u tell me which spark version i
>>> should built,and for now,that's
>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error like that
>>>
>>>
>>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>>
>>> It seems that wrong class, HiveInputFormat, is loaded. The stacktrace
>>> is way off the current Hive code. You need to build Spark 1.2 and copy
>>> spark-assembly jar to Hive's lib directory and that it.
>>>
>>> --Xuefu
>>>
>>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yu...@huawei.com> wrote:
>>>
>>>> hi,i built a hive on spark package and my spark assembly jar is
>>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a query in hive
>>>> shell,before execute this query,
>>>> i set all the require which hive need with spark.and i execute a join
>>>> query :
>>>> select distinct st.sno,sname from student st join score sc
>>>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>>>> but it failed,
>>>> get follow error in spark webUI:
>>>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>>>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>>>> java.lang.NullPointerException+details
>>>>
>>>>
>>>>
>>>> can u give me a help to deal this probelm,and i think my built was
>>>> succussed!
>>>>
>>>
>>>
>>>
>>
>>
>
>
Re: Job aborted due to stage failure
Posted by yuemeng1 <yu...@huawei.com>.
hi, XueFu, thanks a lot for your help. Here is more detail to reproduce this
issue:
1) I checked out the spark branch of Hive from GitHub
(https://github.com/apache/hive/tree/spark, on Nov 29, because the current
version gives an error: Caused by: java.lang.RuntimeException: Unable to
instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient),
and the build command was: mvn clean package -DskipTests -Phadoop-2 -Pdist
After the build I took the package from
/home/ym/hive-on-spark/hive1129/hive/packaging/target (apache-hive-0.15.0-SNAPSHOT-bin.tar.gz)
2) I checked out Spark from
https://github.com/apache/spark/tree/v1.2.0-snapshot0. Because the Spark
branch-1.2 has parent version 1.2.1-SNAPSHOT, I chose v1.2.0-snapshot0. I
compared this Spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom (from
http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),
and the only difference is the spark-parent version. The build command was:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
3) Commands I executed in the Hive shell:
./hive --auxpath
/opt/hispark/spark/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar
(this jar is already copied to Hive's lib directory)
create table student(sno int,sname string,sage int,ssex string) row
format delimited FIELDS TERMINATED BY ',';
create table score(sno int,cno int,sage int) row format delimited FIELDS
TERMINATED BY ',';
load data local inpath
'/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/student.txt'
into table student;
load data local inpath
'/home/hive-on-spark/temp/spark-1.2.0/examples/src/main/resources/score.txt'
into table score;
set hive.execution.engine=spark;
set spark.master=spark://10.175.xxx.xxx:7077;
set spark.eventLog.enabled=true;
set spark.executor.memory=9086m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
select distinct st.sno,sname from student st join score sc
on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28; (this works in MR)
4)
student.txt file:
1,rsh,27,female
2,kupo,28,male
3,astin,29,female
4,beike,30,male
5,aili,31,famle
score.txt file:
1,10,80
2,11,85
3,12,90
4,13,95
5,14,100
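[Editorial note] As a sanity check of what the query should return on this sample data, here is a plain-Python equivalent of the join, independent of Hive/Spark:

```python
student = [
    (1, "rsh", 27, "female"),
    (2, "kupo", 28, "male"),
    (3, "astin", 29, "female"),
    (4, "beike", 30, "male"),
    (5, "aili", 31, "famle"),
]
score = [
    (1, 10, 80),
    (2, 11, 85),
    (3, 12, 90),
    (4, 13, 95),
    (5, 14, 100),
]

# Equivalent of:
#   select distinct st.sno, sname from student st join score sc
#   on (st.sno = sc.sno) where sc.cno IN (11,12,13) and st.sage > 28;
result = sorted({
    (sno, sname)
    for (sno, sname, sage, _ssex) in student
    for (sc_sno, cno, _points) in score
    if sno == sc_sno and cno in (11, 12, 13) and sage > 28
})
print(result)  # [(3, 'astin'), (4, 'beike')]
```

So a correct run (on MR or Spark) should return only the rows for astin and beike: sno 2 (kupo) matches a qualifying cno but fails sage > 28.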
On 2014/12/2 23:28, Xuefu Zhang wrote:
> Could you provide details on how to reproduce the issue? such as the
> exact spark branch, the command to build Spark, how you build Hive,
> and what queries/commands you run.
>
> We are running Hive on Spark all the time. Our pre-commit test runs
> without any issue.
>
> Thanks,
> Xuefu
>
> On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yuemeng1@huawei.com
> <ma...@huawei.com>> wrote:
>
> hi,XueFu
> i checkout a spark branch from
> sparkgithub(tags:v1.2.0-snapshot0)and i compare this spark's
> pom.xml with spark-parent-1.2.0-SNAPSHOT.pom(get from
> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/),and
> there is only difference is follow:
> in spark-parent-1.2.0-SNAPSHOT.pom
> <artifactId>spark-parent</artifactId>
> <version>1.2.0-SNAPSHOT</version>
> and in v1.2.0-snapshot0
> <artifactId>spark-parent</artifactId>
> <version>1.2.0</version>
> i think there is no essence diff,and i built v1.2.0-snapshot0 and
> deploy it as my spark clusters
> when i run query about join two table ,it still give some error
> what i show u earlier
>
> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4
> times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7,
> datasight18): java.lang.NullPointerException+details
>
>
>
>
> i think my spark clusters did't had any problem,but why always
> give me such error
>
> On 2014/12/2 13:39, Xuefu Zhang wrote:
>> You need to build your spark assembly from spark 1.2 branch. this
>> should give your both a spark build as well as spark-assembly
>> jar, which you need to copy to Hive lib directory. Snapshot is
>> fine, and spark 1.2 hasn't been released yet.
>>
>> --Xuefu
>>
>> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yuemeng1@huawei.com
>> <ma...@huawei.com>> wrote:
>>
>>
>>
>> hi.XueFu,
>> thanks a lot for your inforamtion,but as far as i know ,the
>> latest spark version on github is spark-snapshot-1.3,but
>> there is no spark-1.2,only have a branch-1.2 with
>> spark-snapshot-1.2,can u tell me which spark version i should
>> built,and for now,that's
>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error
>> like that
>>
>>
>> On 2014/12/2 11:03, Xuefu Zhang wrote:
>>> It seems that wrong class, HiveInputFormat, is loaded. The
>>> stacktrace is way off the current Hive code. You need to
>>> build Spark 1.2 and copy spark-assembly jar to Hive's lib
>>> directory and that it.
>>>
>>> --Xuefu
>>>
>>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1
>>> <yuemeng1@huawei.com <ma...@huawei.com>> wrote:
>>>
>>> hi,i built a hive on spark package and my spark assembly
>>> jar is
>>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run
>>> a query in hive shell,before execute this query,
>>> i set all the require which hive need with spark.and i
>>> execute a join query :
>>> select distinct st.sno,sname from student st join score
>>> sc on(st.sno=sc.sno) where sc.cno IN(11,12,13) and
>>> st.sage > 28;
>>> but it failed,
>>> get follow error in spark webUI:
>>> Job aborted due to stage failure: Task 0 in stage 1.0
>>> failed 4 times, most recent failure: Lost task 0.3 in
>>> stage 1.0 (TID 7, datasight18):
>>> java.lang.NullPointerException+details
>>>
>>>
>>> can you help me deal with this problem? I think my
>>> build was successful!
>>>
>>>
>>
>>
>
>
Re: Job aborted due to stage failure
Posted by Xuefu Zhang <xz...@cloudera.com>.
Could you provide details on how to reproduce the issue, such as the exact
Spark branch, the command used to build Spark, how you built Hive, and what
queries/commands you ran?
We are running Hive on Spark all the time. Our pre-commit test runs without
any issue.
Thanks,
Xuefu
On Tue, Dec 2, 2014 at 4:13 AM, yuemeng1 <yu...@huawei.com> wrote:
> hi XueFu,
> I checked out a Spark branch from the Spark GitHub (tag: v1.2.0-snapshot0) and
> compared this Spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom (obtained from
> http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/);
> the only difference is the following:
> in spark-parent-1.2.0-SNAPSHOT.pom
> <artifactId>spark-parent</artifactId>
> <version>1.2.0-SNAPSHOT</version>
> and in v1.2.0-snapshot0
> <artifactId>spark-parent</artifactId>
> <version>1.2.0</version>
> I think there is no essential difference, so I built v1.2.0-snapshot0 and
> deployed it as my Spark cluster.
> When I run a query joining two tables, it still gives the same error I
> showed you earlier.
RE: Job aborted due to stage failure
Posted by Mike Roberts <mi...@spyfu.com>.
unsubscribe
From: yuemeng1 [mailto:yuemeng1@huawei.com]
Sent: Tuesday, December 2, 2014 5:14 AM
To: user@hive.apache.org
Subject: Re: Job aborted due to stage failure
Re: Job aborted due to stage failure
Posted by yuemeng1 <yu...@huawei.com>.
hi XueFu,
I checked out a Spark branch from the Spark GitHub (tag: v1.2.0-snapshot0) and
compared this Spark's pom.xml with spark-parent-1.2.0-SNAPSHOT.pom (obtained
from
http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark_2.10-1.2-SNAPSHOT/org/apache/spark/spark-parent/1.2.0-SNAPSHOT/);
the only difference is the following:
in spark-parent-1.2.0-SNAPSHOT.pom
<artifactId>spark-parent</artifactId>
<version>1.2.0-SNAPSHOT</version>
and in v1.2.0-snapshot0
<artifactId>spark-parent</artifactId>
<version>1.2.0</version>
I think there is no essential difference, so I built v1.2.0-snapshot0 and
deployed it as my Spark cluster.
When I run a query joining two tables, it still gives the same error I
showed you earlier.
I think my Spark cluster doesn't have any problem, so why does it always
give me this error?
Re: Job aborted due to stage failure
Posted by yuemeng1 <yu...@huawei.com>.
hi, I checked out the Spark 1.2 branch from the Spark GitHub and built it,
then copied the spark-assembly jar into Hive's lib directory, but when I run
this query it still gives me this error.
I am very confused. How can I make Hive on Spark work?
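For reference, the Hive-side setup the thread alludes to ("set all the properties that Hive needs for Spark") is done in the hive shell before running the query. A minimal sketch — the master URL, memory value, and serializer choice are assumptions for illustration, not values taken from this thread:

```sql
-- Run Hive queries on the Spark engine instead of MapReduce
set hive.execution.engine=spark;
-- Point Hive at the Spark cluster (standalone master URL is an assumption)
set spark.master=spark://master-host:7077;
-- Executor sizing and serializer (values are assumptions)
set spark.executor.memory=512m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
```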
On 2014/12/2 13:39, Xuefu Zhang wrote:
> You need to build your spark assembly from spark 1.2 branch. this
> should give your both a spark build as well as spark-assembly jar,
> which you need to copy to Hive lib directory. Snapshot is fine, and
> spark 1.2 hasn't been released yet.
>
> --Xuefu
>
> On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yuemeng1@huawei.com
> <ma...@huawei.com>> wrote:
>
>
>
> hi.XueFu,
> thanks a lot for your inforamtion,but as far as i know ,the latest
> spark version on github is spark-snapshot-1.3,but there is no
> spark-1.2,only have a branch-1.2 with spark-snapshot-1.2,can u
> tell me which spark version i should built,and for now,that's
> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error like that
>
>
> On 2014/12/2 11:03, Xuefu Zhang wrote:
>> It seems that wrong class, HiveInputFormat, is loaded. The
>> stacktrace is way off the current Hive code. You need to build
>> Spark 1.2 and copy spark-assembly jar to Hive's lib directory and
>> that it.
>>
>> --Xuefu
>>
>> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yuemeng1@huawei.com
>> <ma...@huawei.com>> wrote:
>>
>> hi,i built a hive on spark package and my spark assembly jar
>> is spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a
>> query in hive shell,before execute this query,
>> i set all the require which hive need with spark.and i
>> execute a join query :
>> select distinct st.sno,sname from student st join score sc
>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>> but it failed,
>> get follow error in spark webUI:
>> Job aborted due to stage failure: Task 0 in stage 1.0 failed
>> 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID
>> 7, datasight18): java.lang.NullPointerException+details
>>
>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> at java.lang.Thread.run(Thread.java:722)
>>
>> Driver stacktrace:
>>
>> can u give me a help to deal this probelm,and i think my
>> built was succussed!
>>
>>
>
>
Re: Job aborted due to stage failure
Posted by Xuefu Zhang <xz...@cloudera.com>.
You need to build your spark assembly from the Spark 1.2 branch. This should
give you both a Spark build and the spark-assembly jar, which you need
to copy to Hive's lib directory. A snapshot is fine, as Spark 1.2 hasn't been
released yet.
--Xuefu
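The build-and-copy steps described above might look roughly like the following — a sketch; the branch name, Maven profile flags, and HIVE_HOME path are assumptions:

```shell
# Check out the Spark 1.2 maintenance branch (branch name is an assumption)
git clone https://github.com/apache/spark.git
cd spark
git checkout branch-1.2

# Build Spark plus the assembly jar for Hadoop 2.4 (profile and version flags are assumptions)
mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

# Copy the resulting assembly jar into Hive's lib directory
cp assembly/target/scala-2.10/spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar "$HIVE_HOME/lib/"
```

Since the reported NullPointerException suggests a mismatched class on the executor classpath, it may also be worth confirming that only one spark-assembly jar ends up under Hive's lib directory after the copy.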
On Mon, Dec 1, 2014 at 7:41 PM, yuemeng1 <yu...@huawei.com> wrote:
>
>
> hi.XueFu,
> thanks a lot for your inforamtion,but as far as i know ,the latest spark
> version on github is spark-snapshot-1.3,but there is no spark-1.2,only have
> a branch-1.2 with spark-snapshot-1.2,can u tell me which spark version i
> should built,and for now,that's
> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produce error like that
>
>
> On 2014/12/2 11:03, Xuefu Zhang wrote:
>
> It seems that wrong class, HiveInputFormat, is loaded. The stacktrace is
> way off the current Hive code. You need to build Spark 1.2 and copy
> spark-assembly jar to Hive's lib directory and that it.
>
> --Xuefu
>
> On Mon, Dec 1, 2014 at 6:22 PM, yuemeng1 <yu...@huawei.com> wrote:
>
>> hi,i built a hive on spark package and my spark assembly jar is
>> spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,when i run a query in hive
>> shell,before execute this query,
>> i set all the require which hive need with spark.and i execute a join
>> query :
>> select distinct st.sno,sname from student st join score sc
>> on(st.sno=sc.sno) where sc.cno IN(11,12,13) and st.sage > 28;
>> but it failed,
>> get follow error in spark webUI:
>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times,
>> most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18):
>> java.lang.NullPointerException+details
>>
>> Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, datasight18): java.lang.NullPointerException
>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>> at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:233)
>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>> at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> at java.lang.Thread.run(Thread.java:722)
>>
>> Driver stacktrace:
>>
>>
>> can u give me a help to deal this probelm,and i think my built was
>> succussed!
>>
>
>
>
Re: Job aborted due to stage failure
Posted by yuemeng1 <yu...@huawei.com>.
hi XueFu,
thanks a lot for your information, but as far as I know, the latest Spark
version on GitHub is spark-snapshot-1.3; there is no spark-1.2, only
a branch-1.2 with spark-snapshot-1.2. Can you tell me which Spark
version I should build? For now,
spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar produces the error below.
Re: Job aborted due to stage failure
Posted by Xuefu Zhang <xz...@cloudera.com>.
It seems that the wrong class, HiveInputFormat, is loaded. The stacktrace is
way off the current Hive code. You need to build Spark 1.2 and copy the
spark-assembly jar to Hive's lib directory, and that's it.
--Xuefu