Posted to user@hive.apache.org by Ophir Etzion <op...@foursquare.com> on 2015/12/18 21:45:54 UTC
hive on spark
During spark-submit when running hive on spark I get:
Exception in thread "main" java.util.ServiceConfigurationError:
org.apache.hadoop.fs.FileSystem: Provider
org.apache.hadoop.hdfs.HftpFileSystem could not be instantiated
Caused by: java.lang.IllegalAccessError: tried to access method
org.apache.hadoop.fs.DelegationTokenRenewer.<init>(Ljava/lang/Class;)V
from class org.apache.hadoop.hdfs.HftpFileSystem
I managed to get Hive on Spark working on a staging cluster, and now
I'm trying to do the same on a production cluster, where this happened. Both
run CDH 5.4.3.
I read that this is due to something not being compiled against the
correct hadoop version.
My main question is: which binary/jar/file can cause this?
I tried replacing the binaries and jars with the ones used by the
staging cluster (where Hive on Spark worked), but it didn't help.
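An IllegalAccessError like this usually means two different jars on the classpath ship copies of the same classes compiled against different Hadoop versions. One way to narrow it down is to scan the lib directories on both clusters for jars containing the two classes named in the stack trace. A minimal sketch, assuming local filesystem access to the nodes (the lib paths in the usage example are placeholders for your cluster's actual directories):

```python
# Sketch: find which jars provide the classes from the stack trace.
# If more than one jar contains the same class, those jars are the
# usual suspects for a mixed-Hadoop-version classpath.
import zipfile
from pathlib import Path

CLASSES = [
    "org/apache/hadoop/hdfs/HftpFileSystem.class",
    "org/apache/hadoop/fs/DelegationTokenRenewer.class",
]

def jars_containing(lib_dirs, class_paths):
    """Map each class path to the list of jars that contain it."""
    hits = {c: [] for c in class_paths}
    for d in lib_dirs:
        d = Path(d)
        if not d.is_dir():
            continue  # skip paths that don't exist on this node
        for jar in sorted(d.glob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    names = set(zf.namelist())
            except zipfile.BadZipFile:
                continue  # not a real zip/jar; ignore
            for c in class_paths:
                if c in names:
                    hits[c].append(str(jar))
    return hits

if __name__ == "__main__":
    # Hypothetical directories -- substitute your cluster's lib paths.
    lib_dirs = ["/usr/lib/hive/lib", "/usr/lib/spark/lib", "/usr/lib/hadoop"]
    for cls, jars in jars_containing(lib_dirs, CLASSES).items():
        print(cls)
        for j in jars:
            print("  ", j)
```

Running the same scan on the staging and production clusters and diffing the output should show which jar differs between the two.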
Thank you to anyone reading this, and thank you for any direction on
where to look.
Ophir
RE: hive on spark
Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Hi,
Your statement
“I read that this is due to something not being compiled against the correct hadoop version.
my main question what is the binary/jar/file that can cause this?”
I believe this is the file in $HIVE_HOME/lib called spark-assembly-1.3.1-hadoop2.4.0.jar, which you need to build from the Spark 1.3.1 source code, excluding the Hive jars.
Something like below
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"
Then extract the above file and copy over to $HIVE_HOME/lib
Example
hive> set spark.home=/usr/lib/spark-1.3.1-bin-hadoop2.6; -- This is the precompiled binary installation for Spark 1.3.1
hive> set hive.execution.engine=spark;
hive> set spark.master=yarn-client;
hive> select count(1) from t;
Query ID = hduser_20151218212056_4e1faef5-93bd-4e18-9375-659220d67530
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = 35c78523-4a36-45e5-95f1-01052985ff4b
Query Hive on Spark job[0] stages:
0
1
Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2015-12-18 21:21:36,852 Stage-0_0: 0/256 Stage-1_0: 0/1
2015-12-18 21:21:39,900 Stage-0_0: 0/256 Stage-1_0: 0/1
2015-12-18 21:21:41,914 Stage-0_0: 0(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:44,933 Stage-0_0: 0(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:45,941 Stage-0_0: 1(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:46,952 Stage-0_0: 3(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:47,963 Stage-0_0: 4(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:48,969 Stage-0_0: 6(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:49,977 Stage-0_0: 7(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:50,991 Stage-0_0: 9(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:52,001 Stage-0_0: 10(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:53,013 Stage-0_0: 11(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:54,022 Stage-0_0: 13(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:55,030 Stage-0_0: 15(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:56,038 Stage-0_0: 18(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:57,053 Stage-0_0: 52(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:58,058 Stage-0_0: 90(+2)/256 Stage-1_0: 0/1
2015-12-18 21:21:59,066 Stage-0_0: 129(+2)/256 Stage-1_0: 0/1
2015-12-18 21:22:00,075 Stage-0_0: 176(+2)/256 Stage-1_0: 0/1
2015-12-18 21:22:01,083 Stage-0_0: 224(+2)/256 Stage-1_0: 0/1
2015-12-18 21:22:02,111 Stage-0_0: 256/256 Finished Stage-1_0: 0(+1)/1
2015-12-18 21:22:03,117 Stage-0_0: 256/256 Finished Stage-1_0: 1/1 Finished
Status: Finished successfully in 62.46 seconds
OK
2074897
Time taken: 66.434 seconds, Fetched: 1 row(s)
HTH
Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.
co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4
Publications due shortly:
Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly
http://talebzadehmich.wordpress.com
NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.
From: Ophir Etzion [mailto:ophir@foursquare.com]
Sent: 18 December 2015 20:46
To: user@hive.apache.org; user@spark.apache.org
Subject: hive on spark
During spark-submit when running hive on spark I get:
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.HftpFileSystem could not be instantiated
Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.fs.DelegationTokenRenewer.<init>(Ljava/lang/Class;)V from class org.apache.hadoop.hdfs.HftpFileSystem
I managed to make hive on spark work on a staging cluster I have and now I'm trying to do the same on a production cluster and this happened.
Both are cdh5.4.3.
I read that this is due to something not being compiled against the correct hadoop version.
My main question is: which binary/jar/file can cause this?
I tried replacing the binaries and jars with the ones used by the staging cluster (where Hive on Spark worked), but it didn't help.
Thank you to anyone reading this, and thank you for any direction on where to look.
Ophir
Re: hive on spark
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Looks like a version mismatch; you need to investigate further and make
sure the versions match.
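One way to check what each jar was actually built against is to read the version attributes from its manifest. A minimal sketch, assuming the jars carry the standard Implementation-* attributes (the parser is deliberately simplified and ignores manifest continuation lines; any jar path you pass in is site-specific):

```python
# Sketch: print Implementation-Title/-Version from a jar's MANIFEST.MF,
# to compare the versions two jars were built against.
import zipfile

def manifest_attrs(jar_path, keys=("Implementation-Title", "Implementation-Version")):
    """Return selected main attributes from a jar's META-INF/MANIFEST.MF."""
    with zipfile.ZipFile(jar_path) as zf:
        try:
            raw = zf.read("META-INF/MANIFEST.MF").decode("utf-8", "replace")
        except KeyError:
            return {}  # jar has no manifest
    attrs = {}
    for line in raw.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in keys:
            attrs[key] = value.strip()
    return attrs
```

Running this over the hadoop-hdfs and spark-assembly jars on both clusters (paths are site-specific) should show whether the versions line up.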
Thanks
Best Regards
On Sat, Dec 19, 2015 at 2:15 AM, Ophir Etzion <op...@foursquare.com> wrote:
> During spark-submit when running hive on spark I get:
>
> Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.HftpFileSystem could not be instantiated
>
>
> Caused by: java.lang.IllegalAccessError: tried to access method org.apache.hadoop.fs.DelegationTokenRenewer.<init>(Ljava/lang/Class;)V from class org.apache.hadoop.hdfs.HftpFileSystem
>
> I managed to make hive on spark work on a staging cluster I have and now
> I'm trying to do the same on a production cluster and this happened. Both
> are cdh5.4.3.
>
> I read that this is due to something not being compiled against the correct hadoop version.
> My main question is: which binary/jar/file can cause this?
>
> I tried replacing the binaries and jars with the ones used by the staging cluster (where Hive on Spark worked), but it didn't help.
>
> Thank you to anyone reading this, and thank you for any direction on where to look.
>
> Ophir
>
>