Posted to user@hive.apache.org by Ophir Etzion <op...@foursquare.com> on 2015/12/18 21:45:54 UTC

hive on spark

During spark-submit when running hive on spark I get:

Exception in thread "main" java.util.ServiceConfigurationError:
org.apache.hadoop.fs.FileSystem: Provider
org.apache.hadoop.hdfs.HftpFileSystem could not be instantiated


Caused by: java.lang.IllegalAccessError: tried to access method
org.apache.hadoop.fs.DelegationTokenRenewer.<init>(Ljava/lang/Class;)V
from class org.apache.hadoop.hdfs.HftpFileSystem

I managed to make hive on spark work on a staging cluster, and now I'm
trying to do the same on a production cluster, where this error appeared.
Both clusters run CDH 5.4.3.

I read that this is due to something not being compiled against the
correct Hadoop version.
My main question: which binary/jar/file could cause this?

I tried replacing the binaries and jars with the ones used by the
staging cluster (where hive on spark worked) and it didn't help.
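Since this particular IllegalAccessError usually means a stale or duplicate copy of HftpFileSystem is on the driver classpath, one way to narrow it down is to scan the cluster's jar directories for every jar that bundles that class. The sketch below is a hypothetical helper, not a known fix; the search roots are guesses and should be pointed at your actual CDH parcel/package layout:

```python
"""Sketch: list every jar that bundles org.apache.hadoop.hdfs.HftpFileSystem.
Two jars carrying different versions of this class is a common cause of the
IllegalAccessError above."""
import zipfile
from pathlib import Path

CLASS_ENTRY = "org/apache/hadoop/hdfs/HftpFileSystem.class"

def jars_bundling(root, entry=CLASS_ENTRY):
    """Yield paths of jars under `root` whose contents include `entry`."""
    for jar in Path(root).rglob("*.jar"):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    yield jar
        except (zipfile.BadZipFile, OSError):
            continue  # unreadable or corrupt jar: skip it

if __name__ == "__main__":
    # search roots are guesses for a CDH 5.4 layout; adjust as needed
    for root in ("/usr/lib/hadoop", "/usr/lib/hive/lib", "/usr/lib/spark/lib"):
        for jar in jars_bundling(root):
            print(jar)
```

If more than one jar shows up, compare the copies between the staging and production clusters; the odd one out is the likely culprit.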

Thank you to anyone reading this, and thank you for any direction on
where to look.

Ophir

RE: hive on spark

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Hi,
 
Your statement
 
“I read that this is due to something not being compiled against the correct hadoop version.
my main question what is the binary/jar/file that can cause this?”
 

 

I believe it is the file in $HIVE_HOME/lib called spark-assembly-1.3.1-hadoop2.4.0.jar, which you need to build from the Spark 1.3.1 source code, excluding the Hive jars.

 

Something like below

 

./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"

 

Then extract the assembly jar from the resulting tarball and copy it to $HIVE_HOME/lib.
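As a sketch of that unpack-and-copy step (the tarball and directory names below are assumptions derived from the jar name above; adjust them to whatever make-distribution.sh actually produced):

```python
"""Sketch: unpack a Spark distribution tarball and copy its spark-assembly
jar into Hive's lib directory."""
import shutil
import tarfile
from pathlib import Path

def install_assembly(tgz_path, hive_lib, workdir="build"):
    """Extract `tgz_path` under `workdir`, locate the spark-assembly jar
    (it sits under <dist>/lib/ in Spark 1.x distributions) and copy it
    into `hive_lib`. Returns the destination path."""
    with tarfile.open(tgz_path) as tgz:
        tgz.extractall(workdir)
    assembly = next(Path(workdir).rglob("spark-assembly-*.jar"))
    return shutil.copy(assembly, hive_lib)

# e.g. (paths are examples):
# install_assembly("spark-1.3.1-bin-hadoop2-without-hive.tgz", "/usr/lib/hive/lib")
```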

 

Example

 

hive> set spark.home=/usr/lib/spark-1.3.1-bin-hadoop2.6;  -- This is the precompiled binary installation for Spark 1.3.1

hive> set hive.execution.engine=spark;

hive> set spark.master=yarn-client;

hive> select count(1) from t;

Query ID = hduser_20151218212056_4e1faef5-93bd-4e18-9375-659220d67530

Total jobs = 1

Launching Job 1 out of 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapreduce.job.reduces=<number>

Starting Spark Job = 35c78523-4a36-45e5-95f1-01052985ff4b

 

Query Hive on Spark job[0] stages:

0

1

 

Status: Running (Hive on Spark job[0])

Job Progress Format

CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]

2015-12-18 21:21:36,852 Stage-0_0: 0/256        Stage-1_0: 0/1

2015-12-18 21:21:39,900 Stage-0_0: 0/256        Stage-1_0: 0/1

2015-12-18 21:21:41,914 Stage-0_0: 0(+2)/256    Stage-1_0: 0/1

2015-12-18 21:21:44,933 Stage-0_0: 0(+2)/256    Stage-1_0: 0/1

2015-12-18 21:21:45,941 Stage-0_0: 1(+2)/256    Stage-1_0: 0/1

2015-12-18 21:21:46,952 Stage-0_0: 3(+2)/256    Stage-1_0: 0/1

2015-12-18 21:21:47,963 Stage-0_0: 4(+2)/256    Stage-1_0: 0/1

2015-12-18 21:21:48,969 Stage-0_0: 6(+2)/256    Stage-1_0: 0/1

2015-12-18 21:21:49,977 Stage-0_0: 7(+2)/256    Stage-1_0: 0/1

2015-12-18 21:21:50,991 Stage-0_0: 9(+2)/256    Stage-1_0: 0/1

2015-12-18 21:21:52,001 Stage-0_0: 10(+2)/256   Stage-1_0: 0/1

2015-12-18 21:21:53,013 Stage-0_0: 11(+2)/256   Stage-1_0: 0/1

2015-12-18 21:21:54,022 Stage-0_0: 13(+2)/256   Stage-1_0: 0/1

2015-12-18 21:21:55,030 Stage-0_0: 15(+2)/256   Stage-1_0: 0/1

2015-12-18 21:21:56,038 Stage-0_0: 18(+2)/256   Stage-1_0: 0/1

2015-12-18 21:21:57,053 Stage-0_0: 52(+2)/256   Stage-1_0: 0/1

2015-12-18 21:21:58,058 Stage-0_0: 90(+2)/256   Stage-1_0: 0/1

2015-12-18 21:21:59,066 Stage-0_0: 129(+2)/256  Stage-1_0: 0/1

2015-12-18 21:22:00,075 Stage-0_0: 176(+2)/256  Stage-1_0: 0/1

2015-12-18 21:22:01,083 Stage-0_0: 224(+2)/256  Stage-1_0: 0/1

2015-12-18 21:22:02,111 Stage-0_0: 256/256 Finished     Stage-1_0: 0(+1)/1

2015-12-18 21:22:03,117 Stage-0_0: 256/256 Finished     Stage-1_0: 1/1 Finished

Status: Finished successfully in 62.46 seconds

OK

2074897

Time taken: 66.434 seconds, Fetched: 1 row(s)
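For anyone post-processing such logs, the progress lines follow the Job Progress Format shown above and are easy to parse mechanically. The helper below is a made-up sketch (function and pattern names are mine, not part of Hive):

```python
"""Sketch: parse Hive-on-Spark progress lines of the form
CurrentTime StageId_StageAttemptId: Succeeded(+Running-Failed)/Total"""
import re

STAGE = re.compile(
    r"Stage-(?P<stage>\d+)_(?P<attempt>\d+):\s+"
    r"(?P<succeeded>\d+)(?:\(\+(?P<running>\d+)(?:-(?P<failed>\d+))?\))?"
    r"/(?P<total>\d+)"
)

def parse_stages(line):
    """Return {stage_id: task counts} for every stage reported on `line`."""
    out = {}
    for m in STAGE.finditer(line):
        out[int(m.group("stage"))] = {
            "succeeded": int(m.group("succeeded")),
            "running": int(m.group("running") or 0),
            "failed": int(m.group("failed") or 0),
            "total": int(m.group("total")),
        }
    return out
```

For example, the line from 21:21:50 above yields stage 0 with 9 succeeded and 2 running tasks out of 256, and stage 1 with 0 of 1.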

 

 

HTH

 

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

 

http://talebzadehmich.wordpress.com

 


 

From: Ophir Etzion [mailto:ophir@foursquare.com] 
Sent: 18 December 2015 20:46
To: user@hive.apache.org; user@spark.apache.org
Subject: hive on spark

 


Re: hive on spark

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Looks like a version mismatch; you need to investigate further and make
sure the versions match.

Thanks
Best Regards

On Sat, Dec 19, 2015 at 2:15 AM, Ophir Etzion <op...@foursquare.com> wrote:
