Posted to issues@spark.apache.org by "Adam Kramer (JIRA)" <ji...@apache.org> on 2017/11/02 14:02:00 UTC

[jira] [Comment Edited] (SPARK-22419) Hive and Hive Thriftserver jars missing from "without hadoop" build

    [ https://issues.apache.org/jira/browse/SPARK-22419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235774#comment-16235774 ] 

Adam Kramer edited comment on SPARK-22419 at 11/2/17 2:01 PM:
--------------------------------------------------------------

I'll assume it's on purpose for the reasons I stated above. Apologies for not posting to the mailing list, but I have a feeling this issue could serve as a good web reference; I rarely get mailing-list results when troubleshooting via Google. Also, the documentation for using Spark with upgraded versions of Hadoop (e.g. 2.8) is definitely lacking, or at best confusing (i.e. a binary distribution that already includes one version of the Hadoop libs can still be configured to use another version of Hadoop by following the instructions on the "without hadoop" wiki page). I suspect those instructions are old; when using SPARK_DIST_CLASSPATH to override the Hadoop libraries, you run into things like log4j.properties files being hijacked by the Hadoop version, which changes your application logging altogether. My guess is that it's something that worked well a while ago, or in a very specific situation, and thus requires a lot of trial and error.
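For readers landing here from search: the documented way to point a "without hadoop" build at a separately installed Hadoop is the SPARK_DIST_CLASSPATH variable in conf/spark-env.sh. A minimal sketch, assuming Hadoop 2.8 is installed under /opt/hadoop-2.8.0 (that path is hypothetical):

```shell
# conf/spark-env.sh -- sketch; assumes Hadoop is installed at /opt/hadoop-2.8.0
export HADOOP_HOME=/opt/hadoop-2.8.0

# Ask Hadoop itself for its runtime classpath; Spark prepends this to its own.
export SPARK_DIST_CLASSPATH=$("${HADOOP_HOME}/bin/hadoop" classpath)
```

Note that Hadoop's own conf directory (which can contain its log4j.properties) ends up on that classpath, which is one way the logging hijacking described above can occur.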


> Hive and Hive Thriftserver jars missing from "without hadoop" build
> -------------------------------------------------------------------
>
>                 Key: SPARK-22419
>                 URL: https://issues.apache.org/jira/browse/SPARK-22419
>             Project: Spark
>          Issue Type: Question
>          Components: Build
>    Affects Versions: 2.1.1
>            Reporter: Adam Kramer
>            Priority: Minor
>
> The "without hadoop" binary distribution does not have hive-related libraries in the jars directory.  This may be due to Hive being tied to major releases of Hadoop. My project requires using Hadoop 2.8, so "without hadoop" version seemed the best option. Should I use the make-distribution.sh instead?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org