Posted to dev@spark.apache.org by Stephen Boesch <ja...@gmail.com> on 2014/07/18 01:00:26 UTC

Current way to include hive in a build

Having looked at make-distribution.sh on trunk, I see that the --with-hive
and --with-yarn options are now deprecated.

Here is the way I have built it:

Added to pom.xml:

   <profile>
      <id>cdh5</id>
      <activation>
        <activeByDefault>false</activeByDefault>
      </activation>
      <properties>
        <hadoop.version>2.3.0-cdh5.0.0</hadoop.version>
        <yarn.version>2.3.0-cdh5.0.0</yarn.version>
        <hbase.version>0.96.1.1-cdh5.0.0</hbase.version>
        <zookeeper.version>3.4.5-cdh5.0.0</zookeeper.version>
      </properties>
    </profile>

 mvn -Pyarn -Pcdh5 -Phive -Dhadoop.version=2.3.0-cdh5.0.1 \
   -Dyarn.version=2.3.0-cdh5.0.0 -DskipTests clean package


[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM .......................... SUCCESS [3.165s]
[INFO] Spark Project Core ................................ SUCCESS [2:39.504s]
[INFO] Spark Project Bagel ............................... SUCCESS [7.596s]
[INFO] Spark Project GraphX .............................. SUCCESS [22.027s]
[INFO] Spark Project ML Library .......................... SUCCESS [36.284s]
[INFO] Spark Project Streaming ........................... SUCCESS [24.309s]
[INFO] Spark Project Tools ............................... SUCCESS [3.147s]
[INFO] Spark Project Catalyst ............................ SUCCESS [20.148s]
[INFO] Spark Project SQL ................................. SUCCESS [18.560s]
[INFO] Spark Project Hive ................................ FAILURE [33.962s]

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-dependency-plugin:2.4:copy-dependencies
(copy-dependencies) on project spark-hive_2.10: Execution copy-dependencies
of goal
org.apache.maven.plugins:maven-dependency-plugin:2.4:copy-dependencies
failed: Plugin org.apache.maven.plugins:maven-dependency-plugin:2.4 or one
of its dependencies could not be resolved: Could not find artifact
commons-logging:commons-logging:jar:1.0.4 -> [Help 1]

Is anyone currently building with -Phive who has a suggestion for this?

Re: Current way to include hive in a build

Posted by Stephen Boesch <ja...@gmail.com>.
Thanks very much, Patrick and Sean. I have the build working now as follows:

 mvn -Pyarn -Pcdh5 -Phive -DskipTests clean package

In addition, I am in the midst of running some tests, and so far so good.


The pom.xml changes:

Added to the pom.xml in the main (parent) directory:

    <profile>
      <id>cdh5</id>
      <properties>
        <hadoop.version>2.3.0-cdh5.0.0</hadoop.version>
        <yarn.version>2.3.0-cdh5.0.0</yarn.version>
        <zookeeper.version>3.4.5-cdh5.0.0</zookeeper.version>
        <protobuf.version>2.5.0</protobuf.version>
        <jets3t.version>0.9.0</jets3t.version>
        <hbase.version>0.96.1.1-cdh5.0.0</hbase.version>
      </properties>
    </profile>

Added four dependencies to examples/pom.xml, one each for hbase-common,
hbase-client, hbase-protocol, and hbase-server. Here is the one for
hbase-common:

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-common</artifactId>
      <version>${hbase.version}</version>
      <exclusions>
        <exclusion>
          <groupId>asm</groupId>
          <artifactId>asm</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.jboss.netty</groupId>
          <artifactId>netty</artifactId>
        </exclusion>
        <exclusion>
          <groupId>io.netty</groupId>
          <artifactId>netty</artifactId>
        </exclusion>
        <exclusion>
          <groupId>commons-logging</groupId>
          <artifactId>commons-logging</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.jruby</groupId>
          <artifactId>jruby-complete</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

Duplicate the above for:

      <artifactId>hbase-client</artifactId>
..
      <artifactId>hbase-protocol</artifactId>
..
      <artifactId>hbase-server</artifactId>
..
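
For illustration, here is what the hbase-client entry would look like written
out in full. This is only a sketch: it assumes the exclusions are copied
unchanged from the hbase-common entry above, with nothing but the artifactId
differing, which is how the instruction to duplicate it reads.

```xml
    <!-- Sketch: same as the hbase-common dependency, only the artifactId changes -->
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>${hbase.version}</version>
      <exclusions>
        <exclusion>
          <groupId>asm</groupId>
          <artifactId>asm</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.jboss.netty</groupId>
          <artifactId>netty</artifactId>
        </exclusion>
        <exclusion>
          <groupId>io.netty</groupId>
          <artifactId>netty</artifactId>
        </exclusion>
        <exclusion>
          <groupId>commons-logging</groupId>
          <artifactId>commons-logging</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.jruby</groupId>
          <artifactId>jruby-complete</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
```

The hbase-protocol and hbase-server entries would follow the same pattern.
${hbase.version} resolves to the value set in the cdh5 profile, so the profile
has to be active (-Pcdh5) for these dependencies to pick up the CDH version.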



2014-07-18 3:16 GMT-07:00 Sean Owen <so...@cloudera.com>:

> This build invocation works just as you have it, for me. (At least, it
> gets through Hive; Examples fails for a different unrelated reason.)
>
> commons-logging 1.0.4 exists in Maven for sure. Maybe there is some
> temporary problem accessing Maven's repo?

Re: Current way to include hive in a build

Posted by Sean Owen <so...@cloudera.com>.
This build invocation works just as you have it, for me. (At least, it
gets through Hive; Examples fails for a different unrelated reason.)

commons-logging 1.0.4 exists in Maven for sure. Maybe there is some
temporary problem accessing Maven's repo?


Re: Current way to include hive in a build

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Stephen,

The only change to the build was that we now ask users to pass -Phive and
-Pyarn instead of --with-hive and --with-yarn (which internally just set
-Phive and -Pyarn). I don't think this should affect the dependency
graph.

Just to test this, what happens if you run *without* the CDH profile
and build with hadoop version 2.3.0? Does that work?

- Patrick
