Posted to users@zeppelin.apache.org by David Howell <da...@zipmoney.com.au> on 2017/04/13 01:41:45 UTC
Dependency management
Hi users,
I hope this is a simple one and you can help me 😊
I am having trouble adding a dependency to Zeppelin Notebook (0.7.0) on AWS EMR (emr-5.4.0). I notice the %dep interpreter is not available on AWS EMR, so I can’t use that option.
I followed these instructions to add the dependency: https://zeppelin.apache.org/docs/latest/manual/dependencymanagement.html
I want to add the Databricks spark-xml package for importing XML files into DataFrames: https://github.com/databricks/spark-xml
This is the groupId:artifactId:version:
com.databricks:spark-xml_2.11:0.4.1
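For anyone reading along, the artifact field takes a standard Maven coordinate; a minimal sketch of how the pieces break down (the comments are my reading of the coordinate, not anything from the Zeppelin docs):

```scala
// Maven coordinate exactly as entered in the Zeppelin artifact field.
val coordinate = "com.databricks:spark-xml_2.11:0.4.1"

// Split into its three colon-separated parts: groupId : artifactId : version.
val Array(groupId, artifactId, version) = coordinate.split(":")

// groupId    -> com.databricks
// artifactId -> spark-xml_2.11 (the _2.11 suffix is the Scala binary version)
// version    -> 0.4.1
```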
In Zeppelin, when I go to edit the spark interpreter:
* I enter com.databricks:spark-xml_2.11:0.4.1 in the artifact field,
* click Save,
* and then, when the dialog “Do you want to update this interpreter and restart with new settings – Cancel | OK” appears, clicking OK does nothing; the dialog stays on screen.
I assume this writes the dependency to the spark group in interpreter.json, is that correct? I tried altering write permissions for that file, but it didn’t help.
I confirmed the coordinate is correct for my Spark/Scala version by running spark-shell, and since that works I assume I don’t need to add any additional Maven repo.
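As a sanity check on the _2.11 suffix, this is how I verify the Scala version from spark-shell (versionNumberString is from the standard library; the exact value printed depends on your build, e.g. 2.11.x on EMR 5.4):

```scala
// Prints the Scala version the current runtime was built against.
// The artifact suffix (_2.11) should match the first two components.
val scalaVersion = scala.util.Properties.versionNumberString
println(scalaVersion)
```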
Maybe I do need a new repo?
Maybe I need to put the jar in my local repo? interpreter.json says my local repo is /var/lib/zeppelin/.m2/repository, but this directory does not exist.
I can use this package from spark-shell successfully:

$ spark-shell --packages com.databricks:spark-xml_2.11:0.4.1

import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
  .format("com.databricks.spark.xml")
…
David Howell
Data Engineering
+61 477 150 379
Re: Dependency management
Posted by moon soo Lee <mo...@apache.org>.
Hi,
Thanks for reporting the problem.
The downloaded dependency will be stored under the 'local-repo' directory (by default). For example, after I add com.databricks:spark-xml_2.11:0.4.1 in the spark interpreter setting:

moon$ ls local-repo/2CD5YP3GK/
scala-library-2.11.7.jar  spark-xml_2.11-0.4.1.jar

I see two files downloaded under the ZEPPELIN_HOME/local-repo/[INTERPRETER_ID] directory.
Hope this helps
Thanks,
moon
On Thu, Apr 13, 2017 at 10:42 AM David Howell <da...@zipmoney.com.au>
wrote: