Posted to users@zeppelin.apache.org by David Howell <da...@zipmoney.com.au> on 2017/04/13 01:41:45 UTC
Dependency management
Hi users,
I hope this is a simple one and you can help me 😊
I am having trouble adding a dependency to Zeppelin Notebook (0.7.0) on AWS EMR (emr-5.4.0). I notice the %dep interpreter is not available on AWS EMR, so I can’t use that option.
I followed these instructions to add the dependency: https://zeppelin.apache.org/docs/latest/manual/dependencymanagement.html
I want to add the Databricks spark-xml package for importing XML files into DataFrames: https://github.com/databricks/spark-xml
This is the groupId:artifactId:version:
com.databricks:spark-xml_2.11:0.4.1
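For anyone reading along, the artifact field takes a standard Maven coordinate; a minimal sketch of how the pieces break down (the comments are my reading of the coordinate, not anything from the Zeppelin docs):

```scala
// Maven coordinate exactly as entered in the Zeppelin artifact field.
val coordinate = "com.databricks:spark-xml_2.11:0.4.1"

// Split into its three colon-separated parts: groupId : artifactId : version.
val Array(groupId, artifactId, version) = coordinate.split(":")

// groupId    -> com.databricks
// artifactId -> spark-xml_2.11 (the _2.11 suffix is the Scala binary version)
// version    -> 0.4.1
```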
In Zeppelin, when I go to edit the spark interpreter:
* I enter com.databricks:spark-xml_2.11:0.4.1 in the artifact field,
* click Save,
* and then, when the dialog “Do you want to update this interpreter and restart with new settings – Cancel | OK” appears, clicking OK does nothing; the dialog stays on screen.
I assume this writes the dependency to the spark group in interpreter.json, is that correct? I tried altering write permissions for that file, but it didn’t help.
I confirmed the coordinate is correct for my Spark/Scala version by running spark-shell, and since that works I assume I don’t need to add any additional Maven repo.
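As a sanity check on the _2.11 suffix, this is how I verify the Scala version from spark-shell (versionNumberString is from the standard library; the exact value printed depends on your build, e.g. 2.11.x on EMR 5.4):

```scala
// Prints the Scala version the current runtime was built against.
// The artifact suffix (_2.11) should match the first two components.
val scalaVersion = scala.util.Properties.versionNumberString
println(scalaVersion)
```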
Maybe I do need a new repo?
Maybe I need to put the jar in my local repo? interpreter.json says my local repo is /var/lib/zeppelin/.m2/repository, but this directory does not exist.
I can use this package from spark-shell successfully:

$ spark-shell --packages com.databricks:spark-xml_2.11:0.4.1

import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
  .format("com.databricks.spark.xml")
…
David Howell
Data Engineering
+61 477 150 379
Re: Dependency management
Posted by moon soo Lee <mo...@apache.org>.
Hi,
Thanks for reporting the problem.
The downloaded dependency will be stored under the 'local-repo' directory (by default). For example, after I add com.databricks:spark-xml_2.11:0.4.1 in the spark interpreter setting:

moon$ ls local-repo/2CD5YP3GK/
scala-library-2.11.7.jar  spark-xml_2.11-0.4.1.jar

I see two files downloaded under the ZEPPELIN_HOME/local-repo/[INTERPRETER_ID] directory.
Hope this helps
Thanks,
moon
On Thu, Apr 13, 2017 at 10:42 AM David Howell <da...@zipmoney.com.au>
wrote: