Posted to user@spark.apache.org by Marco Mistroni <mm...@gmail.com> on 2016/09/03 19:13:06 UTC
Re: Help with Jupyter Notebook Setup on CDH using Anaconda
Hi
please paste the exception.
For Spark with Jupyter, you might want to sign up for this.
It'll give you Jupyter and Spark... and presumably spark-csv is already
part of it?
https://community.cloud.databricks.com/login.html
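Once the spark-csv package does load, the read-with-schema-inference call in PySpark 1.6 would look roughly like this (a sketch only: it assumes the shell's built-in `sqlContext`, and the file path is a placeholder):

```python
# Assumes a PySpark 1.6 shell started with:
#   pyspark --packages com.databricks:spark-csv_2.10:1.4.0
# so that `sqlContext` is already defined and spark-csv is on the classpath.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")       # first line holds column names
      .option("inferSchema", "true")  # sample the data to guess column types
      .load("/path/to/file.csv"))     # placeholder path
df.printSchema()
```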
hth
marco
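As an aside on what `inferSchema` is doing: spark-csv scans the values in each column and widens the column's type until every value fits. A rough standalone illustration of that idea in plain Python (not the library's actual code, and it handles only int, float, and string):

```python
def infer_column_type(values):
    """Guess the narrowest type (int, float, or string) fitting every value.

    Rough illustration of spark-csv's inferSchema idea; the real library
    samples rows and supports more types (booleans, timestamps, ...).
    """
    inferred = "int"
    for v in values:
        if inferred == "int":
            try:
                int(v)
                continue  # still fits as an integer
            except ValueError:
                inferred = "float"  # widen and re-test below
        if inferred == "float":
            try:
                float(v)
                continue  # still fits as a float
            except ValueError:
                inferred = "string"  # widest type: accepts anything
    return inferred
```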
On Sat, Sep 3, 2016 at 8:10 PM, Arif,Mubaraka <ar...@heb.com> wrote:
> On the on-premise Cloudera Hadoop 5.7.2 cluster I have installed the Anaconda
> package and am trying to set up Jupyter Notebook to work with Spark 1.6.
>
>
>
> I have run into problems when trying to use the package
> com.databricks:spark-csv_2.10:1.4.0 for reading a CSV file and inferring its
> schema with PySpark.
>
>
>
> I have installed the jar file spark-csv_2.10-1.4.0.jar in
> /var/opt/teradata/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/jar and the
> configurations are set as:
>
>
>
> export PYSPARK_DRIVER_PYTHON=/var/opt/teradata/cloudera/parcels/Anaconda-4.0.0/bin/jupyter
> export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8083"
> export PYSPARK_PYTHON=/var/opt/teradata/cloudera/parcels/Anaconda-4.0.0/bin/python
>
>
>
> When I run pyspark from the command line with the --packages option, like:
>
>
>
> $ pyspark --packages com.databricks:spark-csv_2.10:1.4.0
>
>
>
> It throws an error and fails to recognize the added dependency.
>
>
>
> Any ideas on how to resolve this error are much appreciated.
>
>
>
> Also, if you have any experience installing and running Jupyter Notebook
> with Anaconda and Spark, please share it.
>
>
>
> thanks,
>
> Muby
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org