Posted to user@spark.apache.org by Marco Mistroni <mm...@gmail.com> on 2016/09/03 19:13:06 UTC

Re: Help with Jupyter Notebook Settup on CDH using Anaconda

Hi
  please paste the exception
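one guess while waiting for the exception (an assumption on my part, based
on the settings in your mail below): when PYSPARK_DRIVER_PYTHON points at
Jupyter, the --packages flag on the command line is sometimes not picked up,
and passing it through PYSPARK_SUBMIT_ARGS instead can help, roughly:

```shell
# Sketch only -- the Anaconda path assumes the parcel layout from the mail below.
# The trailing "pyspark-shell" token is required when using PYSPARK_SUBMIT_ARGS.
export PYSPARK_SUBMIT_ARGS="--packages com.databricks:spark-csv_2.10:1.4.0 pyspark-shell"
export PYSPARK_DRIVER_PYTHON=/var/opt/teradata/cloudera/parcels/Anaconda-4.0.0/bin/jupyter
pyspark
```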
for Spark vs Jupyter, you might want to sign up for this.
It'll give you Jupyter and Spark, and presumably spark-csv is already
part of it:

https://community.cloud.databricks.com/login.html
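and once the package does resolve, reading the csv and inferring its schema
from the notebook would look roughly like this (a sketch for the Spark 1.6
sqlContext and the spark-csv 1.x options; the file path is made up):

```python
# Sketch, untested here: assumes a running pyspark 1.6 session where
# `sqlContext` is predefined and spark-csv is on the classpath.
# "/tmp/sample.csv" is a hypothetical path -- point it at your file.
df = (sqlContext.read
      .format("com.databricks.spark.csv")
      .options(header="true", inferschema="true")
      .load("/tmp/sample.csv"))
df.printSchema()
```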

hth
 marco



On Sat, Sep 3, 2016 at 8:10 PM, Arif,Mubaraka <ar...@heb.com> wrote:

> On our on-premise Cloudera Hadoop 5.7.2 cluster I have installed the
> Anaconda package and am trying to set up Jupyter notebook to work with
> Spark 1.6.
>
>
>
> I have run into problems when trying to use the package
> com.databricks:spark-csv_2.10:1.4.0 for reading and inferring the
> schema of a csv file with Python Spark.
>
>
>
> I have installed the jar file spark-csv_2.10-1.4.0.jar in
> /var/opt/teradata/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/jar and the
> configurations are set as:
>
>
>
> export PYSPARK_DRIVER_PYTHON=/var/opt/teradata/cloudera/parcels/Anaconda-4.0.0/bin/jupyter
> export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8083"
> export PYSPARK_PYTHON=/var/opt/teradata/cloudera/parcels/Anaconda-4.0.0/bin/python
>
>
>
> When I run pyspark from the command line with the --packages option, like:
>
>
>
> $ pyspark --packages com.databricks:spark-csv_2.10:1.4.0
>
>
>
> It throws an error and fails to recognize the added dependency.
>
>
>
> Any ideas on how to resolve this error are much appreciated.
>
>
>
> Also, if you have any experience installing and running Jupyter notebook
> with Anaconda and Spark, please share.
>
>
>
> thanks,
>
> Muby
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org