Posted to users@zeppelin.apache.org by Lian Jiang <ji...@gmail.com> on 2018/08/27 21:41:31 UTC

Python script calls R script in Zeppelin on Hadoop

Hi,

We are using HDP 3.0 (with Zeppelin 0.8.0) and are migrating Jupyter
notebooks to Zeppelin. One issue we came across is that a Python script
calling an R script does not work in Zeppelin.

%livy2.pyspark
import os
sc.addPyFile("hdfs:///user/zeppelin/my.py")
import my
my.test()

my.test() invokes the R script with something like ['Rscript', 'myR.r']

Fatal error: cannot open file 'myR.r': No such file or directory

When running this notebook in Jupyter, both my.py and myR.r exist in the
same folder. I understand the story changes on Hadoop because the scripts
run in containers.

My question:
Is this scenario supported in Zeppelin? How can I add an R script to the
Python Spark context so that the Python script can find it? Appreciated!

Re: Python script calls R script in Zeppelin on Hadoop

Posted by Lian Jiang <ji...@gmail.com>.
Thanks Jeff.

Problem solved by installing the R packages into /usr/lib64/R/library (the
default library path) on each datanode. Your clue helped!
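
For anyone hitting the same issue, a minimal sketch along these lines can confirm the
package is visible from the worker nodes (untested as written; the package name is ours,
and the parallelize loop is only there to force the check onto the executors):

%livy2.pyspark
import subprocess

def check_changepoint(_):
    # Rscript resolves packages from its default library path
    # (/usr/lib64/R/library on these nodes), so this only succeeds if the
    # package is installed there on the node running the task.
    return [subprocess.getoutput("Rscript -e 'library(changepoint); sessionInfo()'")]

# run the check on several partitions so it lands on more than one node
for result in sc.parallelize(range(4), 4).mapPartitions(check_changepoint).collect():
    print(result)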


Re: Python script calls R script in Zeppelin on Hadoop

Posted by Jeff Zhang <zj...@gmail.com>.
I am not sure what's wrong. Maybe you can ssh to that machine and run the
R script manually first to see what is failing.




Re: Python script calls R script in Zeppelin on Hadoop

Posted by Lian Jiang <ji...@gmail.com>.
Jeff,

R is installed on the namenode and all data nodes. The R packages have been
copied to all of them too. I am not sure whether an R script launched by
pyspark's subprocess can access the Spark context or not. If not, using
addFile to add R packages to the Spark context will not help test.r install
the packages. Thanks for the clue.
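
To restate my understanding as a sketch: the Rscript child only inherits ordinary process
state, so the only thing addFile buys us is a plain local file whose path we must hand to
the script explicitly; the child cannot reach back into the Spark context for anything
else, packages included.

%livy2.pyspark
from pyspark import SparkFiles
# Anything shipped with sc.addFile just lands as an ordinary file under this
# directory; an R package delivered that way would still have to be located by
# the script itself, e.g. by pointing .libPaths() at it.
print("SparkFiles root on this node:", SparkFiles.getRootDirectory())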




Re: Python script calls R script in Zeppelin on Hadoop

Posted by Jeff Zhang <zj...@gmail.com>.
You need to make sure the Spark driver machine has this package installed.
And since you are using yarn-cluster mode via Livy, you have to install the
package on all nodes, because the Spark driver could be launched on any node
of the cluster.




Re: Python script calls R script in Zeppelin on Hadoop

Posted by Lian Jiang <ji...@gmail.com>.
After getting a sample R script to run, we found another issue with a real
R script: it failed to load the changepoint library.

I tried:

%livy2.sparkr
install.packages("changepoint", repos="file:///mnt/data/tmp/r")
library(changepoint)  # prints "Successfully loaded changepoint package version 2.2.2"

%livy2.pyspark
from pyspark import SparkFiles
import subprocess

sc.addFile("hdfs:///user/zeppelin/test.r")
testpath = SparkFiles.get('test.r')
stdoutdata = subprocess.getoutput("Rscript " + testpath)
print(stdoutdata)

The error: Error in library(changepoint) : there is no package called
‘changepoint’

test.r is simply:

library(changepoint)

Any idea how to make changepoint available for the R script? Thanks.
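
A diagnostic sketch that should show the mismatch (assuming Rscript is on the PATH inside
the YARN container): print the library search path of the non-interactive Rscript process
and compare it with wherever %livy2.sparkr installed changepoint.

%livy2.pyspark
import subprocess
# If the sparkr install went into a per-user or per-session library, it will not
# appear in this list, which would explain the "no package called 'changepoint'" error.
print(subprocess.getoutput("Rscript -e '.libPaths()'"))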




Re: Python script calls R script in Zeppelin on Hadoop

Posted by Lian Jiang <ji...@gmail.com>.
Thanks Jeff.

This worked:

%livy2.pyspark
from pyspark import SparkFiles
import subprocess

sc.addFile("hdfs:///user/zeppelin/ocic/test.r")
testpath = SparkFiles.get('test.r')
stdoutdata = subprocess.getoutput("Rscript " + testpath)
print(stdoutdata)

Cheers!


Re: Python script calls R script in Zeppelin on Hadoop

Posted by Jeff Zhang <zj...@gmail.com>.
Do you run it in yarn-cluster mode? Then you must ensure your R script is
shipped to the driver (via sc.addFile or by setting livy.spark.files).

You also need to make sure R is installed on all hosts of the YARN cluster,
because the driver may run on any node of the cluster.
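
If you go the configuration route instead of sc.addFile, the interpreter setting would
look roughly like this (a sketch; please check the exact property name and value format
against the Livy interpreter docs, and the path is only an example):

livy.spark.files = hdfs:///user/zeppelin/test.r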




Re: Python script calls R script in Zeppelin on Hadoop

Posted by Lian Jiang <ji...@gmail.com>.
Thanks Lucas. We tried and got the same error. Below is the code:

%livy2.pyspark
import subprocess
sc.addFile("hdfs:///user/zeppelin/test.r")
stdoutdata = subprocess.getoutput("Rscript test.r")
print(stdoutdata)

Fatal error: cannot open file 'test.r': No such file or directory


sc.addFile adds test.r to the Spark context. However, the subprocess does not
use the Spark context.

An HDFS path does not work either: subprocess.getoutput("Rscript
hdfs:///user/zeppelin/test.r")

Any idea how to make Python call the R script? Appreciated!





RE: Python script calls R script in Zeppelin on Hadoop

Posted by "Partridge, Lucas (GE Aviation)" <Lu...@ge.com>.
Have you tried SparkContext.addFile() (not addPyFile()) to add your R script?
https://spark.apache.org/docs/2.2.0/api/python/pyspark.html#pyspark.SparkContext.addFile
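
Something like this, roughly (an untested sketch; the HDFS path is only an example, and
SparkFiles.get() is the companion call for turning the shipped file back into a local path):

%livy2.pyspark
from pyspark import SparkFiles
import subprocess

sc.addFile("hdfs:///user/zeppelin/myR.r")   # distribute the R script with the job
local_path = SparkFiles.get("myR.r")        # local path where Spark placed it
print(subprocess.getoutput("Rscript " + local_path))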
