Posted to user@spark.apache.org by Sourav Mazumder <so...@gmail.com> on 2015/07/01 19:03:33 UTC

Re: sparkR could not find function "textFile"

Hi,

Piggybacking on this discussion.

I'm trying to achieve the same thing, reading a CSV file from RStudio. Where
I'm stuck is how to supply an additional package to sparkR.init() from
RStudio, as sparkR.init() does not provide an option to specify additional
packages.

I tried the following code from RStudio. It is giving me the error "Error in
callJMethod(sqlContext, "load", source, options) :
  Invalid jobj 1. If SparkR was restarted, Spark operations need to be
re-executed."

------
Sys.setenv(SPARK_HOME="C:\\spark-1.4.0-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"),.libPaths()))
library(SparkR)

sparkR.stop()

sc <- sparkR.init(master="local[2]", sparkEnvir =
list(spark.executor.memory="1G"),
sparkJars="C:\\spark-1.4.0-bin-hadoop2.6\\lib\\spark-csv_2.11-1.1.0.jar")
# I have downloaded this spark-csv jar and kept it in the lib folder of Spark

sqlContext <- sparkRSQL.init(sc)

plutoMN <- read.df(sqlContext,
"C:\\Users\\Sourav\\Work\\SparkDataScience\\PlutoMN.csv", source =
"com.databricks.spark.csv").

------

However, I also tried this from the shell, launched as `sparkR --packages
com.databricks:spark-csv_2.11:1.1.0`. This time I used the following code
and it all works fine.

sqlContext <- sparkRSQL.init(sc)

plutoMN <- read.df(sqlContext,
"C:\\Users\\Sourav\\Work\\SparkDataScience\\PlutoMN.csv", source =
"com.databricks.spark.csv").

Any idea how to achieve the same from RStudio?

Regards,




On Thu, Jun 25, 2015 at 2:38 PM, Wei Zhou <zh...@gmail.com> wrote:

> I tried out the solution using the spark-csv package, and it works fine
> now :) Thanks. Yes, I'm playing with a file with all columns as String, but
> the real data I want to process are all doubles. I'm just exploring what
> sparkR can do versus regular Scala Spark, as I am an R person at heart.
>
> 2015-06-25 14:26 GMT-07:00 Eskilson,Aleksander <Al...@cerner.com>:
>
>>  Sure, I had a similar question that Shivaram was able to answer quickly
>> for me; the solution is implemented using a separate Databricks library.
>> Check out this thread from the email archives [1], and the read.df()
>> command [2]. CSV files can be a bit tricky, especially with inferring
>> their schemas. Are you using just strings as your column types right now?
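>>
>>  If inference becomes a problem, the schema can also be declared up
>> front. A minimal sketch, assuming SparkR 1.4's structType()/structField()
>> helpers and the schema argument of read.df(), with hypothetical column
>> names:
>>
>> # Declare the column types instead of letting spark-csv infer them
>> customSchema <- structType(structField("name", "string"),
>>                            structField("value", "double"))
>> df <- read.df(sqlContext, "data.csv",
>>               source = "com.databricks.spark.csv",
>>               schema = customSchema)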
>>
>>  Alek
>>
>>  [1] --
>> http://apache-spark-developers-list.1001551.n3.nabble.com/CSV-Support-in-SparkR-td12559.html
>> [2] -- https://spark.apache.org/docs/latest/api/R/read.df.html
>>
>>   From: Wei Zhou <zh...@gmail.com>
>> Date: Thursday, June 25, 2015 at 4:15 PM
>> To: "shivaram@eecs.berkeley.edu" <sh...@eecs.berkeley.edu>
>> Cc: Aleksander Eskilson <Al...@cerner.com>, "
>> user@spark.apache.org" <us...@spark.apache.org>
>> Subject: Re: sparkR could not find function "textFile"
>>
>>   Thanks to both Shivaram and Alek. Then if I want to create a DataFrame
>> from comma-separated flat files, what would you recommend I do? One way I
>> can think of is first reading the data as you would in R, using
>> read.table(), and then creating a Spark DataFrame out of that R data
>> frame, but that is obviously not scalable.
>>
>>
>> 2015-06-25 13:59 GMT-07:00 Shivaram Venkataraman <
>> shivaram@eecs.berkeley.edu>:
>>
>>> The `head` function is not supported for the RRDD that is returned by
>>> `textFile`. You can run `take(lines, 5L)`. I should add a warning here that
>>> the RDD API in SparkR is private because we might not support it in the
>>> upcoming releases. So if you can use the DataFrame API for your application
>>> you should try that out.
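>>>
>>>  A minimal sketch of that workaround, assuming `sc` is an already
>>> initialized SparkContext and README.md exists in the working directory:
>>>
>>> # textFile is private in 1.4, so it needs the ::: namespace prefix
>>> lines <- SparkR:::textFile(sc, "./README.md")
>>> # take() works where head() does not for this RRDD
>>> take(lines, 5L)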
>>>
>>>  Thanks
>>>  Shivaram
>>>
>>> On Thu, Jun 25, 2015 at 1:49 PM, Wei Zhou <zh...@gmail.com> wrote:
>>>
>>>> Hi Alek,
>>>>
>>>>  Just a follow-up question. This is what I did in the sparkR shell:
>>>>
>>>>  lines <- SparkR:::textFile(sc, "./README.md")
>>>>  head(lines)
>>>>
>>>>  And I am getting the error:
>>>>
>>>> "Error in x[seq_len(n)] : object of type 'S4' is not subsettable"
>>>>
>>>> I'm wondering what I did wrong. Thanks in advance.
>>>>
>>>> Wei
>>>>
>>>> 2015-06-25 13:44 GMT-07:00 Wei Zhou <zh...@gmail.com>:
>>>>
>>>>> Hi Alek,
>>>>>
>>>>>  Thanks for the explanation, it is very helpful.
>>>>>
>>>>>  Cheers,
>>>>> Wei
>>>>>
>>>>> 2015-06-25 13:40 GMT-07:00 Eskilson,Aleksander <
>>>>> Alek.Eskilson@cerner.com>:
>>>>>
>>>>>>  Hi there,
>>>>>>
>>>>>>  The tutorial you’re reading was written before the merge of SparkR
>>>>>> into Spark 1.4.0.
>>>>>> For the merge, the RDD API (which includes the textFile() function)
>>>>>> was made private, as the devs felt many of its functions were too low
>>>>>> level. They focused instead on finishing the DataFrame API which supports
>>>>>> local, HDFS, and Hive/HBase file reads. In the meantime, the devs are
>>>>>> trying to determine which functions of the RDD API, if any, should be made
>>>>>> public again. You can see the rationale behind this decision on the issue’s
>>>>>> JIRA [1].
>>>>>>
>>>>>>  You can still make use of those now-private RDD functions by
>>>>>> prefixing the function call with the SparkR private namespace; for
>>>>>> example, you’d use
>>>>>> SparkR:::textFile(…).
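>>>>>>
>>>>>>  As a rough sketch of a word count built that way (assuming flatMap,
>>>>>> reduceByKey, and collect are reachable under the private namespace
>>>>>> with the same signatures as in the old SparkR-pkg examples):
>>>>>>
>>>>>> lines <- SparkR:::textFile(sc, "./README.md")
>>>>>> # Emit a (word, 1) pair for every word on every line
>>>>>> pairs <- SparkR:::flatMap(lines, function(line) {
>>>>>>   lapply(strsplit(line, " ")[[1]], function(word) list(word, 1L))
>>>>>> })
>>>>>> # Sum the counts per word across 2 partitions, then bring them back
>>>>>> counts <- SparkR:::reduceByKey(pairs, "+", 2L)
>>>>>> output <- SparkR:::collect(counts)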
>>>>>>
>>>>>>  Hope that helps,
>>>>>> Alek
>>>>>>
>>>>>>  [1] -- https://issues.apache.org/jira/browse/SPARK-7230
>>>>>>
>>>>>>   From: Wei Zhou <zh...@gmail.com>
>>>>>> Date: Thursday, June 25, 2015 at 3:33 PM
>>>>>> To: "user@spark.apache.org" <us...@spark.apache.org>
>>>>>> Subject: sparkR could not find function "textFile"
>>>>>>
>>>>>>   Hi all,
>>>>>>
>>>>>>  I am exploring sparkR by activating the shell and following the
>>>>>> tutorial here https://amplab-extras.github.io/SparkR-pkg/
>>>>>>
>>>>>>  And when I tried to read in a local file with textFile(sc,
>>>>>> "file_location"), it gave the error: could not find function "textFile".
>>>>>>
>>>>>>  Reading through the sparkR docs for 1.4, it seems that we need a
>>>>>> sqlContext to import data, for example:
>>>>>>
>>>>>> people <- read.df(sqlContext, "./examples/src/main/resources/people.json", "json")
>>>>>> And we need to specify the file type.
>>>>>>
>>>>>>  My question is: has sparkR stopped supporting general file
>>>>>> importing? If not, I would appreciate any help on how to do this.
>>>>>>
>>>>>>  PS, I am trying to recreate the word count example in sparkR, and
>>>>>> want to import the README.md file, or just any file, into sparkR.
>>>>>>
>>>>>>  Thanks in advance.
>>>>>>
>>>>>>  Best,
>>>>>> Wei
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: sparkR could not find function "textFile"

Posted by Sourav Mazumder <so...@gmail.com>.
Thanks, Shivaram. Your suggestion on Stack Overflow regarding this did work.

Thanks again.

Regards,
Sourav

On Wed, Jul 1, 2015 at 10:21 AM, Shivaram Venkataraman <
shivaram@eecs.berkeley.edu> wrote:

> You can check my comment below the answer at
> http://stackoverflow.com/a/30959388/4577954. BTW, we added a new option to
> sparkR.init to pass in packages, and that should be part of 1.5.
>
> Shivaram

Re: sparkR could not find function "textFile"

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
You can check my comment below the answer at
http://stackoverflow.com/a/30959388/4577954. BTW, we added a new option to
sparkR.init to pass in packages, and that should be part of 1.5.

Shivaram
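
For completeness, the gist of that Stack Overflow workaround from RStudio,
as a sketch (assuming the SPARKR_SUBMIT_ARGS environment variable is
honored by sparkR.init and that the trailing "sparkr-shell" token is
required, as in the linked answer):

# Must be set before sparkR.init() so the JVM backend is launched
# with the spark-csv package on its classpath
Sys.setenv("SPARKR_SUBMIT_ARGS" =
  '"--packages" "com.databricks:spark-csv_2.11:1.1.0" "sparkr-shell"')
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

# From 1.5 on, the new option should make this unnecessary, roughly:
# sc <- sparkR.init(master = "local[2]",
#                   sparkPackages = "com.databricks:spark-csv_2.11:1.1.0")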
