Posted to user@spark.apache.org by Sandeep Khurana <sa...@infoworks.io> on 2016/01/05 12:27:59 UTC
sparkR ORC support.
Hello
I need to read ORC files in HDFS from R using Spark. I am not able to find
a package to do that.
Can anyone help with documentation or an example for this purpose?
--
Architect
Infoworks.io
http://Infoworks.io
Re: sparkR ORC support.
Posted by Sandeep Khurana <sa...@infoworks.io>.
I call stop from the console because RStudio warns and advises it. And yes,
after stop was called, the whole script was run again from the top, which
means the init line "hivecontext <- sparkRHive.init(sc)" is always called
after stop.
Re: sparkR ORC support.
Posted by Felix Cheung <fe...@hotmail.com>.
As you can see from my reply below from Jan 6, calling sparkR.stop() invalidates both the sc and hivecontext handles you have, and results in this invalid jobj error.
If you start R and run this, it should work:
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
Is there a reason you want to call stop? If you do, you would need to call the line hivecontext <- sparkRHive.init(sc) again.
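A minimal sketch of the restart sequence, for anyone who does need to stop and re-create the contexts (same illustrative path as above):

sparkR.stop()                        # invalidates every jobj, including sc and hivecontext
sc <- sparkR.init()                  # both handles must be recreated from scratch
hivecontext <- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")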
Re: sparkR ORC support.
Posted by Sandeep Khurana <sa...@infoworks.io>.
It worked for some time. Then I did sparkR.stop() and re-ran again, getting
the same error. Any idea why it ran fine before? (While it was running fine,
it kept warning that it was reusing the existing spark-context and that I
should restart.) There is one more R script which instantiates Spark; I ran
that again too.
Re: sparkR ORC support.
Posted by Sandeep Khurana <sa...@infoworks.io>.
The complete stacktrace is below. Can it be something with Java versions?

8: stop("invalid jobj ", value$id)
7: writeJobj(con, object)
6: writeObject(con, a)
5: writeArgs(rc, args)
4: invokeJava(isStatic = TRUE, className, methodName, ...)
3: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
2: read.df(sqlContext, path, source, schema, ...)
1: loadDF(hivecontext, filepath, "orc")
Re: sparkR ORC support.
Posted by Sandeep Khurana <sa...@infoworks.io>.
Running this gave

16/01/12 04:06:54 INFO BlockManagerMaster: Registered BlockManager
Error in writeJobj(con, object) : invalid jobj 3

How does it know which Hive schema to connect to?
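A HiveContext finds the metastore through the hive-site.xml on the driver's classpath (typically SPARK_HOME/conf); without one, Spark falls back to a local Derby metastore. A quick check, assuming a valid hivecontext:

# list the databases the context actually sees; a "default"-only
# result often means the local Derby fallback is in use
databases <- sql(hivecontext, "SHOW DATABASES")
head(databases)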
RE: sparkR ORC support.
Posted by Felix Cheung <fe...@hotmail.com>.
It looks like you have overwritten sc. Could you try this:
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
Re: sparkR ORC support.
Posted by Sandeep Khurana <sa...@infoworks.io>.
The code is very simple, pasted below.
hive-site.xml is in the Spark conf directory already. I still see this error

Error in writeJobj(con, object) : invalid jobj 3

after running the script below.
script
=======
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <<- sparkR.init()
sc <<- sparkRHive.init()
hivecontext <<- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
#View(df)
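Two things in this script stand out: sparkRHive.init() is called twice, and its first result is assigned over sc, clobbering the SparkContext handle. A corrected sketch, keeping the two handles separate:

sc <- sparkR.init()                  # keep the SparkContext handle
hivecontext <- sparkRHive.init(sc)   # call once; do not assign its result to sc
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")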
Re: sparkR ORC support.
Posted by Felix Cheung <fe...@hotmail.com>.
Yes, as Yanbo suggested, it looks like there is something wrong with the sqlContext.
Could you forward us your code please?
Re: sparkR ORC support.
Posted by Yanbo Liang <yb...@gmail.com>.
You should ensure your sqlContext is a HiveContext.
sc <- sparkR.init()
sqlContext <- sparkRHive.init(sc)
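The distinction matters because in Spark 1.x the ORC data source is only available through the Hive support. A minimal sketch, assuming the SparkR 1.x API:

sc <- sparkR.init()
# sqlContext <- sparkRSQL.init(sc)    # a plain SQLContext cannot read ORC
hivecontext <- sparkRHive.init(sc)    # the Hive-enabled context can
df <- read.df(hivecontext, "/data/ingest/sparktest1/", "orc")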
Re: sparkR ORC support.
Posted by Sandeep Khurana <sa...@infoworks.io>.
Felix
I tried the option you suggested. It gave the error below. I am going to
try the option suggested by Prem.

Error in writeJobj(con, object) : invalid jobj 1

8: stop("invalid jobj ", value$id)
7: writeJobj(con, object)
6: writeObject(con, a)
5: writeArgs(rc, args)
4: invokeJava(isStatic = TRUE, className, methodName, ...)
3: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
2: read.df(sqlContext, filepath, "orc") at spark_api.R#108
Re: sparkR ORC support.
Posted by Felix Cheung <fe...@hotmail.com>.
Firstly, I don't have ORC data to verify, but this should work:

df <- loadDF(sqlContext, "data/path", "orc")

Secondly, could you check whether sparkR.stop() was called? sparkRHive.init() should be called after sparkR.init() - please check if there is any error message there.
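For reference, a minimal sketch of that ordering in a single clean session (the ORC path is an assumption for illustration):

library(SparkR)

sc <- sparkR.init()                  # create the Spark context first
hivecontext <- sparkRHive.init(sc)   # then derive the HiveContext from it
df <- loadDF(hivecontext, "/data/path", "orc")   # hypothetical HDFS path
head(df)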
Re: sparkR ORC support.
Posted by Prem Sure <pr...@gmail.com>.
Yes Sandeep, also copy hive-site.xml to the Spark conf directory.
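A hedged sketch of that copy from within R (both paths are assumptions and vary by distribution):

# Copy the Hive client config into Spark's conf directory so that the
# HiveContext can locate the metastore. Adjust both paths for your cluster.
file.copy(from = "/etc/hive/conf/hive-site.xml",
          to   = file.path(Sys.getenv("SPARK_HOME"), "conf", "hive-site.xml"))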
Re: sparkR ORC support.
Posted by Sandeep Khurana <sa...@infoworks.io>.
Also, do I need to set up Hive in Spark as per this link:
http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark ?

We might also need to copy the hdfs-site.xml file to the Spark conf directory?
--
Architect
Infoworks.io
http://Infoworks.io
Re: sparkR ORC support.
Posted by Sandeep Khurana <sa...@infoworks.io>.
Deepak,

Tried this. Getting this error now:

Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") :
  unused argument ("")
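The "unused argument" message indicates that sql() accepts only the context and the query string, so a sketch of the same query without the trailing "" would be:

results <- sql(hivecontext, "FROM CATEGORIES SELECT category_id")
head(results)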
--
Architect
Infoworks.io
http://Infoworks.io
Re: sparkR ORC support.
Posted by Deepak Sharma <de...@gmail.com>.
Hi Sandeep
Can you try this?
results <- sql(hivecontext, "FROM test SELECT id","")
Thanks
Deepak
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
Re: sparkR ORC support.
Posted by Sandeep Khurana <sa...@infoworks.io>.
Thanks Deepak.

I tried this as well. I created a HiveContext with "hivecontext <<- sparkRHive.init(sc)".

When I tried to read a Hive table from it,

results <- sql(hivecontext, "FROM test SELECT id")

I get the error below:

Error in callJMethod(sqlContext, "sql", sqlQuery) :
  Invalid jobj 2. If SparkR was restarted, Spark operations need to be re-executed.

Not sure what is causing this. Any leads or ideas? I am using RStudio.
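As the error text itself suggests, anything created before a restart has to be re-executed; a minimal sketch of redoing the setup in the current session (object names as used in this thread):

sc <- sparkR.init()                  # re-create the Spark context after a restart
hivecontext <- sparkRHive.init(sc)   # re-derive the HiveContext from the new sc
results <- sql(hivecontext, "FROM test SELECT id")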
--
Architect
Infoworks.io
http://Infoworks.io
Re: sparkR ORC support.
Posted by Deepak Sharma <de...@gmail.com>.
Hi Sandeep,

I am not sure if ORC can be read directly in R, but there is a workaround: first create a Hive table on top of the ORC files, and then access that Hive table in R.
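A hedged sketch of that workaround (table name, column, and HDFS location are hypothetical):

hivecontext <- sparkRHive.init(sc)
# Define an external Hive table over the existing ORC files...
sql(hivecontext, "CREATE EXTERNAL TABLE IF NOT EXISTS test (id INT)
                  STORED AS ORC LOCATION '/data/orc/test/'")
# ...then query it from SparkR as usual.
results <- sql(hivecontext, "SELECT id FROM test")
head(results)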
Thanks
Deepak
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net