Posted to user@spark.apache.org by Sandeep Khurana <sa...@infoworks.io> on 2016/01/05 12:27:59 UTC

sparkR ORC support.

Hello

I need to read ORC files in HDFS in R using Spark. I am not able to find
a package to do that.

Can anyone help with documentation or example for this purpose?

-- 
Architect
Infoworks.io
http://Infoworks.io

Re: sparkR ORC support.

Posted by Sandeep Khurana <sa...@infoworks.io>.
I call stop from the console because RStudio warns and advises it. And yes, after
stop was called, the whole script was run again together. That means the init
 "hivecontext <- sparkRHive.init(sc)" is always called after stop.

On Tue, Jan 12, 2016 at 8:31 PM, Felix Cheung <fe...@hotmail.com>
wrote:

> As you can see from my reply below from Jan 6, calling sparkR.stop()
> invalidates both sc and hivecontext you have and results in this invalid
> jobj error.
>
> If you start R and run this, it should work:
>
> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>
> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
> library(SparkR)
>
> sc <- sparkR.init()
> hivecontext <- sparkRHive.init(sc)
> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>
>
> Is there a reason you want to call stop? If you do, you would need to call
> the line hivecontext <- sparkRHive.init(sc) again.
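>
> A minimal sketch of that flow (same SPARK_HOME as above; the ORC path is
> the one from your script):
>
> sparkR.stop()                       # invalidates both sc and hivecontext
> sc <- sparkR.init()                 # so both must be re-created
> hivecontext <- sparkRHive.init(sc)
> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")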
>
>
> _____________________________
> From: Sandeep Khurana <sa...@infoworks.io>
> Sent: Tuesday, January 12, 2016 5:20 AM
> Subject: Re: sparkR ORC support.
> To: Felix Cheung <fe...@hotmail.com>
> Cc: spark users <us...@spark.apache.org>, Prem Sure <pr...@gmail.com>,
> Deepak Sharma <de...@gmail.com>, Yanbo Liang <yb...@gmail.com>
>
>
> It worked for some time. Then I did sparkR.stop() and re-ran to get
> the same error. Any idea why it ran fine before? (While running fine it
> kept warning that it was reusing the existing spark-context and that I should
> restart.) There is one more R script which instantiates Spark; I ran that
> again too.
>
>
> On Tue, Jan 12, 2016 at 3:05 PM, Sandeep Khurana <sa...@infoworks.io>
> wrote:
>
>> The complete stacktrace is below. Can it be something with Java versions?
>>
>>
>> stop("invalid jobj ", value$id)
>> 8
>> writeJobj(con, object)
>> 7
>> writeObject(con, a)
>> 6
>> writeArgs(rc, args)
>> 5
>> invokeJava(isStatic = TRUE, className, methodName, ...)
>> 4
>> callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext,
>> source, options)
>> 3
>> read.df(sqlContext, path, source, schema, ...)
>> 2
>> loadDF(hivecontext, filepath, "orc")
>>
>> On Tue, Jan 12, 2016 at 2:41 PM, Sandeep Khurana <sa...@infoworks.io>
>> wrote:
>>
>>> Running this gave
>>>
>>> 16/01/12 04:06:54 INFO BlockManagerMaster: Registered BlockManager
>>> Error in writeJobj(con, object) : invalid jobj 3
>>>
>>>
>>> How does it know which hive schema to connect to?
>>>
>>>
>>>
>>> On Tue, Jan 12, 2016 at 2:34 PM, Felix Cheung <felixcheung_m@hotmail.com
>>> > wrote:
>>>
>>>> It looks like you have overwritten sc. Could you try this:
>>>>
>>>>
>>>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>>>>
>>>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"),
>>>> .libPaths()))
>>>> library(SparkR)
>>>>
>>>> sc <- sparkR.init()
>>>> hivecontext <- sparkRHive.init(sc)
>>>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
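>>>>
>>>> For contrast, here are the two lines of your script (quoted below) that
>>>> overwrite sc, with a hedged note on why that breaks later calls:
>>>>
>>>> sc <<- sparkR.init()      # sc holds the SparkContext jobj
>>>> sc <<- sparkRHive.init() # sc is immediately overwritten with a
>>>>                           # HiveContext jobj, so the SparkContext
>>>>                           # reference is lost and later calls can end
>>>>                           # up passing an invalid jobj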
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>> Date: Tue, 12 Jan 2016 14:28:58 +0530
>>>> Subject: Re: sparkR ORC support.
>>>> From: sandeep@infoworks.io
>>>> To: felixcheung_m@hotmail.com
>>>> CC: ybliang8@gmail.com; user@spark.apache.org; premsure542@gmail.com;
>>>> deepakmca05@gmail.com
>>>>
>>>>
>>>> The code is very simple, pasted below.
>>>> hive-site.xml is in the spark conf already. I still see this error
>>>>
>>>> Error in writeJobj(con, object) : invalid jobj 3
>>>>
>>>> after running the script  below
>>>>
>>>>
>>>> script
>>>> =======
>>>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>>>>
>>>>
>>>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"),
>>>> .libPaths()))
>>>> library(SparkR)
>>>>
>>>> sc <<- sparkR.init()
>>>> sc <<- sparkRHive.init()
>>>> hivecontext <<- sparkRHive.init(sc)
>>>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>>>> #View(df)
>>>>
>>>>
>>>> On Wed, Jan 6, 2016 at 11:08 PM, Felix Cheung <
>>>> felixcheung_m@hotmail.com> wrote:
>>>>
>>>> Yes, as Yanbo suggested, it looks like there is something wrong with
>>>> the sqlContext.
>>>>
>>>> Could you forward us your code please?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jan 6, 2016 at 5:52 AM -0800, "Yanbo Liang" <ybliang8@gmail.com
>>>> > wrote:
>>>>
>>>> You should ensure your sqlContext is HiveContext.
>>>>
>>>> sc <- sparkR.init()
>>>>
>>>> sqlContext <- sparkRHive.init(sc)
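>>>>
>>>> A hedged way to confirm the resulting context can actually reach Hive
>>>> (the query is just an example; any Hive statement would do):
>>>>
>>>> head(sql(sqlContext, "SHOW TABLES"))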
>>>>
>>>>
>>>> 2016-01-06 20:35 GMT+08:00 Sandeep Khurana <sa...@infoworks.io>:
>>>>
>>>> Felix
>>>>
>>>> I tried the option you suggested. It gave the error below. I am going
>>>> to try the option suggested by Prem.
>>>>
>>>> Error in writeJobj(con, object) : invalid jobj 1
>>>> 8: stop("invalid jobj ", value$id)
>>>> 7: writeJobj(con, object)
>>>> 6: writeObject(con, a)
>>>> 5: writeArgs(rc, args)
>>>> 4: invokeJava(isStatic = TRUE, className, methodName, ...)
>>>> 3: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
>>>> 2: read.df(sqlContext, filepath, "orc") at spark_api.R#108
>>>>
>>>> On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <
>>>> felixcheung_m@hotmail.com> wrote:
>>>>
>>>> Firstly I don't have ORC data to verify but this should work:
>>>>
>>>> df <- loadDF(sqlContext, "data/path", "orc")
>>>>
>>>> Secondly, could you check if sparkR.stop() was called?
>>>> sparkRHive.init() should be called after sparkR.init() - please check if
>>>> there is any error message there.
>>>>
>>>> _____________________________
>>>> From: Prem Sure < premsure542@gmail.com>
>>>> Sent: Tuesday, January 5, 2016 8:12 AM
>>>> Subject: Re: sparkR ORC support.
>>>> To: Sandeep Khurana < sandeep@infoworks.io>
>>>> Cc: spark users < user@spark.apache.org>, Deepak Sharma <
>>>> deepakmca05@gmail.com>
>>>>
>>>>
>>>>
>>>> Yes Sandeep, also copy hive-site.xml to the Spark conf directory.
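>>>>
>>>> A hedged sketch of that copy from R:
>>>>
>>>> # both paths are assumptions for an HDP layout; adjust to your cluster
>>>> file.copy("/etc/hive/conf/hive-site.xml",
>>>>           "/usr/hdp/current/spark-client/conf/hive-site.xml")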
>>>>
>>>>
>>>> On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana <sa...@infoworks.io>
>>>> wrote:
>>>>
>>>> Also, do I need to set up Hive in Spark as per
>>>> http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark
>>>> ?
>>>>
>>>> Might we need to copy the hdfs-site.xml file to the Spark conf directory too?
>>>>
>>>> On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana <sa...@infoworks.io>
>>>> wrote:
>>>>
>>>> Deepak
>>>>
>>>> Tried this. Getting this error now
>>>>
>>>> Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") :   unused argument ("")
>>>>
>>>>
>>>> On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma <de...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Sandeep
>>>> can you try this ?
>>>>
>>>> results <- sql(hivecontext, "FROM test SELECT id","")
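>>>>
>>>> A hedged aside: sql() takes only the context and the query string, so the
>>>> trailing "" is likely what triggers the unused argument ("") error above.
>>>> The call without it:
>>>>
>>>> results <- sql(hivecontext, "FROM test SELECT id")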
>>>>
>>>> Thanks
>>>> Deepak
>>>>
>>>>
>>>> On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana <sa...@infoworks.io>
>>>> wrote:
>>>>
>>>> Thanks Deepak.
>>>>
>>>> I tried this as well. I created a hivecontext   with  "hivecontext <<-
>>>> sparkRHive.init(sc) "  .
>>>>
>>>> When I tried to read hive table from this ,
>>>>
>>>> results <- sql(hivecontext, "FROM test SELECT id")
>>>>
>>>> I get below error,
>>>>
>>>> Error in callJMethod(sqlContext, "sql", sqlQuery) :   Invalid jobj 2. If SparkR was restarted, Spark operations need to be re-executed.
>>>>
>>>>
>>>> Not sure what is causing this. Any leads or ideas? I am using RStudio.
>>>>
>>>>
>>>>
>>>> On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <de...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Sandeep
>>>> I am not sure if ORC can be read directly in R.
>>>> But there is a workaround: first create a Hive table on top of the ORC
>>>> files and then access that Hive table in R, as sketched below.
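>>>>
>>>> A rough sketch of that workaround (assumes a HiveContext from
>>>> sparkRHive.init; the table and column names are made-up examples):
>>>>
>>>> # table/column names are illustrative; the location is the ORC dir in HDFS
>>>> sql(hivecontext, "CREATE EXTERNAL TABLE IF NOT EXISTS test (id INT)
>>>>                   STORED AS ORC LOCATION '/data/ingest/sparktest1/'")
>>>> results <- sql(hivecontext, "SELECT id FROM test")
>>>> head(results)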
>>>>
>>>> Thanks
>>>> Deepak
>>>>
>>>> On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana <sa...@infoworks.io>
>>>> wrote:
>>>>
>>>> Hello
>>>>
>>>> I need to read ORC files in HDFS in R using Spark. I am not able to
>>>> find a package to do that.
>>>>
>>>> Can anyone help with documentation or example for this purpose?
>>>>
>>>> --
>>>> Architect
>>>> Infoworks.io <http://infoworks.io>
>>>> http://Infoworks.io
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks
>>>> Deepak
>>>> www.bigdatabig.com
>>>> www.keosha.net
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks
>>>> Deepak
>>>> www.bigdatabig.com
>>>> www.keosha.net
>>>>
>>>>
>>>>
>>>>
>>>>

Re: sparkR ORC support.

Posted by Felix Cheung <fe...@hotmail.com>.
As you can see from my reply below from Jan 6, calling sparkR.stop() invalidates both sc and hivecontext you have and results in this invalid jobj error.
If you start R and run this, it should work:
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")

.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")

Is there a reason you want to call stop? If you do, you would need to call the line hivecontext <- sparkRHive.init(sc) again.



_____________________________
From: Sandeep Khurana <sa...@infoworks.io>
Sent: Tuesday, January 12, 2016 5:20 AM
Subject: Re: sparkR ORC support.
To: Felix Cheung <fe...@hotmail.com>
Cc: spark users <us...@spark.apache.org>, Prem Sure <pr...@gmail.com>, Deepak Sharma <de...@gmail.com>, Yanbo Liang <yb...@gmail.com>


It worked for some time. Then I did sparkR.stop() and re-ran to get the same error. Any idea why it ran fine before? (While running fine it kept warning that it was reusing the existing spark-context and that I should restart.) There is one more R script which instantiates Spark; I ran that again too.
               

Re: sparkR ORC support.

Posted by Sandeep Khurana <sa...@infoworks.io>.
It worked for some time. Then I did sparkR.stop() and re-ran to get
the same error. Any idea why it ran fine before? (While running fine it
kept warning that it was reusing the existing spark-context and that I should
restart.) There is one more R script which instantiates Spark; I ran that
again too.


On Tue, Jan 12, 2016 at 3:05 PM, Sandeep Khurana <sa...@infoworks.io>
wrote:

> The complete stacktrace is below. Can it be something with Java versions?
>
>
> stop("invalid jobj ", value$id)
> 8
> writeJobj(con, object)
> 7
> writeObject(con, a)
> 6
> writeArgs(rc, args)
> 5
> invokeJava(isStatic = TRUE, className, methodName, ...)
> 4
> callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext,
> source, options)
> 3
> read.df(sqlContext, path, source, schema, ...)
> 2
> loadDF(hivecontext, filepath, "orc")
>
> On Tue, Jan 12, 2016 at 2:41 PM, Sandeep Khurana <sa...@infoworks.io>
> wrote:
>
>> Running this gave
>>
>> 16/01/12 04:06:54 INFO BlockManagerMaster: Registered BlockManager
>> Error in writeJobj(con, object) : invalid jobj 3
>>
>>
>> How does it know which hive schema to connect to?
>>
>>
>>
>> On Tue, Jan 12, 2016 at 2:34 PM, Felix Cheung <fe...@hotmail.com>
>> wrote:
>>
>>> It looks like you have overwritten sc. Could you try this:
>>>
>>>
>>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>>>
>>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"),
>>> .libPaths()))
>>> library(SparkR)
>>>
>>> sc <- sparkR.init()
>>> hivecontext <- sparkRHive.init(sc)
>>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>>>
>>>
>>>
>>> ------------------------------
>>> Date: Tue, 12 Jan 2016 14:28:58 +0530
>>> Subject: Re: sparkR ORC support.
>>> From: sandeep@infoworks.io
>>> To: felixcheung_m@hotmail.com
>>> CC: ybliang8@gmail.com; user@spark.apache.org; premsure542@gmail.com;
>>> deepakmca05@gmail.com
>>>
>>>
>>> The code is very simple, pasted below.
>>> hive-site.xml is in the spark conf already. I still see this error
>>>
>>> Error in writeJobj(con, object) : invalid jobj 3
>>>
>>> after running the script  below
>>>
>>>
>>> script
>>> =======
>>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>>>
>>>
>>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"),
>>> .libPaths()))
>>> library(SparkR)
>>>
>>> sc <<- sparkR.init()
>>> sc <<- sparkRHive.init()
>>> hivecontext <<- sparkRHive.init(sc)
>>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>>> #View(df)
>>>
>>>
>>> On Wed, Jan 6, 2016 at 11:08 PM, Felix Cheung <felixcheung_m@hotmail.com
>>> > wrote:
>>>
>>> Yes, as Yanbo suggested, it looks like there is something wrong with the
>>> sqlContext.
>>>
>>> Could you forward us your code please?
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jan 6, 2016 at 5:52 AM -0800, "Yanbo Liang" <yb...@gmail.com>
>>> wrote:
>>>
>>> You should ensure your sqlContext is HiveContext.
>>>
>>> sc <- sparkR.init()
>>>
>>> sqlContext <- sparkRHive.init(sc)
>>>
>>>
>>> 2016-01-06 20:35 GMT+08:00 Sandeep Khurana <sa...@infoworks.io>:
>>>
>>> Felix
>>>
>>> I tried the option you suggested. It gave the error below. I am going
>>> to try the option suggested by Prem.
>>>
>>> Error in writeJobj(con, object) : invalid jobj 1
>>> 8: stop("invalid jobj ", value$id)
>>> 7: writeJobj(con, object)
>>> 6: writeObject(con, a)
>>> 5: writeArgs(rc, args)
>>> 4: invokeJava(isStatic = TRUE, className, methodName, ...)
>>> 3: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
>>> 2: read.df(sqlContext, filepath, "orc") at spark_api.R#108
>>>
>>> On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <felixcheung_m@hotmail.com
>>> > wrote:
>>>
>>> Firstly I don't have ORC data to verify but this should work:
>>>
>>> df <- loadDF(sqlContext, "data/path", "orc")
>>>
>>> Secondly, could you check if sparkR.stop() was called? sparkRHive.init()
>>> should be called after sparkR.init() - please check if there is any error
>>> message there.
>>>
>>> _____________________________
>>> From: Prem Sure <pr...@gmail.com>
>>> Sent: Tuesday, January 5, 2016 8:12 AM
>>> Subject: Re: sparkR ORC support.
>>> To: Sandeep Khurana <sa...@infoworks.io>
>>> Cc: spark users <us...@spark.apache.org>, Deepak Sharma <
>>> deepakmca05@gmail.com>
>>>
>>>
>>>
>>> Yes Sandeep, also copy hive-site.xml to the Spark conf directory.
>>>
>>>
>>> On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana <sa...@infoworks.io>
>>> wrote:
>>>
>>> Also, do I need to set up Hive in Spark as per
>>> http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark
>>> ?
>>>
>>> Might we need to copy the hdfs-site.xml file to the Spark conf directory too?
>>>
>>> On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana <sa...@infoworks.io>
>>> wrote:
>>>
>>> Deepak
>>>
>>> Tried this. Getting this error now
>>>
>>> Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") :   unused argument ("")
>>>
>>>
>>> On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma <de...@gmail.com>
>>> wrote:
>>>
>>> Hi Sandeep
>>> can you try this ?
>>>
>>> results <- sql(hivecontext, "FROM test SELECT id","")
>>>
>>> Thanks
>>> Deepak
>>>
>>>
>>> On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana <sa...@infoworks.io>
>>> wrote:
>>>
>>> Thanks Deepak.
>>>
>>> I tried this as well. I created a hivecontext with "hivecontext <<-
>>> sparkRHive.init(sc)".
>>>
>>> When I tried to read a hive table from it,
>>>
>>> results <- sql(hivecontext, "FROM test SELECT id")
>>>
>>> I got the error below:
>>>
>>> Error in callJMethod(sqlContext, "sql", sqlQuery) :   Invalid jobj 2. If SparkR was restarted, Spark operations need to be re-executed.
>>>
>>>
>>> Not sure what is causing this. Any leads or ideas? I am using RStudio.
>>>
>>>
>>>
>>> On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <de...@gmail.com>
>>> wrote:
>>>
>>> Hi Sandeep
>>> I am not sure if ORC can be read directly in R.
>>> But there is a workaround: first create a Hive table on top of the ORC
>>> files and then access that Hive table in R.
>>>
>>> Thanks
>>> Deepak
>>>
>>> On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana <sa...@infoworks.io>
>>> wrote:
>>>
>>> Hello
>>>
>>> I need to read ORC files in HDFS in R using Spark. I am not able to
>>> find a package to do that.
>>>
>>> Can anyone help with documentation or example for this purpose?
>>>
>>> --
>>> Architect
>>> Infoworks.io <http://infoworks.io>
>>> http://Infoworks.io
>>>
>>>
>>>
>>>
>>> --
>>> Thanks
>>> Deepak
>>> www.bigdatabig.com
>>> www.keosha.net
>>>
>>>
>>>
>>>
>>> --
>>> Architect
>>> Infoworks.io <http://infoworks.io>
>>> http://Infoworks.io
>>>
>>>
>>>
>>>
>>> --
>>> Thanks
>>> Deepak
>>> www.bigdatabig.com
>>> www.keosha.net
>>>
>>>
>>>
>>>
>>> --
>>> Architect
>>> Infoworks.io <http://infoworks.io>
>>> http://Infoworks.io
>>>
>>>
>>>
>>>
>>> --
>>> Architect
>>> Infoworks.io <http://infoworks.io>
>>> http://Infoworks.io
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Architect
>>> Infoworks.io
>>> http://Infoworks.io
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Architect
>>> Infoworks.io
>>> http://Infoworks.io
>>>
>>
>>
>>
>> --
>> Architect
>> Infoworks.io
>> http://Infoworks.io
>>
>
>
>
> --
> Architect
> Infoworks.io
> http://Infoworks.io
>



-- 
Architect
Infoworks.io
http://Infoworks.io

Re: sparkR ORC support.

Posted by Sandeep Khurana <sa...@infoworks.io>.
The complete stacktrace is below. Can it be something with Java versions?


stop("invalid jobj ", value$id)
8
writeJobj(con, object)
7
writeObject(con, a)
6
writeArgs(rc, args)
5
invokeJava(isStatic = TRUE, className, methodName, ...)
4
callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext,
source, options)
3
read.df(sqlContext, path, source, schema, ...)
2
loadDF(hivecontext, filepath, "orc")

On Tue, Jan 12, 2016 at 2:41 PM, Sandeep Khurana <sa...@infoworks.io>
wrote:

> Running this gave
>
> 16/01/12 04:06:54 INFO BlockManagerMaster: Registered BlockManager
> Error in writeJobj(con, object) : invalid jobj 3
>
>
> How does it know which hive schema to connect to?
>
>
>
> On Tue, Jan 12, 2016 at 2:34 PM, Felix Cheung <fe...@hotmail.com>
> wrote:
>
>> It looks like you have overwritten sc. Could you try this:
>>
>>
>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>>
>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
>> library(SparkR)
>>
>> sc <- sparkR.init()
>> hivecontext <- sparkRHive.init(sc)
>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>>
>>
>>
>> ------------------------------
>> Date: Tue, 12 Jan 2016 14:28:58 +0530
>> Subject: Re: sparkR ORC support.
>> From: sandeep@infoworks.io
>> To: felixcheung_m@hotmail.com
>> CC: ybliang8@gmail.com; user@spark.apache.org; premsure542@gmail.com;
>> deepakmca05@gmail.com
>>
>>
>> The code is very simple, pasted below.
>> hive-site.xml is in the spark conf already. I still see this error
>>
>> Error in writeJobj(con, object) : invalid jobj 3
>>
>> after running the script  below
>>
>>
>> script
>> =======
>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>>
>>
>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
>> library(SparkR)
>>
>> sc <<- sparkR.init()
>> sc <<- sparkRHive.init()
>> hivecontext <<- sparkRHive.init(sc)
>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>> #View(df)
>>
>>
>> On Wed, Jan 6, 2016 at 11:08 PM, Felix Cheung <fe...@hotmail.com>
>> wrote:
>>
>> Yes, as Yanbo suggested, it looks like there is something wrong with the
>> sqlContext.
>>
>> Could you forward us your code please?
>>
>>
>>
>>
>>
>> On Wed, Jan 6, 2016 at 5:52 AM -0800, "Yanbo Liang" <yb...@gmail.com>
>> wrote:
>>
>> You should ensure your sqlContext is HiveContext.
>>
>> sc <- sparkR.init()
>>
>> sqlContext <- sparkRHive.init(sc)
>>
>>
>> 2016-01-06 20:35 GMT+08:00 Sandeep Khurana <sa...@infoworks.io>:
>>
>> Felix
>>
>> I tried the option suggested by you.  It gave below error.  I am going to
>> try the option suggested by Prem .
>>
>> Error in writeJobj(con, object) : invalid jobj 1
>> 8
>> stop("invalid jobj ", value$id)
>> 7
>> writeJobj(con, object)
>> 6
>> writeObject(con, a)
>> 5
>> writeArgs(rc, args)
>> 4
>> invokeJava(isStatic = TRUE, className, methodName, ...)
>> 3
>> callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext,
>> source, options)
>> 2
>> read.df(sqlContext, filepath, "orc") at
>> spark_api.R#108
>>
>> On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <fe...@hotmail.com>
>> wrote:
>>
>> Firstly I don't have ORC data to verify but this should work:
>>
>> df <- loadDF(sqlContext, "data/path", "orc")
>>
>> Secondly, could you check if sparkR.stop() was called? sparkRHive.init()
>> should be called after sparkR.init() - please check if there is any error
>> message there.
>>
>> _____________________________
>> From: Prem Sure <pr...@gmail.com>
>> Sent: Tuesday, January 5, 2016 8:12 AM
>> Subject: Re: sparkR ORC support.
>> To: Sandeep Khurana <sa...@infoworks.io>
>> Cc: spark users <us...@spark.apache.org>, Deepak Sharma <
>> deepakmca05@gmail.com>
>>
>>
>>
>> Yes Sandeep, also copy hive-site.xml too to spark conf directory.
>>
>>
>> On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana <sa...@infoworks.io>
>> wrote:
>>
>> Also, do I need to setup hive in spark as per the link
>> http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark
>> ?
>>
>> We might need to copy hdfs-site.xml file to spark conf directory ?
>>
>> On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana <sa...@infoworks.io>
>> wrote:
>>
>> Deepak
>>
>> Tried this. Getting this error now
>>
>> rror in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") :   unused argument ("")
>>
>>
>> On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma <de...@gmail.com>
>> wrote:
>>
>> Hi Sandeep
>> can you try this ?
>>
>> results <- sql(hivecontext, "FROM test SELECT id","")
>>
>> Thanks
>> Deepak
>>
>>
>> On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana <sa...@infoworks.io>
>> wrote:
>>
>> Thanks Deepak.
>>
>> I tried this as well. I created a hivecontext   with  "hivecontext <<-
>> sparkRHive.init(sc) "  .
>>
>> When I tried to read hive table from this ,
>>
>> results <- sql(hivecontext, "FROM test SELECT id")
>>
>> I get below error,
>>
>> Error in callJMethod(sqlContext, "sql", sqlQuery) :   Invalid jobj 2. If SparkR was restarted, Spark operations need to be re-executed.
>>
>>
>> Not sure what is causing this? Any leads or ideas? I am using rstudio.
>>
>>
>>
>> On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <de...@gmail.com>
>> wrote:
>>
>> Hi Sandeep
>> I am not sure if ORC can be read directly in R.
>> But there can be a workaround .First create hive table on top of ORC
>> files and then access hive table in R.
>>
>> Thanks
>> Deepak
>>
>> On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana <sa...@infoworks.io>
>> wrote:
>>
>> Hello
>>
>> I need to read an ORC files in hdfs in R using spark. I am not able to
>> find a package to do that.
>>
>> Can anyone help with documentation or example for this purpose?
>>
>> --
>> Architect
>> Infoworks.io <http://infoworks.io>
>> http://Infoworks.io
>>
>>
>>
>>
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
>> www.keosha.net
>>
>>
>>
>>
>> --
>> Architect
>> Infoworks.io <http://infoworks.io>
>> http://Infoworks.io
>>
>>
>>
>>
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
>> www.keosha.net
>>
>>
>>
>>
>> --
>> Architect
>> Infoworks.io <http://infoworks.io>
>> http://Infoworks.io
>>
>>
>>
>>
>> --
>> Architect
>> Infoworks.io <http://infoworks.io>
>> http://Infoworks.io
>>
>>
>>
>>
>>
>>
>>
>> --
>> Architect
>> Infoworks.io
>> http://Infoworks.io
>>
>>
>>
>>
>>
>> --
>> Architect
>> Infoworks.io
>> http://Infoworks.io
>>
>
>
>
> --
> Architect
> Infoworks.io
> http://Infoworks.io
>



-- 
Architect
Infoworks.io
http://Infoworks.io

Re: sparkR ORC support.

Posted by Sandeep Khurana <sa...@infoworks.io>.
Running this gave

16/01/12 04:06:54 INFO BlockManagerMaster: Registered BlockManager
Error in writeJobj(con, object) : invalid jobj 3


How does it know which hive schema to connect to?
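(Per Prem's advice further down the thread, Spark picks this up from the
hive-site.xml in its conf directory, which carries the Hive metastore
connection details.)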






-- 
Architect
Infoworks.io
http://Infoworks.io

RE: sparkR ORC support.

Posted by Felix Cheung <fe...@hotmail.com>.
It looks like you have overwritten sc. Could you try this:
 
 
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")

.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
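(The key detail is that sparkRHive.init() receives the sc returned by
sparkR.init() and neither variable is reassigned afterwards; in the script
quoted below, sc is overwritten by a second init call, which fits the stale
"invalid jobj" handles reported above.)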

 

Re: sparkR ORC support.

Posted by Sandeep Khurana <sa...@infoworks.io>.
The code is very simple, pasted below.
hive-site.xml is in spark conf already. I still see this error

Error in writeJobj(con, object) : invalid jobj 3

after running the script below.


script
=======
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")


.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sc <<- sparkR.init()
sc <<- sparkRHive.init()
hivecontext <<- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
#View(df)




-- 
Architect
Infoworks.io
http://Infoworks.io

Re: sparkR ORC support.

Posted by Felix Cheung <fe...@hotmail.com>.
Yes, as Yanbo suggested, it looks like there is something wrong with the sqlContext.
Could you forward us your code please?

Re: sparkR ORC support.

Posted by Yanbo Liang <yb...@gmail.com>.
You should ensure your sqlContext is HiveContext.

sc <- sparkR.init()

sqlContext <- sparkRHive.init(sc)
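Once the context really is a HiveContext, reading the ORC files should be a
one-liner. A minimal sketch (SparkR 1.x API; the HDFS path is a placeholder
taken from elsewhere in the thread) -- note that in Spark 1.x the ORC data
source lives in the Hive module, so a plain SQLContext from sparkRSQL.init()
will not work:

df <- read.df(sqlContext, "/data/ingest/sparktest1/", "orc")  # needs the HiveContext
head(df)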



Re: sparkR ORC support.

Posted by Sandeep Khurana <sa...@infoworks.io>.
Felix

I tried the option you suggested. It gave the error below. I am going to
try the option suggested by Prem.

Error in writeJobj(con, object) : invalid jobj 1
8: stop("invalid jobj ", value$id)
7: writeJobj(con, object)
6: writeObject(con, a)
5: writeArgs(rc, args)
4: invokeJava(isStatic = TRUE, className, methodName, ...)
3: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext,
   source, options)
2: read.df(sqlContext, filepath, "orc") at spark_api.R#108



-- 
Architect
Infoworks.io
http://Infoworks.io

Re: sparkR ORC support.

Posted by Felix Cheung <fe...@hotmail.com>.
Firstly I don't have ORC data to verify but this should work:

df <- loadDF(sqlContext, "data/path", "orc")

Secondly, could you check if sparkR.stop() was called? sparkRHive.init()
should be called after sparkR.init() - please check if there is any error
message there.
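A sketch of the recovery path for that second point, assuming the same
variable names as above: once sparkR.stop() has run, every existing context
is stale, so both init calls must be repeated before retrying the load.

sparkR.stop()                                 # invalidates sc and any contexts built on it
sc <- sparkR.init()                           # re-create the Spark context first
sqlContext <- sparkRHive.init(sc)             # then the HiveContext on top of it
df <- loadDF(sqlContext, "data/path", "orc")  # "data/path" is a placeholder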



Re: sparkR ORC support.

Posted by Prem Sure <pr...@gmail.com>.
Yes Sandeep, also copy hive-site.xml to the Spark conf directory.
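For example, on an HDP-style layout the copy can be done from R as well (the
source path below is a typical default, not verified for this cluster; the
destination follows the SPARK_HOME used in this thread):

# hypothetical paths -- adjust to the local Hive and Spark installs
file.copy("/etc/hive/conf/hive-site.xml",
          "/usr/hdp/current/spark-client/conf/hive-site.xml")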



Re: sparkR ORC support.

Posted by Sandeep Khurana <sa...@infoworks.io>.
Also, do I need to set up Hive in Spark as per the link
http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark ?

We might need to copy the hdfs-site.xml file to the Spark conf directory?




-- 
Architect
Infoworks.io
http://Infoworks.io

Re: sparkR ORC support.

Posted by Sandeep Khurana <sa...@infoworks.io>.
Deepak

Tried this. Getting this error now:

Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") :
  unused argument ("")
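For reference, sql() in the SparkR 1.x API takes only the context and the
query string -- sql(sqlContext, sqlQuery) -- which is why the extra "" is
rejected as an unused argument. A minimal corrected call, assuming the
hivecontext from earlier in the thread:

results <- sql(hivecontext, "SELECT category_id FROM CATEGORIES")
head(results)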





-- 
Architect
Infoworks.io
http://Infoworks.io

Re: sparkR ORC support.

Posted by Deepak Sharma <de...@gmail.com>.
Hi Sandeep
can you try this?

results <- sql(hivecontext, "FROM test SELECT id","")

Thanks
Deepak





-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net

Re: sparkR ORC support.

Posted by Sandeep Khurana <sa...@infoworks.io>.
Thanks Deepak.

I tried this as well. I created a hivecontext with "hivecontext <<-
sparkRHive.init(sc)".

When I tried to read a hive table from it,

results <- sql(hivecontext, "FROM test SELECT id")

I get the error below:

Error in callJMethod(sqlContext, "sql", sqlQuery) :
  Invalid jobj 2. If SparkR was restarted, Spark operations need to be
re-executed.

Not sure what is causing this. Any leads or ideas? I am using RStudio.
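The error message itself points at the likely cause: hivecontext is only an
R-side handle to a JVM object, so after any sparkR.stop() or backend restart
the handle goes stale and the contexts have to be rebuilt. A sketch, assuming
the Hive table test exists:

sc <- sparkR.init()                  # fresh backend connection
hivecontext <- sparkRHive.init(sc)   # fresh handle; old ones stay invalid
results <- sql(hivecontext, "SELECT id FROM test")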






-- 
Architect
Infoworks.io
http://Infoworks.io

Re: sparkR ORC support.

Posted by Deepak Sharma <de...@gmail.com>.
Hi Sandeep
I am not sure if ORC can be read directly in R.
But there can be a workaround: first create a Hive table on top of the ORC
files and then access the Hive table in R.

Thanks
Deepak
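A minimal sketch of that workaround in SparkR 1.x (the HDFS path and the
single-column schema are placeholders borrowed from elsewhere in the thread):

sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)

# point an external Hive table at the existing ORC files (HiveQL DDL)
sql(hivecontext, "CREATE EXTERNAL TABLE IF NOT EXISTS test (id INT)
                  STORED AS ORC
                  LOCATION '/data/ingest/sparktest1/'")

# the ORC data is now queryable as an ordinary Hive table
results <- sql(hivecontext, "SELECT id FROM test")
head(results)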




-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net