Posted to user@spark.apache.org by Jeetendra Gangele <ga...@gmail.com> on 2015/03/31 21:51:02 UTC

java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

When I try to read results from HBase and run the mapToPair function on the
RDD, it fails with:
java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

Here is the code:

private static JavaPairRDD<Integer, Result> getCompanyDataRDD(JavaSparkContext sc)
    throws IOException {
  return sc.newAPIHadoopRDD(companyDAO.getCompnayDataConfiguration(),
      TableInputFormat.class, ImmutableBytesWritable.class, Result.class)
    .mapToPair(new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, Result>() {
      @Override
      public Tuple2<Integer, Result> call(Tuple2<ImmutableBytesWritable, Result> t)
          throws Exception {
        System.out.println("In getCompanyDataRDD" + t._2);
        String cknid = Bytes.toString(t._1.get());
        System.out.println("processing cknids is:" + cknid);
        Integer cknidInt = Integer.parseInt(cknid);
        Tuple2<Integer, Result> returnTuple = new Tuple2<Integer, Result>(cknidInt, t._2);
        return returnTuple;
      }
    });
}

Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
One hack you could apply is to bring the Result class
<http://grepcode.com/file_/repository.cloudera.com/content/repositories/releases/com.cloudera.hbase/hbase/0.89.20100924-28/org/apache/hadoop/hbase/client/Result.java/?v=source>
into your own code, make it implement Serializable, and use that local copy.
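
A lighter variant of the same idea, sketched below, is to define your own
small Serializable holder and copy what you need out of Result on the
executor side. The holder class and the column family "cf" are made up for
illustration:

import java.io.Serializable;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ResultHolder implements Serializable {
  private final String rowKey;
  private final Map<String, String> values = new TreeMap<String, String>();

  public ResultHolder(Result r) {
    this.rowKey = Bytes.toString(r.getRow());
    // Copy each cell of the (hypothetical) family "cf" into plain Strings,
    // so nothing HBase-specific has to travel over the network.
    NavigableMap<byte[], byte[]> familyMap = r.getFamilyMap(Bytes.toBytes("cf"));
    if (familyMap != null) {
      for (Map.Entry<byte[], byte[]> e : familyMap.entrySet()) {
        values.put(Bytes.toString(e.getKey()), Bytes.toString(e.getValue()));
      }
    }
  }

  public String getRowKey() { return rowKey; }
  public Map<String, String> getValues() { return values; }
}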

Thanks
Best Regards

On Tue, Apr 7, 2015 at 12:07 AM, Jeetendra Gangele <ga...@gmail.com>
wrote:

> I hit the same issue again. This time I tried to return an object and it
> failed with "task not serializable"; the code is below (VendorRecord is
> serializable).

Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

Posted by Jeetendra Gangele <ga...@gmail.com>.
I hit the same issue again. This time I tried to return an object and it
failed with "task not serializable". Below is the code; VendorRecord itself
is serializable.

private static JavaRDD<VendorRecord> getVendorDataToProcess(JavaSparkContext sc)
    throws IOException {
  return sc
      .newAPIHadoopRDD(getVendorDataRowKeyScannerConfiguration(), TableInputFormat.class,
          ImmutableBytesWritable.class, Result.class)
      .map(new Function<Tuple2<ImmutableBytesWritable, Result>, VendorRecord>() {
        @Override
        public VendorRecord call(Tuple2<ImmutableBytesWritable, Result> v1) throws Exception {
          String rowKey = new String(v1._1.get());
          VendorRecord vd = vendorDataDAO.getVendorDataForRowkey(rowKey);
          return vd;
        }
      });
}
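
A likely cause of the "task not serializable" failure here is not Result at
all: the anonymous Function references vendorDataDAO, an instance field, so
Spark has to serialize the whole enclosing object along with the closure. One
way around that, assuming (hypothetically) the DAO can be built with a no-arg
constructor on the executor, is to construct it inside call():

    .map(new Function<Tuple2<ImmutableBytesWritable, Result>, VendorRecord>() {
      @Override
      public VendorRecord call(Tuple2<ImmutableBytesWritable, Result> v1) throws Exception {
        String rowKey = Bytes.toString(v1._1.get());
        // Hypothetical: create the DAO here on the executor instead of
        // capturing the driver-side field, so only the closure is serialized.
        VendorDataDAO dao = new VendorDataDAO();
        return dao.getVendorDataForRowkey(rowKey);
      }
    });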


On 1 April 2015 at 02:07, Ted Yu <yu...@gmail.com> wrote:

> Jeetendra:
> Please extract the information you need from Result and return the
> extracted portion - instead of returning Result itself.

Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

Posted by Ted Yu <yu...@gmail.com>.
Jeetendra:
Please extract the information you need from Result and return the
extracted portion - instead of returning Result itself.
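
For example, against the code from the original post (the column family "cf"
and qualifier "name" below are made-up placeholders for whatever the table
actually stores):

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

JavaPairRDD<Integer, String> companyNames = sc
    .newAPIHadoopRDD(companyDAO.getCompnayDataConfiguration(),
        TableInputFormat.class, ImmutableBytesWritable.class, Result.class)
    .mapToPair(new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, String>() {
      @Override
      public Tuple2<Integer, String> call(Tuple2<ImmutableBytesWritable, Result> t)
          throws Exception {
        Integer cknid = Integer.parseInt(Bytes.toString(t._1.get()));
        // Pull only the bytes we need out of Result while still on the
        // executor; the extracted String is serializable, Result is not.
        byte[] value = t._2.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"));
        return new Tuple2<Integer, String>(cknid,
            value == null ? null : Bytes.toString(value));
      }
    });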

Cheers

On Tue, Mar 31, 2015 at 1:14 PM, Nan Zhu <zh...@gmail.com> wrote:

> The example in
> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
> might help

Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

Posted by Nan Zhu <zh...@gmail.com>.
The example in https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala might help

Best, 

-- 
Nan Zhu
http://codingcat.me


On Tuesday, March 31, 2015 at 3:56 PM, Sean Owen wrote:

> Yep, it's not serializable:
> https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html

Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

Posted by Sean Owen <so...@cloudera.com>.
Yep, it's not serializable:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html

You can't return this from a distributed operation since that would
mean it has to travel over the network and you haven't supplied any
way to convert the thing into bytes.
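
If you genuinely need to ship Result itself, one way to supply such a
conversion is to switch Spark to Kryo serialization, which doesn't require
classes to implement java.io.Serializable. A sketch (whether Kryo handles
your HBase version's Result cleanly is something you'd have to verify):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf()
    .setAppName("hbase-read")  // hypothetical app name
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // Optional: registering the class up front keeps serialized data smaller.
    .registerKryoClasses(new Class<?>[] { org.apache.hadoop.hbase.client.Result.class });
JavaSparkContext sc = new JavaSparkContext(conf);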

On Tue, Mar 31, 2015 at 8:51 PM, Jeetendra Gangele <ga...@gmail.com> wrote:
> When I try to read results from HBase and run the mapToPair function on
> the RDD, it fails with:
> java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
