Posted to user@spark.apache.org by Antony Mayi <an...@yahoo.com.INVALID> on 2014/12/22 20:02:39 UTC

custom python converter from HBase Result to tuple

Hi,

can anyone please give me some help with writing a custom converter of HBase data to (for example) tuples of ((family, qualifier, value), ) for pyspark?

I was trying something like this (here converting to tuples of ("family:qualifier:value", )):


import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.api.python.Converter

class HBaseResultToTupleConverter extends Converter[Any, List[String]] {
  override def convert(obj: Any): List[String] = {
    val result = obj.asInstanceOf[Result]
    // one "family:qualifier:value" string per cell in the row
    result.rawCells().map(cell => List(Bytes.toString(CellUtil.cloneFamily(cell)),
      Bytes.toString(CellUtil.cloneQualifier(cell)),
      Bytes.toString(CellUtil.cloneValue(cell))).mkString(":")
    ).toList
  }
}
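
For context, this is roughly how such a converter gets hooked up from PySpark, assuming the compiled class is on the Spark classpath; the ZooKeeper host, table name and package name below are placeholders, and the key converter is assumed to be the stock one shipped with the Spark examples:

from pyspark import SparkContext

sc = SparkContext(appName="hbase-to-tuples")

# placeholder HBase connection settings
conf = {"hbase.zookeeper.quorum": "zk-host",
        "hbase.mapreduce.inputtable": "mytable"}

hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
    valueConverter="mypackage.HBaseResultToTupleConverter",  # hypothetical package name
    conf=conf)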


but then I get an error:

14/12/22 16:27:40 WARN python.SerDeUtil: 
Failed to pickle Java object as value: $colon$colon, falling back
to 'toString'. Error: couldn't introspect javabean: java.lang.IllegalArgumentException: wrong number of arguments


does anyone have a hint?

Thanks,
Antony.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: custom python converter from HBase Result to tuple

Posted by Ted Yu <yu...@gmail.com>.
Please see
http://stackoverflow.com/questions/18565953/wrong-number-of-arguments-when-a-calling-function-from-class-in-python

Cheers

On Mon, Dec 22, 2014 at 8:04 PM, Antony Mayi <an...@yahoo.com> wrote:

> using hbase 0.98.6
>
> there is no stack trace, just this short error.
>
> just noticed it does fall back to toString as the message says; this is
> what I get back in python:
>
>
> hbase_rdd.collect()
>
> [(u'key1', u'List(cf1:12345:14567890, cf2:123:14567896)')]
>
> so the question is why it falls back to toString?
>
> thanks,
> Antony.

Re: custom python converter from HBase Result to tuple

Posted by Antony Mayi <an...@yahoo.com.INVALID>.
using hbase 0.98.6

there is no stack trace, just this short error.

just noticed it does fall back to toString as the message says; this is what I get back in python:

hbase_rdd.collect()
[(u'key1', u'List(cf1:12345:14567890, cf2:123:14567896)')]

so the question is why it falls back to toString?

thanks,
Antony.
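
For what it's worth, $colon$colon is the runtime class of a Scala List cons cell, so the warning most likely means the Python-side pickler cannot serialize the Scala List returned by the converter and therefore falls back to toString. A rough sketch of a variant that returns a java.util.List instead, which the pickler should be able to turn into a Python list (the class name here is made up):

import java.util.{ArrayList => JArrayList}

import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.api.python.Converter

// Same logic as the original converter, but collecting into a java.util.List
// so the value survives pickling instead of being stringified.
class HBaseResultToJavaListConverter extends Converter[Any, java.util.List[String]] {
  override def convert(obj: Any): java.util.List[String] = {
    val result = obj.asInstanceOf[Result]
    val out = new JArrayList[String]()
    result.rawCells().foreach { cell =>
      out.add(List(
        Bytes.toString(CellUtil.cloneFamily(cell)),
        Bytes.toString(CellUtil.cloneQualifier(cell)),
        Bytes.toString(CellUtil.cloneValue(cell))).mkString(":"))
    }
    out
  }
}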
 

On Monday, 22 December 2014, 20:09, Ted Yu <yu...@gmail.com> wrote:

> Which HBase version are you using ?
>
> Can you show the full stack trace ?
>
> Cheers

Re: custom python converter from HBase Result to tuple

Posted by Ted Yu <yu...@gmail.com>.
Which HBase version are you using ?

Can you show the full stack trace ?

Cheers
