Posted to dev@spark.apache.org by tgbaggio <ge...@gmail.com> on 2015/01/05 15:42:56 UTC

python converter in HBaseConverter.scala(spark/examples)

Hi, 

In HBaseConverters.scala
<https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala>,
the Python converter HBaseResultToStringConverter returns only the value of
the first column in the result. In my opinion, this limits the usefulness of
the converter: it returns only one value per row, and it loses the rest of
the record's information, such as column:cell and timestamp.

Therefore, I would like to propose some modifications to
HBaseResultToStringConverter so that it returns all records in
HBase with more complete information. I have already written some code in
pythonConverters.scala
<https://github.com/GenTang/spark_hbase/blob/master/src/main/scala/examples/pythonConverters.scala>,
and it works.
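
To illustrate the difference being proposed, here is a minimal, self-contained
sketch in plain Python. It does not use HBase or Spark: the Cell tuple, its
field names, and both function names are hypothetical stand-ins, and the
one-JSON-object-per-line output is only one possible format, not necessarily
what either converter actually emits.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for an HBase cell; the field names are illustrative only.
Cell = namedtuple("Cell", ["row", "family", "qualifier", "timestamp", "value"])


def first_value_only(cells):
    """Mimic the limitation described above: keep only the first cell's value."""
    return cells[0].value if cells else ""


def all_cells_as_json(cells):
    """Return every cell as one JSON object per line, keeping its metadata."""
    return "\n".join(json.dumps(cell._asdict(), sort_keys=True) for cell in cells)


# Two cells belonging to the same row (made-up sample data).
row = [
    Cell("row1", "f1", "a", 1420468976000, "value1"),
    Cell("row1", "f1", "b", 1420468977000, "value2"),
]

first = first_value_only(row)
records = [json.loads(line) for line in all_cells_as_json(row).splitlines()]
```

With the first function, the second cell and all column and timestamp
information are dropped; with the second, the Python side can parse each line
back into a dictionary.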

Is it OK to modify the code in HBaseConverters.scala, please?
Thanks a lot in advance.

Cheers
Gen




--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/python-converter-in-HBaseConverter-scala-spark-examples-tp10001.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

Re: python converter in HBaseConverter.scala(spark/examples)

Posted by Nick Pentreath <ni...@gmail.com>.

Absolutely. As I mentioned, by all means submit a PR. I just wanted to point out that no specific converter is "officially" supported, although the interface of course is.


I'm happy to review a PR; just ping me when it's ready.


On Mon, Jan 5, 2015 at 7:06 PM, Ted Yu <yu...@gmail.com> wrote:

> HBaseConverter is in the Spark source tree. Therefore, I think it makes sense
> for this improvement to be accepted so that the example is more useful.
>
> Cheers

Re: python converter in HBaseConverter.scala(spark/examples)

Posted by Ted Yu <yu...@gmail.com>.

HBaseConverter is in the Spark source tree. Therefore, I think it makes sense
for this improvement to be accepted so that the example is more useful.

Cheers

On Mon, Jan 5, 2015 at 7:54 AM, Nick Pentreath <ni...@gmail.com>
wrote:

> Hey
>
> These converters are actually just intended to be examples of how to set
> up a custom converter for a specific input format. The converter interface
> is there to provide flexibility where needed, although with the new
> SparkSQL data store interface the intention is that most common use cases
> can be handled using that approach rather than custom converters.
>
> The intention is not to have specific converters living in Spark core,
> which is why these are in the examples project.
>
> Having said that, if you wish to expand the example converter for others'
> reference, do feel free to submit a PR.
>
> Ideally though, I would think that various custom converters would be part
> of external projects that can be listed on http://spark-packages.org/. I
> see your project is already listed there.

Re: python converter in HBaseConverter.scala(spark/examples)

Posted by Nick Pentreath <ni...@gmail.com>.

Hey 

These converters are actually just intended to be examples of how to set up a custom converter for a specific input format. The converter interface is there to provide flexibility where needed, although with the new SparkSQL data store interface the intention is that most common use cases can be handled using that approach rather than custom converters.
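
As a rough sketch of how such a converter is wired in from the Python side,
the snippet below only collects the arguments that would be passed to
SparkContext.newAPIHadoopRDD. The helper name hbase_read_args, the host
"localhost", and the table "test_table" are placeholders chosen for
illustration; the snippet does not start Spark or contact HBase.

```python
def hbase_read_args(zk_quorum, table, value_converter):
    """Collect keyword arguments for SparkContext.newAPIHadoopRDD (hypothetical helper)."""
    return {
        "inputFormatClass": "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "keyClass": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "valueClass": "org.apache.hadoop.hbase.client.Result",
        # Converters are referenced by fully qualified class name.
        "keyConverter": "org.apache.spark.examples.pythonconverters."
                        "ImmutableBytesWritableToStringConverter",
        "valueConverter": value_converter,
        "conf": {
            "hbase.zookeeper.quorum": zk_quorum,
            "hbase.mapreduce.inputtable": table,
        },
    }


args = hbase_read_args(
    "localhost",   # placeholder ZooKeeper quorum
    "test_table",  # placeholder table name
    "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
)

# With a live SparkContext `sc` and the examples jar on the classpath,
# the read itself would be: rdd = sc.newAPIHadoopRDD(**args)
```

Swapping in a different converter would then only mean passing another class
name as value_converter, which is the flexibility the interface is meant to
provide.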

The intention is not to have specific converters living in Spark core, which is why these are in the examples project.

Having said that, if you wish to expand the example converter for others' reference, do feel free to submit a PR.

Ideally though, I would think that various custom converters would be part of external projects that can be listed on http://spark-packages.org/. I see your project is already listed there.

On Mon, Jan 5, 2015 at 5:37 PM, Ted Yu <yu...@gmail.com> wrote:

> In my opinion this would be useful. Another thread also mentioned that
> only the value of the first column in the result is returned.
>
> Please create a SPARK JIRA and a pull request.
>
> Cheers

RE: python converter in HBaseConverter.scala(spark/examples)

Posted by "Yan Zhou.sc" <Ya...@huawei.com>.

We are planning to support HBase as a "native" data source for Spark SQL in 1.3 (SPARK-3880).
More details will come soon.


-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com] 
Sent: Monday, January 05, 2015 7:37 AM
To: tgbaggio
Cc: dev@spark.apache.org
Subject: Re: python converter in HBaseConverter.scala(spark/examples)

In my opinion this would be useful. Another thread also mentioned that only the value of the first column in the result is returned.

Please create a SPARK JIRA and a pull request.

Cheers

Re: python converter in HBaseConverter.scala(spark/examples)

Posted by Ted Yu <yu...@gmail.com>.

In my opinion this would be useful. Another thread also mentioned that
only the value of the first column in the result is returned.

Please create a SPARK JIRA and a pull request.

Cheers

On Mon, Jan 5, 2015 at 6:42 AM, tgbaggio <ge...@gmail.com> wrote:

> Hi,
>
> In HBaseConverters.scala
> <https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala>,
> the Python converter HBaseResultToStringConverter returns only the value of
> the first column in the result. In my opinion, this limits the usefulness of
> the converter: it returns only one value per row, and it loses the rest of
> the record's information, such as column:cell and timestamp.
>
> Therefore, I would like to propose some modifications to
> HBaseResultToStringConverter so that it returns all records in
> HBase with more complete information. I have already written some code in
> pythonConverters.scala
> <https://github.com/GenTang/spark_hbase/blob/master/src/main/scala/examples/pythonConverters.scala>,
> and it works.
>
> Is it OK to modify the code in HBaseConverters.scala, please?
> Thanks a lot in advance.
>
> Cheers
> Gen