Posted to user@phoenix.apache.org by Roberto Gastaldelli <ga...@gmail.com> on 2014/05/28 09:45:23 UTC

Loading data with Sqoop

Hi there,

I came across the challenge of loading data from an RDBMS into a Phoenix
table using Sqoop, but that did not work well, as Sqoop by default converts
all data types to strings.

My solution was to write a PutTransformer that maps the JDBC data
types to the Phoenix native data types.

Is there any interest in including this feature in the project? If so, I can
contribute it.

Roberto.
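The string-conversion problem described above comes down to picking a binary serializer per JDBC type code instead of calling toString() on everything. Below is a minimal, self-contained sketch of that idea; it is hypothetical code, not the transformer discussed in this thread, and Phoenix's real encodings differ (for example, Phoenix flips the sign bit of integers so that byte order matches sort order):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.sql.Types;

/** Sketch: map JDBC type codes to binary encodings instead of plain strings. */
public class JdbcTypeSerializer {

    /** Serialize a value according to its java.sql.Types code. */
    public static byte[] serialize(int jdbcType, Object value) {
        switch (jdbcType) {
            case Types.INTEGER:
                return ByteBuffer.allocate(4).putInt((Integer) value).array();
            case Types.BIGINT:
                return ByteBuffer.allocate(8).putLong((Long) value).array();
            case Types.DOUBLE:
                return ByteBuffer.allocate(8).putDouble((Double) value).array();
            case Types.VARCHAR:
            case Types.CHAR:
            default:
                // Fallback mirrors the behaviour described in the thread:
                // anything without a native mapping is stored as a string.
                return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        byte[] intBytes = serialize(Types.INTEGER, 42);
        byte[] strBytes = serialize(Types.VARCHAR, "42");
        System.out.println(intBytes.length + " " + strBytes.length); // prints "4 2"
    }
}
```

The payoff is visible in main: the INTEGER encodes to a fixed 4 bytes regardless of its decimal width, while the string form varies with the number of digits.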

Re: Loading data with Sqoop

Posted by Roberto Gastaldelli <ga...@gmail.com>.
Hi Ravi,

I'm using the Phoenix native data types to serialize the data, identifying
whether each type is variable-length or not, then constructing the row key
and simply returning the list of Puts.

The Sqoop implementation does not provide the required data to let the
transformer retrieve the metadata via the Phoenix API, so I'm not querying
the metadata. The assumption is that the source data type does not require
conversion (e.g. String to Date). As long as the input data type can
be mapped to an equivalent Phoenix data type, it will do so automatically;
otherwise it will default to String.

Roberto.
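The variable-length handling described above matches Phoenix's composite row key layout: serialized PK values are concatenated, and variable-length values (such as VARCHAR) are followed by a zero-byte terminator unless they are the last component, so the next field's start stays unambiguous. A simplified stand-alone illustration of that layout, not the actual Phoenix serializer:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Simplified sketch of a Phoenix-style composite row key layout. */
public class RowKeySketch {

    /**
     * Concatenate pre-serialized PK column values. Variable-length columns
     * are followed by a 0x00 separator (except when last), mirroring how
     * Phoenix delimits VARCHAR-like fields in composite keys.
     */
    public static byte[] buildRowKey(byte[][] values, boolean[] isVariableLength) {
        ByteArrayOutputStream key = new ByteArrayOutputStream();
        for (int i = 0; i < values.length; i++) {
            key.write(values[i], 0, values[i].length);
            boolean last = (i == values.length - 1);
            if (isVariableLength[i] && !last) {
                key.write(0); // terminator marking the end of a variable-length field
            }
        }
        return key.toByteArray();
    }

    public static void main(String[] args) {
        byte[] tenant = "acme".getBytes(StandardCharsets.UTF_8); // VARCHAR-like
        byte[] id = {0, 0, 0, 7};                                // fixed-width INT
        byte[] rowKey = buildRowKey(new byte[][] {tenant, id},
                                    new boolean[] {true, false});
        System.out.println(rowKey.length); // prints "9" (4 + 1 separator + 4)
    }
}
```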


On Fri, May 30, 2014 at 8:09 AM, Ravi Kiran <ma...@gmail.com>
wrote:

> Hi Roberto,
>    How are you constructing the composite row key and returning the Put
> from the transformer . Also, can you please throw some light on how you
> perform a look up on the data types of the columns within the transformer.
>
> Regards
> Ravi

Re: Loading data with Sqoop

Posted by James Taylor <ja...@apache.org>.
Any updates on this one? It'd be very nice to have Sqoop integration.
Thanks,
James

On Tue, Jun 3, 2014 at 4:39 AM, Roberto Gastaldelli
<ga...@gmail.com> wrote:
> Hi Job,
>
> The tool is basically an implementation of PutTransformer class. When the
> Sqoop job is executed you can provide the jar file and the class using -D
> option.
>
> And to answer your second question, it will load the data directly to the
> HBase table.
>
> Roberto.

Re: Loading data with Sqoop

Posted by Roberto Gastaldelli <ga...@gmail.com>.
Hi Job,

The tool is basically an implementation of the PutTransformer class. When the
Sqoop job is executed, you can provide the jar file and the transformer class
using the -D option.

And to answer your second question: it loads the data directly into the
HBase table.

Roberto.
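For reference, the -D wiring described above would look roughly like the following. The jar, class, table, and connect-string names here are placeholders, and sqoop.hbase.insert.put.transformer.class is the property Sqoop 1.x reads for a custom PutTransformer; verify both against your Sqoop version before relying on them:

```shell
# Placeholder names; adjust jar path, class, tables, and JDBC URL for your setup.
sqoop import \
  -D sqoop.hbase.insert.put.transformer.class=com.example.PhoenixPutTransformer \
  -libjars /path/to/phoenix-put-transformer.jar \
  --connect jdbc:mysql://dbhost/sales \
  --table ORDERS \
  --hbase-table ORDERS \
  --column-family 0 \
  --hbase-row-key ORDER_ID
```

Note that -D and -libjars are generic Hadoop arguments and must come before the tool-specific --connect/--table options.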


On Mon, Jun 2, 2014 at 6:43 PM, Job Thomas <jo...@suntecgroup.com> wrote:

> Hi Roberto,
>
> How the tool is working?
> Directly loading data from RDBMS to Phoenix using Sqoop? or loading after
> bulk file is dumped into HDFS?
>
> Thanks & Regards
> Job M Thomas

RE: Loading data with Sqoop

Posted by Job Thomas <jo...@suntecgroup.com>.
Hi Roberto,
 
How does the tool work?
Does it load data directly from the RDBMS into Phoenix using Sqoop, or only after a bulk file is dumped into HDFS?
 
Thanks & Regards
Job M Thomas 



Re: Loading data with Sqoop

Posted by Ravi Kiran <ma...@gmail.com>.
Hi Roberto,
   How are you constructing the composite row key and returning the Put
from the transformer? Also, could you shed some light on how you look up
the data types of the columns within the transformer?

Regards
Ravi



On Thu, May 29, 2014 at 3:02 PM, Roberto Gastaldelli <ga...@gmail.com>
wrote:

> Hi James,
>
> I have extended the PutTransformer I've implemented and now it's loading
> data into tables with composite primary key.
>
> Another scenario I'm still working on is to identify if the table is
> salted, and load the data accordingly.
>
> Can you think in any other scenario?
>
> Roberto.

Re: Loading data with Sqoop

Posted by Roberto Gastaldelli <ga...@gmail.com>.
Hi James,

The current Sqoop architecture does not allow using the Phoenix API to
insert the data. What I have done, and it is working for my requirements,
is leverage the Phoenix data types to serialize the data while using standard
HBase Puts. There is no simple workaround to obtain the Phoenix metadata
and use the Phoenix API to load the data without introducing changes to Sqoop,
but I would say it is worth raising a JIRA for Sqoop to become more extensible.

So at the moment the load process will not work on tables where salting is
managed by Phoenix, and indexes will not be updated automatically. For the
latter, a workaround is to use Sqoop to update the index table directly as
a separate job, and that is what I'm doing.

Roberto.
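For anyone reproducing the salting limitation mentioned above: Phoenix prepends a single salt byte, derived from a hash of the rest of the row key modulo the table's SALT_BUCKETS count, so a loader writing salted tables directly through HBase would have to replicate that. A simplified stand-in follows; Phoenix's own logic lives in SaltingUtil and uses a different hash, so this sketch only illustrates the shape of the key:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch of prepending a Phoenix-style salt byte to a row key. */
public class SaltSketch {

    /** Derive a bucket in [0, buckets) from the key bytes. Phoenix uses its
     *  own hash in SaltingUtil; a deterministic stand-in is used here. */
    public static byte saltByte(byte[] key, int buckets) {
        int h = Arrays.hashCode(key);
        return (byte) Math.floorMod(h, buckets);
    }

    /** Row key for a salted table: one salt byte followed by the logical key. */
    public static byte[] salted(byte[] key, int buckets) {
        byte[] out = new byte[key.length + 1];
        out[0] = saltByte(key, buckets);
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] key = "acme|7".getBytes(StandardCharsets.UTF_8);
        byte[] saltedKey = salted(key, 8);
        System.out.println(saltedKey.length == key.length + 1
                && saltedKey[0] >= 0 && saltedKey[0] < 8); // prints "true"
    }
}
```

Because the salt byte is a pure function of the key, the same logical row always lands in the same bucket, which is what makes point lookups on salted tables possible.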


On Fri, May 30, 2014 at 8:08 AM, James Taylor <ja...@apache.org>
wrote:

> Excellent, Roberto. What Phoenix APIs are you using to do the insertion
> (as this may save you some effort)? The only other scenario is if a table
> has secondary indexes. If you're going through Phoenix APIs, this would be
> taken care of automatically.
> Thanks!
> James

Re: Loading data with Sqoop

Posted by James Taylor <ja...@apache.org>.
Excellent, Roberto. What Phoenix APIs are you using to do the insertion (as
this may save you some effort)? The only other scenario is if a table has
secondary indexes. If you're going through Phoenix APIs, this would be
taken care of automatically.
Thanks!
James


On Thu, May 29, 2014 at 3:02 PM, Roberto Gastaldelli <ga...@gmail.com>
wrote:

> Hi James,
>
> I have extended the PutTransformer I've implemented and now it's loading
> data into tables with composite primary key.
>
> Another scenario I'm still working on is to identify if the table is
> salted, and load the data accordingly.
>
> Can you think in any other scenario?
>
> Roberto.

Re: Loading data with Sqoop

Posted by Roberto Gastaldelli <ga...@gmail.com>.
Hi James,

I have extended the PutTransformer I implemented, and now it's loading
data into tables with a composite primary key.

Another scenario I'm still working on is identifying whether the table is
salted, and loading the data accordingly.

Can you think of any other scenarios?

Roberto.
On 28/05/2014 6:01 PM, "Roberto Gastaldelli" <ga...@gmail.com> wrote:

> I haven't tested the load in tables with composite key, but I'll run some
> scenarios and check what can be done.

Re: Loading data with Sqoop

Posted by Roberto Gastaldelli <ga...@gmail.com>.
I haven't tested loading into tables with a composite key, but I'll run some
scenarios and check what can be done.
On 28/05/2014 5:51 PM, "James Taylor" <ja...@apache.org> wrote:

> Hi Roberto,
>
> Yes, thank you very much for asking - there's definitely interest. Does it
> handle the case with a table that has a composite primary key definition?
>
> Thanks,
> James

Re: Loading data with Sqoop

Posted by James Taylor <ja...@apache.org>.
Hi Roberto,

Yes, thank you very much for asking - there's definitely interest. Does it
handle the case of a table with a composite primary key definition?

Thanks,
James


On Wed, May 28, 2014 at 12:45 AM, Roberto Gastaldelli <gastaldelli@gmail.com
> wrote:

> Hi there,
>
> I came across the challenge of loading data from a RDBMS into a Phoenix
> table using Sqoop, but that did not work well as Sqoop by default converts
> all data types to string.
>
> I came up with a solution to write a PutTransformer that maps the jdbc
> data types to the Phoenix native data types.
>
> Is there any interest to include this feature to the project? If so, I can
> contribute.
>
> Roberto.
>