Posted to user@sqoop.apache.org by anil gupta <an...@buffalo.edu> on 2012/03/16 23:09:07 UTC

Re: Importing more than one column family in Hbase through Sqoop

Hi Kathleen,

Sorry for the delayed reply; I have been working on HBase rather than
Sqoop.
Here is some example code from the book "HBase: The Definitive Guide". It
shows that it is possible to load data into more than one column family
through the Java API, which was exactly the point I was trying to make.

Have a look at these two classes:
https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/util/HBaseHelper.java
https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/filters/PrefixFilterExample.java
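
To illustrate the point without pulling in the HBase client, here is a
minimal, dependency-free sketch in plain Java. It only models how the cells
of one source row could be grouped by column family into a single row
mutation. The class MultiFamilyPutSketch and the method buildRowMutation are
hypothetical names for illustration; they are not part of HBase or Sqoop, and
this is not a description of what Sqoop does today.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Dependency-free model of the idea; this is NOT the HBase client API.
public class MultiFamilyPutSketch {

    /**
     * Groups the cells of one source row by column family.
     * Cell names use the "family:qualifier" form from the hbase shell example.
     * Returns family -> (qualifier -> value), i.e. the contents of one row mutation.
     */
    public static Map<String, Map<String, String>> buildRowMutation(Map<String, String> cells) {
        Map<String, Map<String, String>> byFamily = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            String name = cell.getKey();
            int sep = name.indexOf(':');
            if (sep <= 0) {
                throw new IllegalArgumentException("expected family:qualifier, got " + name);
            }
            String family = name.substring(0, sep);
            String qualifier = name.substring(sep + 1);
            // Create the per-family map the first time this family is seen.
            Map<String, String> familyCells = byFamily.get(family);
            if (familyCells == null) {
                familyCells = new LinkedHashMap<>();
                byFamily.put(family, familyCells);
            }
            familyCells.put(qualifier, cell.getValue());
        }
        return byFamily;
    }

    public static void main(String[] args) {
        // Same data as the two hbase shell puts for row '1' of 'merchant_data'.
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("info:name", "starbucks");
        cells.put("user_reviews:id", "4545");
        System.out.println("row 1 -> " + buildRowMutation(cells));
    }
}
```

Both families end up in one structure keyed by the same row, which is the
shape I have in mind for an import that writes to more than one column
family in a single pass. For the real client calls, see the HBaseHelper
class linked above.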

Please let me know if you have further questions.

Thanks,
Anil

On Fri, Feb 24, 2012 at 9:36 PM, Kathleen Ting <ka...@cloudera.com> wrote:

> Hi Anil,
>
> re: Is the above scenario not possible in Hbase Java api?
> I would suggest asking that on user@hbase.apache.org.
>
> Thanks,
> Kathleen
>
> On Wed, Feb 22, 2012 at 2:26 PM, anil gupta <an...@buffalo.edu> wrote:
>
>> Hi Kathleen,
>>
>> I think my previous messages were misinterpreted. In my previous message I
>> was talking about generating a separate put statement for each column
>> family. I am having a hard time understanding how this would violate
>> the HBase atomicity rule.
>>
>> For instance, in the HBase shell my put statements for two column
>> families would look like this:
>> hbase shell>put 'merchant_data', '1', 'info:name', 'starbucks'
>> hbase shell>put 'merchant_data', '1', 'user_reviews:id', '4545'
>>
>> Similarly, this can be achieved with the HBase Java API, which Sqoop is
>> using. Is the above scenario not possible in the HBase Java API?
>>
>> Thanks,
>> Anil
>>
>>
>>
>> On Wed, Feb 22, 2012 at 2:02 PM, Kathleen Ting <ka...@cloudera.com> wrote:
>>
>>> Hi Anil -
>>>
>>> Good question and sorry for any confusion earlier. To be sure, because
>>> HBase permits atomic operations across a single column family only, Sqoop
>>> cannot support multiple column families.
>>>
>>> Regards, Kathleen
>>>
>>> On Wed, Feb 22, 2012 at 12:43 PM, anil gupta <an...@buffalo.edu> wrote:
>>>
>>>> Hi Kathleen,
>>>>
>>>> Yes, that is always an option. Thanks for the suggestion.
>>>>
>>>> I am a beginner at HBase. However, I was thinking of cutting down the
>>>> time it takes to dump the data from the database. If I run the import
>>>> twice (assuming I have 2 column families), it increases the time needed
>>>> to load the entire HBase table.
>>>> AFAIK, Sqoop generates put statements to import data into HBase. If we
>>>> generated put statements for more than one column family, would that
>>>> violate the atomicity principle of HBase? I went through the atomicity
>>>> section of http://hbase.apache.org/acid-semantics.html and I can't find
>>>> anything that would stop Sqoop from loading more than one column family.
>>>> HBase bulk load also allows more than one column family, although the
>>>> approach of HBase bulk loading might be different from Sqoop's. Could you
>>>> give me more insight? Sorry if my question is dumb.
>>>>
>>>> Thanks,
>>>> Anil Gupta
>>>>
>>>>
>>>> On Wed, Feb 22, 2012 at 11:51 AM, Kathleen Ting <ka...@cloudera.com> wrote:
>>>>
>>>>> Hi Anil,
>>>>>
>>>>> Sqoop does not support multiple column families because HBase only
>>>>> permits atomic operations.
>>>>>
>>>>> One workaround is to run two imports, specifying a different column
>>>>> family each time.
>>>>>
>>>>> Regards,
>>>>> Kathleen
>>>>>
>>>>> On Wed, Feb 22, 2012 at 11:31 AM, anil gupta <an...@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I went through the Sqoop User Guide but I could not find anything
>>>>>> about importing more than one column family into HBase. Am I missing
>>>>>> something? Is it planned for a future release?
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards,
>>>>>> Anil Gupta
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Anil Gupta
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>>
>
>


-- 
Thanks & Regards,
Anil Gupta

Re: Importing more than one column family in Hbase through Sqoop

Posted by anil gupta <an...@buffalo.edu>.
Hi Kathleen,

Here is the JIRA I filed for this:
https://issues.apache.org/jira/browse/SQOOP-472

Thanks,
Anil Gupta

On Mon, Mar 19, 2012 at 12:58 PM, Kathleen Ting <ka...@apache.org> wrote:

> Anil -
>
> Understood. As it happens, the HBase release that supported atomicity came
> after the Sqoop release that included HBase integration, hence the
> limitation.
>
> Please go ahead and file a Sqoop JIRA requesting a CLI option that lets
> the user specify multiple column families.
>
> Regards, Kathleen
>


-- 
Thanks & Regards,
Anil Gupta

Re: Importing more than one column family in Hbase through Sqoop

Posted by Kathleen Ting <ka...@apache.org>.
Anil -

Understood. As it happens, the HBase release that supported atomicity came
after the Sqoop release that included HBase integration, hence the
limitation.

Please go ahead and file a Sqoop JIRA requesting a CLI option that lets
the user specify multiple column families.

Regards, Kathleen
