Posted to user@hbase.apache.org by Vimal Jain <vk...@gmail.com> on 2013/06/27 09:11:23 UTC

Problems while exporting from Hbase to CSV file

Hi,
I am trying to export from HBase to a CSV file.
I am using the "Scan" class to scan all the data in the table,
but I am facing some problems while doing it.

1) My table has around 1.5 million rows and around 150 columns per
row, so I cannot use the default Scan() constructor, as it will scan the
whole table in one go, which results in an OutOfMemory error in the
client process. I have heard of using setCaching() and setBatch(), but I
am not able to understand how they will solve the OOM error.

I thought of providing startRow and stopRow in the Scan object, but I
want to scan the whole table, so how will this help?

2) As HBase stores data for a row only when we explicitly provide it,
and there is no concept of a default value as found in an RDBMS, I want
to have each and every column present in the CSV file I generate for
every user. In case column values are missing in HBase, I want to use
default values for them (I have a list of default values for each
column). Is there any method in the Result class, or any other class,
to accomplish this?


Please help here.

-- 
Thanks and Regards,
Vimal Jain

Re: Problems while exporting from Hbase to CSV file

Posted by Michael Segel <mi...@hotmail.com>.
Yeah, that's the point. 

You fetch, you iterate through the returned batch, and then you fetch the next batch. 
The only way he could get an OOM is in his own code.




Re: Problems while exporting from Hbase to CSV file

Posted by Anoop John <an...@gmail.com>.
> so i can not use default scan() constructor as it will scan whole
table in one go which results in OutOfMemory error in client process

I'm not sure what you mean by this. The client calls next() on the
Scanner and gets the rows. setCaching() and setBatch() determine how much
data (rows and cells) is retrieved from the region server to the client
in one next() call. So if caching is set to 100, you will have 100 rows
in the ClientScanner cache. Which version are you using? In older
versions the default caching value was 1; later it was changed to 100.


-Anoop-
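The pattern Anoop describes can be sketched as follows. This is a minimal sketch against the 0.94-era HBase client API; the table name and the specific caching/batch values are hypothetical, and a running cluster with a matching hbase-site.xml on the classpath is assumed:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanExport {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "users");  // hypothetical table name

        Scan scan = new Scan();
        scan.setCaching(500);  // rows fetched from the region server per next() round trip
        scan.setBatch(150);    // max cells per Result; useful for very wide rows

        ResultScanner scanner = table.getScanner(scan);
        try {
            // Only up to `caching` rows are held in the client at any time,
            // so memory use stays bounded regardless of table size.
            for (Result row : scanner) {
                // write `row` out as a CSV line here
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}
```

Because the for-each loop pulls rows through the scanner cache in chunks, the 1.5-million-row table is never materialized in client memory at once.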


Re: Problems while exporting from Hbase to CSV file

Posted by Michael Segel <mi...@hotmail.com>.
Phoenix, Hive, Pig, or plain Java would all work. 
But to Azuryy Yu's point... 

The OP is doing a simple scan() to get rows. 
If the OP is hitting an OOM exception, then it's a code issue on the part of the OP. 




Re: Problems while exporting from Hbase to CSV file

Posted by Azuryy Yu <az...@gmail.com>.
Sorry, maybe Phoenix is not suitable for you.


>

Re: Problems while exporting from Hbase to CSV file

Posted by Azuryy Yu <az...@gmail.com>.
1) Use Scan.setCaching() to specify the number of rows to cache that will
be passed to the scanner.
    And what's your block cache size?

    But if the OOM is on the client, not the server side, then I don't
think this is Scan related; please check your client code.

2) We cannot get default values from HBase, but you can add them on the
client side when iterating over the Result.

Also, you can use Phoenix, which is a good fit for your scenario:
https://github.com/forcedotcom/phoenix
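The client-side default filling from point 2 can be sketched as a pure helper. The column names, default values, and one-line CSV layout here are hypothetical; in a real exporter the `retrieved` map would be populated from `Result.getValue(family, qualifier)` for each expected column:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvDefaults {
    /**
     * Builds one CSV line: for each expected column, takes the value
     * retrieved from HBase if present, otherwise falls back to a default.
     * Assumes every column has an entry in `defaults`.
     */
    public static String toCsvLine(List<String> columns,
                                   Map<String, String> retrieved,
                                   Map<String, String> defaults) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            String col = columns.get(i);
            String value = retrieved.containsKey(col)
                    ? retrieved.get(col)
                    : defaults.get(col);
            if (i > 0) sb.append(',');
            sb.append(value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("name", "age", "city");

        Map<String, String> defaults = new LinkedHashMap<String, String>();
        defaults.put("name", "unknown");
        defaults.put("age", "0");
        defaults.put("city", "n/a");

        // Simulates a sparse HBase row where only "name" was stored.
        Map<String, String> retrieved = new LinkedHashMap<String, String>();
        retrieved.put("name", "vimal");

        System.out.println(toCsvLine(columns, retrieved, defaults));
        // -> vimal,0,n/a
    }
}
```

Keeping this logic out of the scan loop makes it easy to unit-test the default handling without a cluster. Note that real CSV output would also need quoting for values containing commas.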


