Posted to user@hbase.apache.org by Jean-Daniel Cryans <jd...@apache.org> on 2011/01/04 00:31:13 UTC

Re: batch reads of columns?

I would be tempted to use a taller table instead of a very, very
wide one. When manipulating millions of cells, scanning a lot of rows
is often easier than using a single Get.

J-D

On Mon, Dec 27, 2010 at 10:12 PM, Hiller, Dean  (Contractor)
<de...@broadridge.com> wrote:
> I am about to do a bunch of Puts with
>
> // How do I get the column count of a column family from a certain row?
> int lastcolVal = /* get count of columns somehow, I think */;
>
> for (int j = 0; j < 10; j++) {
>     Put put = new Put("activities", lastcolVal, activityId[j]);
>     context.write(accountNo, put);
> }
>
> I am looking at the source code of Get.java and trying to read in 100
> columns, process them, discard them, read in the next 100, process
> those, and so on (i.e. batching like in Hibernate so I don't blow up
> the memory). I guess I could read in one column at a time... is that
> expensive? (I would tend to think so for very large sets.)
>
> If I have an account which has activity IDs as columns, and I could have,
> let's say, 2 billion activities on one account, is there a way to batch
> read the columns from the column family so I don't blow up the
> memory? (I.e. let's say 4 GB of RAM, and I think 2 billion ints would be
> about 8 GB.)
>
> To be honest, that for loop is a bit of a lie... as we get activities,
> we will actually need to insert them so that they are ordered by some
> kind of date. I am not sure how I am going to do that yet (I definitely
> don't want to grab 1 billion IDs and sort them each time we reprocess).
>
> Thanks,
> Dean
>
> This message and any attachments are intended only for the use of the addressee and
> may contain information that is privileged and confidential. If the reader of the
> message is not the intended recipient or an authorized representative of the
> intended recipient, you are hereby notified that any dissemination of this
> communication is strictly prohibited. If you have received this communication in
> error, please notify us immediately by e-mail and delete the message and any
> attachments from your system.
>
>
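
To make the taller-table suggestion above concrete, here is a minimal
in-memory sketch. It is not HBase client code: a TreeMap stands in for a
table whose rows are kept sorted by row key, and the key layout
(<account>#<timestamp>#<activity id>), the class name and both helper
methods are assumptions made up for illustration. With one row per
activity, rows for an account are contiguous and already in date order, so
they can be read in small batches without holding everything in memory and
without re-sorting after each insert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model only: the TreeMap keeps keys sorted, standing in for a
// table whose rows are ordered by row key.
class TallTableSketch {

    // Hypothetical key layout: <account>#<zero-padded timestamp>#<zero-padded id>.
    // Zero padding makes string order match numeric order for non-negative values.
    static String rowKey(String account, long timestampMillis, int activityId) {
        return String.format("%s#%013d#%010d", account, timestampMillis, activityId);
    }

    // Returns up to batchSize row keys for the account, starting strictly after
    // lastSeenKey (or at the account's first row when lastSeenKey is null).
    static List<String> nextBatch(NavigableMap<String, Integer> table, String account,
                                  String lastSeenKey, int batchSize) {
        String prefix = account + "#";
        NavigableMap<String, Integer> tail = (lastSeenKey == null)
                ? table.tailMap(prefix, true)
                : table.tailMap(lastSeenKey, false);
        List<String> batch = new ArrayList<>();
        for (String key : tail.keySet()) {
            // Stop at the end of the account's key range or when the batch is full.
            if (!key.startsWith(prefix) || batch.size() == batchSize) {
                break;
            }
            batch.add(key);
        }
        return batch;
    }

    public static void main(String[] args) {
        NavigableMap<String, Integer> table = new TreeMap<>();
        // Inserted out of date order on purpose; the sorted keys restore the order.
        table.put(rowKey("account1", 3000L, 7), 7);
        table.put(rowKey("account1", 1000L, 9), 9);
        table.put(rowKey("account1", 2000L, 8), 8);
        table.put(rowKey("account2", 1500L, 1), 1);

        // Process the account two rows at a time, remembering only the last key.
        String last = null;
        List<String> batch;
        while (!(batch = nextBatch(table, "account1", last, 2)).isEmpty()) {
            System.out.println(batch);
            last = batch.get(batch.size() - 1);
        }
    }
}
```

Only the last key seen has to be kept between batches, which is what keeps
memory use flat regardless of how many activities an account has.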

Re: batch reads of columns?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Those would be ways to do it, yeah. Definitely try it out.

J-D

On Mon, Jan 10, 2011 at 11:27 AM, Hiller, Dean  (Contractor)
<de...@broadridge.com> wrote:
> Oh, so basically I can do foreign keys in two ways then
>
> 1. account1 =
> {column name="acc1", column fk1="activity1", column fk2="activity2", etc. etc}
>
> 2. Or I could basically do
> Account1-fk1= {column fk="activity1"}
> Account1-fk2= {column fk="activity2"}
> Etc. etc.
>
> Correct?
>
> Is there another way to represent relationships that I might be missing or does it basically all boil down to those two strategies?
>
> Thanks,
> Dean

RE: batch reads of columns?

Posted by "Hiller, Dean (Contractor)" <de...@broadridge.com>.
Oh, so basically I can do foreign keys in two ways then

1. account1 = 
{column name="acc1", column fk1="activity1", column fk2="activity2", etc. etc}

2. Or I could basically do
Account1-fk1= {column fk="activity1"}
Account1-fk2= {column fk="activity2"}
Etc. etc.

Correct?

Is there another way to represent relationships that I might be missing or does it basically all boil down to those two strategies?  

Thanks,
Dean
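
For comparison, the two strategies above can be modelled side by side. This
is only an in-memory sketch under stated assumptions: nested TreeMaps stand
in for rows and columns, and the class and method names are hypothetical,
not part of any HBase API. The row and column names are taken from the two
layouts listed in the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model only; nested sorted maps stand in for rows and columns.
class ForeignKeyLayouts {

    // Strategy 1 (wide): one row per account, one column per related activity.
    static NavigableMap<String, NavigableMap<String, String>> wideTable() {
        NavigableMap<String, String> columns = new TreeMap<>();
        columns.put("name", "acc1");
        columns.put("fk1", "activity1");
        columns.put("fk2", "activity2");
        NavigableMap<String, NavigableMap<String, String>> table = new TreeMap<>();
        table.put("account1", columns);
        return table;
    }

    // Strategy 2 (tall): one row per relationship, each with a single "fk" column.
    static NavigableMap<String, NavigableMap<String, String>> tallTable() {
        NavigableMap<String, NavigableMap<String, String>> table = new TreeMap<>();
        for (int i = 1; i <= 2; i++) {
            NavigableMap<String, String> columns = new TreeMap<>();
            columns.put("fk", "activity" + i);
            table.put("Account1-fk" + i, columns);
        }
        return table;
    }

    // Wide lookup: fetch the single account row, then pick out the fk* columns.
    static List<String> activitiesWide(
            NavigableMap<String, NavigableMap<String, String>> table, String account) {
        List<String> result = new ArrayList<>();
        NavigableMap<String, String> row = table.get(account);
        if (row == null) {
            return result;
        }
        for (Map.Entry<String, String> column : row.entrySet()) {
            if (column.getKey().startsWith("fk")) {
                result.add(column.getValue());
            }
        }
        return result;
    }

    // Tall lookup: walk the contiguous rows that share the "<account>-" prefix.
    static List<String> activitiesTall(
            NavigableMap<String, NavigableMap<String, String>> table, String account) {
        String prefix = account + "-";
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, NavigableMap<String, String>> row
                : table.tailMap(prefix, true).entrySet()) {
            if (!row.getKey().startsWith(prefix)) {
                break;
            }
            result.add(row.getValue().get("fk"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(activitiesWide(wideTable(), "account1")); // prints [activity1, activity2]
        System.out.println(activitiesTall(tallTable(), "Account1")); // prints [activity1, activity2]
    }
}
```

Both lookups return the same activities. The difference is that the wide
layout has to load the whole account row first, while the tall layout can
stop or resume at any row boundary.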


