Posted to user@hbase.apache.org by Something Something <ma...@gmail.com> on 2011/02/03 07:01:21 UTC
Fastest way to read only the keys of a HTable?
I want to read only the keys in a table. I tried this...
try {
    HTable table = new HTable("myTable");
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("Info"));
    ResultScanner scanner = table.getScanner(scan);
    Result result = scanner.next();
    while (result != null) {
        & so on...
This was performing fairly well until I added another Family that contains
lots of key/value pairs. My understanding was that adding another family
wouldn't affect the performance of this code because I am explicitly using
"Info", but it does.
Anyway, in this particular use case, I only care about the "Key" of the row.
I don't need any values from any of the families. What's the best way to
do this?
Please let me know. Thanks.
Re: Fastest way to read only the keys of a HTable?
Posted by Something Something <ma...@gmail.com>.
Awesome! It's instantaneous now. Thanks a bunch. Any such tricks for code
that looks like this...
Get get = new Get(Bytes.toBytes(code));
Result result = table.get(get);
NavigableMap<byte[], byte[]> map =
    result.getFamilyMap(Bytes.toBytes("Keys"));
if (map != null) {
    for (Map.Entry<byte[], byte[]> entry : map.entrySet()) {
        String key = Bytes.toString(entry.getValue());
        Get get1 = new Get(Bytes.toBytes(key));
        Result imp = table2.get(get1);
        // Do something with the result...
    }
}
Basically, I am reading the first table by a key (code). The "Keys" family
contains keys of some other table, so I get each key from that family and
retrieve the matching row from the other table.
Thanks again.
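One possible direction for the loop above, not confirmed anywhere in this thread: collect the keys first and fetch them in groups, so the client makes one call per group instead of one Get per key. The helper below is plain Java and only does the grouping; the class name, method name, and chunk size are assumptions for illustration. Whether your client version offers a multi-get such as get(List<Get>) on the table should be checked against its Javadoc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a list of row keys into fixed-size chunks so that
// each chunk can be fetched with one multi-get instead of one Get per key.
final class KeyBatcher {

    // Returns consecutive sublists of at most `size` elements, in input order.
    static <T> List<List<T>> chunks(List<T> keys, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += size) {
            int end = Math.min(start + size, keys.size());
            out.add(new ArrayList<>(keys.subList(start, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("k1", "k2", "k3", "k4", "k5");
        for (List<String> chunk : chunks(keys, 2)) {
            // Assumed next step (not shown): build one Get per key in `chunk`
            // and hand the whole list to the second table in a single call.
            System.out.println(chunk);
        }
    }
}
```

With five keys and a chunk size of 2, the loop runs three times instead of five. The right chunk size depends on row size and cluster load, so it is worth measuring rather than guessing.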
On Thu, Feb 3, 2011 at 2:17 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> On the scan, you can setCaching with the number of rows you want to
> pre-fetch per RPC. Setting it to 2 is already 2x better than the
> default.
>
> J-D
Re: Fastest way to read only the keys of a HTable?
Posted by Jean-Daniel Cryans <jd...@apache.org>.
On the scan, you can setCaching with the number of rows you want to
pre-fetch per RPC. Setting it to 2 is already 2x better than the
default.
J-D
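For a rough sense of why this helps, here is a back-of-the-envelope model in plain Java (no HBase dependency). It assumes each batch of rows costs exactly one round trip and ignores the RPCs for opening and closing the scanner, so treat the numbers as an approximation. The class and method names are made up for illustration; the 43-row figure is the one quoted in this thread.

```java
// Simplified model of scanner caching, not HBase code.
// Assumption: fetching one batch of rows costs one client-to-server round trip.
final class ScanRoundTrips {

    // Round trips needed to return `rows` rows when the scanner pre-fetches
    // `caching` rows per RPC, i.e. ceil(rows / caching).
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 43; // row count mentioned in the thread
        for (int caching : new int[] {1, 2, 10, 100}) {
            System.out.println("caching=" + caching + " -> "
                    + roundTrips(rows, caching) + " round trips");
        }
    }
}
```

In this model, 43 rows cost 43 round trips at 1 row per RPC, 22 at a caching value of 2, and 5 at a value of 10. In the real client the knob is Scan.setCaching(int); larger values trade client memory for fewer round trips.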
On Thu, Feb 3, 2011 at 1:35 PM, Something Something
<ma...@gmail.com> wrote:
> After adding the following line:
>
> scan.addFamily(Bytes.toBytes("Info"));
>
> performance improved dramatically (Thank you both!). But now I want it to
> perform even faster, if possible -:) To read 43 rows, it's taking 2
> seconds. Eventually, the 'partner' table may have over 500 entries. I
> guess, I will try by moving the recently added family to a different table.
> Do you think that might help?
>
> Thanks again.
Re: Fastest way to read only the keys of a HTable?
Posted by Something Something <ma...@gmail.com>.
After adding the following line:
scan.addFamily(Bytes.toBytes("Info"));
performance improved dramatically (Thank you both!). But now I want it to
perform even faster, if possible -:) To read 43 rows, it's taking 2
seconds. Eventually, the 'partner' table may have over 500 entries. I
guess I will try moving the recently added family to a different table.
Do you think that might help?
Thanks again.
On Thu, Feb 3, 2011 at 12:15 PM, Jonathan Gray <jg...@fb.com> wrote:
> If you only need to consider a single column family, use Scan.addFamily()
> on your scanner. Then there will be no impact of the other column families.
RE: Fastest way to read only the keys of a HTable?
Posted by Jonathan Gray <jg...@fb.com>.
If you only need to consider a single column family, use Scan.addFamily() on your scanner. Then there will be no impact of the other column families.
> -----Original Message-----
> From: Something Something [mailto:mailinglists19@gmail.com]
> Sent: Thursday, February 03, 2011 11:28 AM
> To: user@hbase.apache.org
> Subject: Re: Fastest way to read only the keys of a HTable?
>
> Hmm.. performance hasn't improved at all. Do you see anything wrong with
> the following code:
>
>
> public List<Partner> getPartners() {
> ArrayList<Partner> partners = new ArrayList<Partner>();
>
> try {
> HTable table = new HTable("partner");
> Scan scan = new Scan();
> scan.setFilter(new FirstKeyOnlyFilter());
> ResultScanner scanner = table.getScanner(scan);
> Result result = scanner.next();
> while (result != null) {
> Partner partner = new
> Partner(Bytes.toString(result.getRow()));
> partners.add(partner);
> result = scanner.next();
> }
> } catch (IOException e) {
> throw new RuntimeException(e);
> }
> return partners;
> }
>
> May be I shouldn't use more than one "column family" in a HTable - but the
> BigTable paper recommends that, doesn't it? Please advice and thanks for
> your help.
Re: Fastest way to read only the keys of a HTable?
Posted by Something Something <ma...@gmail.com>.
Hmm.. performance hasn't improved at all. Do you see anything wrong with
the following code:
public List<Partner> getPartners() {
    ArrayList<Partner> partners = new ArrayList<Partner>();
    try {
        HTable table = new HTable("partner");
        Scan scan = new Scan();
        scan.setFilter(new FirstKeyOnlyFilter());
        ResultScanner scanner = table.getScanner(scan);
        Result result = scanner.next();
        while (result != null) {
            Partner partner = new Partner(Bytes.toString(result.getRow()));
            partners.add(partner);
            result = scanner.next();
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    return partners;
}
Maybe I shouldn't use more than one "column family" in an HTable - but the
BigTable paper recommends that, doesn't it? Please advise and thanks for
your help.
On Wed, Feb 2, 2011 at 10:55 PM, Stack <st...@duboce.net> wrote:
> I don't see a getKey on Result. Use
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow()
> .
>
> Here is how its used in the shell table.rb class:
>
> # Count rows in a table
> def count(interval = 1000, caching_rows = 10)
> # We can safely set scanner caching with the first key only filter
> scan = org.apache.hadoop.hbase.client.Scan.new
> scan.cache_blocks = false
> scan.caching = caching_rows
> scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)
>
> # Run the scanner
> scanner = @table.getScanner(scan)
> count = 0
> iter = scanner.iterator
>
> # Iterate results
> while iter.hasNext
> row = iter.next
> count += 1
> next unless (block_given? && count % interval == 0)
> # Allow command modules to visualize counting process
> yield(count, String.from_java_bytes(row.getRow))
> end
>
> # Return the counter
> return count
> end
>
>
> St.Ack
Re: Fastest way to read only the keys of a HTable?
Posted by Stack <st...@duboce.net>.
I don't see a getKey on Result. Use
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow().
Here is how it's used in the shell table.rb class:

# Count rows in a table
def count(interval = 1000, caching_rows = 10)
  # We can safely set scanner caching with the first key only filter
  scan = org.apache.hadoop.hbase.client.Scan.new
  scan.cache_blocks = false
  scan.caching = caching_rows
  scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)

  # Run the scanner
  scanner = @table.getScanner(scan)
  count = 0
  iter = scanner.iterator

  # Iterate results
  while iter.hasNext
    row = iter.next
    count += 1
    next unless (block_given? && count % interval == 0)
    # Allow command modules to visualize counting process
    yield(count, String.from_java_bytes(row.getRow))
  end

  # Return the counter
  return count
end
St.Ack
On Thu, Feb 3, 2011 at 6:47 AM, Something Something
<ma...@gmail.com> wrote:
> Thanks. So I will add this...
>
> scan.setFilter(new FirstKeyOnlyFilter());
>
> But after I do this...
>
> Result result = scanner.next();
>
> There's no... result.getKey() - so what method would give me the Key value?
Re: Fastest way to read only the keys of a HTable?
Posted by Something Something <ma...@gmail.com>.
Thanks. So I will add this...
scan.setFilter(new FirstKeyOnlyFilter());
But after I do this...
Result result = scanner.next();
There's no... result.getKey() - so what method would give me the Key value?
On Wed, Feb 2, 2011 at 10:20 PM, Stack <st...@duboce.net> wrote:
> See
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html
> St.Ack
Re: Fastest way to read only the keys of a HTable?
Posted by Stack <st...@duboce.net>.
See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html
St.Ack
On Thu, Feb 3, 2011 at 6:01 AM, Something Something
<ma...@gmail.com> wrote:
> I want to read only the keys in a table. I tried this...
>
> try {
>
> HTable table = new HTable("myTable");
>
> Scan scan = new Scan();
>
> scan.addFamily(Bytes.toBytes("Info"));
>
> ResultScanner scanner = table.getScanner(scan);
>
> Result result = scanner.next();
>
> while (result != null) {
>
> & so on...
>
> This was performing fairly well until I added another Family that contains
> lots of key/value pairs. My understanding was that adding another family
> wouldn't affect performance of this code because I am explicitly using
> "Info", but it is.
>
> Anyway, in this particular use case, I only care about the "Key" of the row.
> I don't need any values from any of the families. What's the best way to
> do this?
>
> Please let me know. Thanks.
>