Posted to user@hbase.apache.org by Something Something <ma...@gmail.com> on 2011/02/03 07:01:21 UTC

Fastest way to read only the keys of a HTable?

I want to read only the keys in a table. I tried this...

    try {
        HTable table = new HTable("myTable");
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("Info"));
        ResultScanner scanner = table.getScanner(scan);
        Result result = scanner.next();
        while (result != null) {

& so on...

This was performing fairly well until I added another Family that contains
lots of key/value pairs.  My understanding was that adding another family
wouldn't affect the performance of this code because I am explicitly using
"Info", but it does.

Anyway, in this particular use case, I only care about the "Key" of the row.
I don't need any values from any of the families.  What's the best way to
do this?

Please let me know.  Thanks.

Re: Fastest way to read only the keys of a HTable?

Posted by Something Something <ma...@gmail.com>.
Awesome!  It's instantaneous now.  Thanks a bunch.  Any such tricks for code
that looks like this...

      Get get = new Get(Bytes.toBytes(code));
      Result result = table.get(get);
      NavigableMap<byte[], byte[]> map = result.getFamilyMap(Bytes.toBytes("Keys"));
      if (map != null) {
        for (Map.Entry<byte[], byte[]> entry : map.entrySet()) {
          String key = Bytes.toString(entry.getValue());
          Get get1 = new Get(Bytes.toBytes(key));
          Result imp = table2.get(get1);
          // Do something with the result...
        }
      }

Basically, I am reading the first table by a key (code).  The "Keys" family
contains keys of some other table, so I get each key from that family and
retrieve the matching row from the other table.

Thanks again.
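
One way to reason about the loop above: every table2.get(...) call is its own
round trip, so a row with N entries in "Keys" costs 1 + N round trips. Below is
a minimal sketch of that count using a toy in-memory class. LookupModel, get and
getAll are hypothetical names made up for illustration, not the HBase API. The
HBase client does offer a batch overload, HTable.get(List<Get>), which is the
rough real-world counterpart of getAll here; on a real cluster a batch is
grouped per region server, so it is not necessarily exactly one call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only, not the HBase API: an in-memory table that counts how many
// round trips a client would make to read from it.
final class LookupModel {
    private final Map<String, String> rows = new LinkedHashMap<>();
    int roundTrips = 0;

    void put(String key, String value) {
        rows.put(key, value);
    }

    // One key per round trip, like calling get() inside a loop.
    String get(String key) {
        roundTrips++;
        return rows.get(key);
    }

    // Many keys in one round trip; results keep the order of the keys.
    List<String> getAll(List<String> keys) {
        roundTrips++;
        List<String> values = new ArrayList<>();
        for (String key : keys) {
            values.add(rows.get(key));
        }
        return values;
    }

    public static void main(String[] args) {
        LookupModel table2 = new LookupModel();
        table2.put("a", "1");
        table2.put("b", "2");
        table2.put("c", "3");
        List<String> keys = List.of("a", "b", "c");

        for (String key : keys) {
            table2.get(key);
        }
        System.out.println("looped gets: " + table2.roundTrips + " round trips");

        table2.roundTrips = 0;
        table2.getAll(keys);
        System.out.println("batched get: " + table2.roundTrips + " round trip");
    }
}
```

In this model the saving grows with the number of entries in "Keys"; whether
it matters in practice depends on how many keys a typical row holds.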

On Thu, Feb 3, 2011 at 2:17 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> On the scan, you can setCaching with the number of rows you want to
> pre-fetch per RPC. Setting it to 2 is already 2x better than the
> default.
>
> J-D

Re: Fastest way to read only the keys of a HTable?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
On the scan, you can setCaching with the number of rows you want to
pre-fetch per RPC. Setting it to 2 is already 2x better than the
default.

J-D
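
To make the "2x better" arithmetic concrete, here is a small sketch that counts
fetch calls for a given caching value. It is a simplified model, not HBase
code: ScanRpcModel and fetchRpcs are hypothetical names, the default of 1 row
per call is what the message above implies, and the model ignores the calls
that open and close the scanner as well as the final fetch that finds the scan
exhausted.

```java
// Toy model only: counts the fetch RPCs needed to stream `rows` rows when the
// scanner pre-fetches `caching` rows per call.
final class ScanRpcModel {
    static long fetchRpcs(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 43; // row count mentioned in the thread
        for (int caching : new int[] {1, 2, 10, 100}) {
            System.out.println("caching=" + caching + " -> "
                    + fetchRpcs(rows, caching) + " fetch RPCs");
        }
    }
}
```

For the 43 rows mentioned in the thread this gives 43 fetches at caching=1, 22
at caching=2, and a single fetch once caching is at least 43. Larger values
trade client and server memory for fewer round trips.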

On Thu, Feb 3, 2011 at 1:35 PM, Something Something
<ma...@gmail.com> wrote:
> After adding the following line:
>
> scan.addFamily(Bytes.toBytes("Info"));
>
> performance improved dramatically (Thank you both!).  But now I want it to
> perform even faster, if possible -:)  To read 43 rows, it's taking 2
> seconds.  Eventually, the 'partner' table may have over 500 entries.  I
> guess, I will try by moving the recently added family to a different table.
>  Do you think that might help?
>
> Thanks again.

Re: Fastest way to read only the keys of a HTable?

Posted by Something Something <ma...@gmail.com>.
After adding the following line:

scan.addFamily(Bytes.toBytes("Info"));

performance improved dramatically (Thank you both!).  But now I want it to
perform even faster, if possible -:)  To read 43 rows, it's taking 2
seconds.  Eventually, the 'partner' table may have over 500 entries.  I
guess I will try moving the recently added family to a different table.
Do you think that might help?

Thanks again.


On Thu, Feb 3, 2011 at 12:15 PM, Jonathan Gray <jg...@fb.com> wrote:

> If you only need to consider a single column family, use Scan.addFamily()
> on your scanner.  Then there will be no impact of the other column families.

RE: Fastest way to read only the keys of a HTable?

Posted by Jonathan Gray <jg...@fb.com>.
If you only need to consider a single column family, use Scan.addFamily() on your scanner.  Then there will be no impact of the other column families.
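
A rough way to picture why this helps: each column family is stored
separately, so a scan restricted to one family never has to visit the others.
The sketch below is a toy in-memory model of that idea, not the HBase API.
FamilyScanModel, cellsVisited and the family name "Big" are hypothetical, and
the cell counts are made-up illustration values.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only, not the HBase API: each column family keeps its own
// row -> cells map, and a scan visits just the families it was asked for.
final class FamilyScanModel {
    private final Map<String, TreeMap<String, List<String>>> families = new TreeMap<>();

    void put(String family, String row, List<String> cells) {
        families.computeIfAbsent(family, f -> new TreeMap<>()).put(row, cells);
    }

    // Counts the cells a scan would visit; an empty set means "all families".
    long cellsVisited(Set<String> wanted) {
        long visited = 0;
        for (Map.Entry<String, TreeMap<String, List<String>>> family : families.entrySet()) {
            if (!wanted.isEmpty() && !wanted.contains(family.getKey())) {
                continue; // this family's data is skipped entirely
            }
            for (List<String> cells : family.getValue().values()) {
                visited += cells.size();
            }
        }
        return visited;
    }

    // Builds a table with one small "Info" cell and many "Big" cells per row.
    static FamilyScanModel sample(int rows, int bigCellsPerRow) {
        FamilyScanModel table = new FamilyScanModel();
        for (int r = 0; r < rows; r++) {
            String row = "row" + r;
            table.put("Info", row, List.of("name"));
            table.put("Big", row, Collections.nCopies(bigCellsPerRow, "v"));
        }
        return table;
    }

    public static void main(String[] args) {
        FamilyScanModel table = sample(43, 1000);
        System.out.println("all families: " + table.cellsVisited(Set.of()));
        System.out.println("Info only:    " + table.cellsVisited(Set.of("Info")));
    }
}
```

With 43 rows and 1000 cells per row in the large family, the unrestricted scan
in this model visits 43043 cells while the "Info"-only scan visits 43.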

> -----Original Message-----
> From: Something Something [mailto:mailinglists19@gmail.com]
> Sent: Thursday, February 03, 2011 11:28 AM
> To: user@hbase.apache.org
> Subject: Re: Fastest way to read only the keys of a HTable?
> 
> Hmm.. performance hasn't improved at all.  Do you see anything wrong with
> the following code:
> 
> 
>     public List<Partner> getPartners() {
>       ArrayList<Partner> partners = new ArrayList<Partner>();
> 
>       try {
>           HTable table = new HTable("partner");
>           Scan scan = new Scan();
>           scan.setFilter(new FirstKeyOnlyFilter());
>           ResultScanner scanner = table.getScanner(scan);
>           Result result = scanner.next();
>           while (result != null) {
>               Partner partner = new Partner(Bytes.toString(result.getRow()));
>               partners.add(partner);
>               result = scanner.next();
>           }
>       } catch (IOException e) {
>           throw new RuntimeException(e);
>       }
>       return partners;
>   }
> 
> May be I shouldn't use more than one "column family" in a HTable - but the
> BigTable paper recommends that, doesn't it?  Please advice and thanks for
> your help.

Re: Fastest way to read only the keys of a HTable?

Posted by Something Something <ma...@gmail.com>.
Hmm.. performance hasn't improved at all.  Do you see anything wrong with
the following code:


    public List<Partner> getPartners() {
      ArrayList<Partner> partners = new ArrayList<Partner>();

      try {
          HTable table = new HTable("partner");
          Scan scan = new Scan();
          scan.setFilter(new FirstKeyOnlyFilter());
          ResultScanner scanner = table.getScanner(scan);
          Result result = scanner.next();
          while (result != null) {
              Partner partner = new Partner(Bytes.toString(result.getRow()));
              partners.add(partner);
              result = scanner.next();
          }
      } catch (IOException e) {
          throw new RuntimeException(e);
      }
      return partners;
  }

Maybe I shouldn't use more than one "column family" in a HTable - but the
BigTable paper recommends that, doesn't it?  Please advise, and thanks for
your help.




On Wed, Feb 2, 2011 at 10:55 PM, Stack <st...@duboce.net> wrote:

> I don't see a getKey on Result.  Use
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow()
> .
>
> Here is how it's used in the shell table.rb class:
>
>    # Count rows in a table
>    def count(interval = 1000, caching_rows = 10)
>      # We can safely set scanner caching with the first key only filter
>      scan = org.apache.hadoop.hbase.client.Scan.new
>      scan.cache_blocks = false
>      scan.caching = caching_rows
>      scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)
>
>      # Run the scanner
>      scanner = @table.getScanner(scan)
>      count = 0
>      iter = scanner.iterator
>
>      # Iterate results
>      while iter.hasNext
>        row = iter.next
>        count += 1
>        next unless (block_given? && count % interval == 0)
>        # Allow command modules to visualize counting process
>        yield(count, String.from_java_bytes(row.getRow))
>      end
>
>      # Return the counter
>      return count
>    end
>
>
> St.Ack

Re: Fastest way to read only the keys of a HTable?

Posted by Stack <st...@duboce.net>.
I don't see a getKey on Result.  Use
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow().

Here is how it's used in the shell table.rb class:

    # Count rows in a table
    def count(interval = 1000, caching_rows = 10)
      # We can safely set scanner caching with the first key only filter
      scan = org.apache.hadoop.hbase.client.Scan.new
      scan.cache_blocks = false
      scan.caching = caching_rows
      scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)

      # Run the scanner
      scanner = @table.getScanner(scan)
      count = 0
      iter = scanner.iterator

      # Iterate results
      while iter.hasNext
        row = iter.next
        count += 1
        next unless (block_given? && count % interval == 0)
        # Allow command modules to visualize counting process
        yield(count, String.from_java_bytes(row.getRow))
      end

      # Return the counter
      return count
    end


St.Ack

On Thu, Feb 3, 2011 at 6:47 AM, Something Something
<ma...@gmail.com> wrote:
> Thanks.  So I will add this...
>
>   scan.setFilter(new FirstKeyOnlyFilter());
>
> But after I do this...
>
>   Result result = scanner.next();
>
> There's no...  result.getKey() - so what method would give me the Key value?

Re: Fastest way to read only the keys of a HTable?

Posted by Something Something <ma...@gmail.com>.
Thanks.  So I will add this...

   scan.setFilter(new FirstKeyOnlyFilter());

But after I do this...

   Result result = scanner.next();

There's no...  result.getKey() - so what method would give me the Key value?



On Wed, Feb 2, 2011 at 10:20 PM, Stack <st...@duboce.net> wrote:

> See
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html
> St.Ack

Re: Fastest way to read only the keys of a HTable?

Posted by Stack <st...@duboce.net>.
See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html
St.Ack

On Thu, Feb 3, 2011 at 6:01 AM, Something Something
<ma...@gmail.com> wrote:
> I want to read only the keys in a table. I tried this...
>
>    try {
>
>  HTable table = new HTable("myTable");
>
>  Scan scan = new Scan();
>
>  scan.addFamily(Bytes.toBytes("Info"));
>
>  ResultScanner scanner = table.getScanner(scan);
>
>   Result result = scanner.next();
>
>  while (result != null) {
>
> & so on...
>
> This was performing fairly well until I added another Family that contains
> lots of key/value pairs.  My understanding was that adding another family
> wouldn't affect performance of this code because I am explicitly using
> "Info", but it is.
>
> Anyway, in this particular use case, I only care about the "Key" of the row.
>  I don't need any values from any of the families.  What's the best way to
> do this?
>
> Please let me know.  Thanks.
>