Posted to user@hbase.apache.org by Jack Chan <cd...@gmail.com> on 2013/10/17 08:13:59 UTC

why HTableDescriptor.getFamiliesKeys is so lag?

Hi all,
    I need to get all column families of a given table. Looking at the class "org.apache.hadoop.hbase.HTableDescriptor", I found
several methods that can be used.
    See the code below: method 1, method 2, and method 3 all do the same thing:

/*___________code begin___________*/

HTable table = new HTable(config, "mytable");
HTableDescriptor htd = table.getTableDescriptor();

// method 1: array of column descriptors
TimeCounter tc = new TimeCounter().run();
HColumnDescriptor[] cfs = htd.getColumnFamilies();
for (int i = 0; i < cfs.length; i++) {
    System.out.println("column family:" + new String(cfs[i].getName()));
}
System.out.println("time with getColumnFamilies-->" + tc.stop().getMicroSeconds());

// method 2: set of family names (raw bytes)
TimeCounter tc2 = new TimeCounter().run();
Set<byte[]> family_keys = htd.getFamiliesKeys();
for (byte[] _f : family_keys) {
    System.out.println("column family:" + new String(_f));
}
System.out.println("time with getFamiliesKeys-->" + tc2.stop().getMicroSeconds());

// method 3: collection of column descriptors
TimeCounter tc3 = new TimeCounter().run();
Collection<HColumnDescriptor> family_co = htd.getFamilies();
for (HColumnDescriptor family_co_entry : family_co) {
    System.out.println("column family:" + new String(family_co_entry.getName()));
}
System.out.println("time with getFamilies-->" + tc3.stop().getMicroSeconds());

/*___________________code end_____________________*/

I found that methods 1 and 3 perform about the same, at roughly 120 us,
but method 2 lags behind at roughly 500 us.

I only need to retrieve the column families' names, so method 2 is exactly what I need.
But why is it so slow?

Thanks.



Jack Chan. 
A new Apache-Camel rider.
sina-weibo:@for-each

Re: Re: why HTableDescriptor.getFamiliesKeys is so lag?

Posted by Jack Chan <cd...@gmail.com>.
Hi JM,

Many thanks :)

I was just curious to know what makes it lag.

Now it is much clearer to me.

Best Regards.




Jack Chan. 

From: Jean-Marc Spaggiari
Date: 2013-10-19 04:46
To: user@hbase.apache.org; cdj0579
Subject: Re: why HTableDescriptor.getFamiliesKeys is so lag?

Re: why HTableDescriptor.getFamiliesKeys is so lag?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Jack,

From the code:

// method 1 will call
  /**
   * Returns an array all the {@link HColumnDescriptor} of the column families
   * of the table.
   *
   * @return Array of all the HColumnDescriptors of the current table
   *
   * @see #getFamilies()
   */
  public HColumnDescriptor[] getColumnFamilies() {
    return getFamilies().toArray(new HColumnDescriptor[0]);
  }

Here getFamilies() returns
Collections.unmodifiableCollection(this.families.values()).

// method 2 will call
  /**
   * Returns all the column family names of the current table. The map of
   * HTableDescriptor contains mapping of family name to HColumnDescriptors.
   * This returns all the keys of the family map which represents the column
   * family names of the table.
   *
   * @return Immutable sorted set of the keys of the families.
   */
  public Set<byte[]> getFamiliesKeys() {
    return Collections.unmodifiableSet(this.families.keySet());
  }

// method 3 will call
  /**
   * Returns an unmodifiable collection of all the {@link HColumnDescriptor}
   * of all the column families of the table.
   *
   * @return Immutable collection of {@link HColumnDescriptor} of all the
   * column families.
   */
  public Collection<HColumnDescriptor> getFamilies() {
    return Collections.unmodifiableCollection(this.families.values());
  }


So methods 1 and 3 are almost the same thing: 1 is a wrapper around 3.

So let's look at the difference between 2 and 3. They do almost the same
thing, but one wraps keySet() and the other wraps values(). Both call
those methods on families, which is a TreeMap. So it sounds like
TreeMap.values() is faster than TreeMap.keySet().

Looking into the TreeMap code (and we are no longer in HBase here):
    public Collection<V> values() {
        Collection<V> vs = values;
        return (vs != null) ? vs : (values = new Values());
    }


values() just returns the internal values object if it exists (which is
most probably the case), while keySet() does almost the same thing but
has to call another method too:

    /**
     * Returns a {@link Set} view of the keys contained in this map.
     * The set's iterator returns the keys in ascending order.
     * The set is backed by the map, so changes to the map are
     * reflected in the set, and vice-versa.  If the map is modified
     * while an iteration over the set is in progress (except through
     * the iterator's own <tt>remove</tt> operation), the results of
     * the iteration are undefined.  The set supports element removal,
     * which removes the corresponding mapping from the map, via the
     * <tt>Iterator.remove</tt>, <tt>Set.remove</tt>,
     * <tt>removeAll</tt>, <tt>retainAll</tt>, and <tt>clear</tt>
     * operations.  It does not support the <tt>add</tt> or <tt>addAll</tt>
     * operations.
     */
    public Set<K> keySet() {
        return navigableKeySet();
    }

    /**
     * @since 1.6
     */
    public NavigableSet<K> navigableKeySet() {
        KeySet<K> nks = navigableKeySet;
        return (nks != null) ? nks : (navigableKeySet = new KeySet(this));
    }
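
To make the lazy creation of these views concrete, here is a minimal sketch using only the JDK (no HBase). The class name TreeMapViewDemo, the comparator BYTES_ORDER and the sample data are made up for illustration. It checks whether repeated calls to keySet() and values() hand back the same cached view object, which is what the TreeMap source quoted above suggests; this is an implementation detail rather than a documented guarantee.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.Comparator;
import java.util.Set;
import java.util.TreeMap;

class TreeMapViewDemo {
    // byte[] is not Comparable, so a TreeMap keyed by byte[] needs an explicit
    // comparator. This one is a hypothetical unsigned lexicographic ordering.
    static final Comparator<byte[]> BYTES_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Stand-in for the family map: family name -> placeholder descriptor string.
    static TreeMap<byte[], String> sampleFamilies() {
        TreeMap<byte[], String> families = new TreeMap<>(BYTES_ORDER);
        families.put("cf1".getBytes(StandardCharsets.UTF_8), "descriptor-1");
        families.put("cf2".getBytes(StandardCharsets.UTF_8), "descriptor-2");
        return families;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> families = sampleFamilies();

        // The first call creates the view; the second should return the cached one.
        Set<byte[]> keys1 = families.keySet();
        Set<byte[]> keys2 = families.keySet();
        Collection<String> vals1 = families.values();
        Collection<String> vals2 = families.values();

        System.out.println("same keySet view: " + (keys1 == keys2));
        System.out.println("same values view: " + (vals1 == vals2));
    }
}
```

If both lines report the same view, the only extra work in keySet() after the first call is one additional method call.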


So now there are two possibilities.

1) If you run each of your methods twice, most probably the second time
they will all be equally fast.
2) The navigableKeySet() call from keySet() accounts for the extra time
(roughly 380 us by your numbers), which would really surprise me, since I
would expect the JIT compiler to optimize that away.
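
One way to check the first possibility is to time the first call separately from later, warmed-up calls. The following is only a rough sketch using plain JDK classes; ViewTimingDemo, BYTES_ORDER and the sample map are hypothetical stand-ins for the HTableDescriptor family map, and a one-off System.nanoTime() measurement like this is noisy, so treat the numbers as indicative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Comparator;
import java.util.TreeMap;

class ViewTimingDemo {
    // Hypothetical unsigned lexicographic ordering for byte[] keys.
    static final Comparator<byte[]> BYTES_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    static TreeMap<byte[], String> sampleFamilies() {
        TreeMap<byte[], String> families = new TreeMap<>(BYTES_ORDER);
        families.put("cf1".getBytes(StandardCharsets.UTF_8), "d1");
        families.put("cf2".getBytes(StandardCharsets.UTF_8), "d2");
        return families;
    }

    // Mirrors the getFamiliesKeys() path: unmodifiable wrapper around keySet().
    static long sumKeyLengths(TreeMap<byte[], String> m) {
        long total = 0;
        for (byte[] key : Collections.unmodifiableSet(m.keySet())) {
            total += key.length;
        }
        return total;
    }

    // Mirrors the getFamilies() path: unmodifiable wrapper around values().
    static long sumValueLengths(TreeMap<byte[], String> m) {
        long total = 0;
        for (String value : Collections.unmodifiableCollection(m.values())) {
            total += value.length();
        }
        return total;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> families = sampleFamilies();
        long sink = 0; // accumulated and printed so the loops are not dead code

        // Cold measurements: include class loading and view creation.
        long t0 = System.nanoTime();
        sink += sumKeyLengths(families);
        long coldKeys = System.nanoTime() - t0;
        t0 = System.nanoTime();
        sink += sumValueLengths(families);
        long coldValues = System.nanoTime() - t0;

        // Warm measurements: average over many repetitions.
        int rounds = 100_000;
        t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += sumKeyLengths(families);
        }
        long warmKeys = (System.nanoTime() - t0) / rounds;
        t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += sumValueLengths(families);
        }
        long warmValues = (System.nanoTime() - t0) / rounds;

        System.out.println("cold keySet ns: " + coldKeys + ", cold values ns: " + coldValues);
        System.out.println("warm keySet ns: " + warmKeys + ", warm values ns: " + warmValues);
        System.out.println("sink: " + sink);
    }
}
```

If the cold and warm figures differ widely while the two warm figures are close, the gap seen in the original test is more likely a first-call effect than a real difference between keySet() and values().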

Lastly, I'm not sure why those few hundred microseconds matter to you, but
if it is because you need to call this method many times, then just cache
the result on the client side.
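
A client-side cache could look roughly like the sketch below. FamilyNameCache is a made-up class, not part of HBase; the loader is passed in as a Supplier so that the example runs with the JDK alone, and in real code it would be something like () -> htd.getFamiliesKeys(). Decoding the names as UTF-8 is also an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class FamilyNameCache {
    // Hypothetical loader, e.g. () -> htd.getFamiliesKeys() in the thread's code.
    private final Supplier<Collection<byte[]>> loader;
    private List<String> names; // null until the first call to get()

    FamilyNameCache(Supplier<Collection<byte[]>> loader) {
        this.loader = loader;
    }

    // Loads and decodes the family names once, then returns the cached list.
    synchronized List<String> get() {
        if (names == null) {
            List<String> decoded = new ArrayList<>();
            for (byte[] raw : loader.get()) {
                decoded.add(new String(raw, StandardCharsets.UTF_8));
            }
            names = Collections.unmodifiableList(decoded);
        }
        return names;
    }

    // Drops the cached list, for example after the table schema has changed.
    synchronized void invalidate() {
        names = null;
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        FamilyNameCache cache = new FamilyNameCache(() -> {
            loads.incrementAndGet();
            return Arrays.asList(
                    "cf1".getBytes(StandardCharsets.UTF_8),
                    "cf2".getBytes(StandardCharsets.UTF_8));
        });

        System.out.println(cache.get());
        System.out.println(cache.get());
        System.out.println("loader calls: " + loads.get());
    }
}
```

The trade-off is staleness: if column families are added or removed, the cache keeps returning the old names until invalidate() is called.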

HTH.

JM
