Posted to user@hbase.apache.org by Sachin Jain <sa...@gmail.com> on 2016/07/21 06:28:14 UTC

How to get the size of an HBase table

*Context*
I am using Spark (1.5.1) with HBase (1.1.2) to dump the output of Spark
jobs into HBase, which is then available for lookups from the HBase
table. A custom relation extending HadoopFsRelation (a BaseRelation) is
used to read from and write to HBase through Spark's Data Source API.

*Use Case*
Now, whenever I perform a join, Spark builds a logical plan and decides
which type of join to execute. Per SparkStrategies [0], it checks the
size of the HBase table: if the size is below a threshold (10 MB by
default), it selects a broadcast hash join; otherwise, a sort-merge join.
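
For reference, that threshold is Spark's
spark.sql.autoBroadcastJoinThreshold setting and can be tuned. A minimal
sketch (the app name and the 50 MB value here are just placeholders):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  val sc = new SparkContext(new SparkConf().setAppName("hbase-joins"))
  val sqlContext = new SQLContext(sc)

  // Broadcast hash join is chosen when one side's estimated size is below
  // this many bytes; the default is 10 MB, and -1 disables broadcasting.
  sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
    (50L * 1024 * 1024).toString)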

*Problem Statement*
I want to know if there is an API or some approach to calculate the size of
an HBase table.

[0]:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L118

Thanks
-Sachin

Re: How to get the size of an HBase table

Posted by Sachin Jain <sa...@gmail.com>.
Thanks Ted,

Do you see a good approach for calculating the size of an HBase table,
telling it to Spark, and letting Spark decide which type of join to
perform? As far as I understand it, we get the table's regions from
HBaseAdmin, and for each region we compute the HDFS blocks distribution
of its HFiles and sum their weights [0].

[0]:
https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/HDFSBlocksDistribution.java
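
One way to tell Spark, if it helps: the planner takes its estimate from
BaseRelation.sizeInBytes, so the relation can override that method to
return the HFile-based size. A rough sketch (HBaseRelation is a
hypothetical name; estimateTableSize is the helper sketched at the end
of Ted's reply below):

  import org.apache.spark.sql.SQLContext
  import org.apache.spark.sql.sources.BaseRelation
  import org.apache.spark.sql.types.StructType

  // Hypothetical relation: Spark 1.5's planner consults sizeInBytes when
  // choosing between broadcast hash join and sort-merge join.
  class HBaseRelation(val sqlContext: SQLContext,
                      val schema: StructType,
                      tableNameStr: String) extends BaseRelation {
    // Cache the estimate; computing block distributions is not cheap.
    override lazy val sizeInBytes: Long = estimateTableSize(tableNameStr)
  }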


On Fri, Jul 22, 2016 at 4:24 AM, Ted Yu <yu...@gmail.com> wrote:

> Please take a look at the following methods:
>
> From HBaseAdmin:
>
>   public List<HRegionInfo> getTableRegions(final TableName tableName)
>
> From HRegion:
>
>   public static HDFSBlocksDistribution computeHDFSBlocksDistribution(
>       final Configuration conf, final HTableDescriptor tableDescriptor,
>       final HRegionInfo regionInfo) throws IOException
>
> FYI

Re: How to get the size of an HBase table

Posted by Ted Yu <yu...@gmail.com>.
Please take a look at the following methods:

From HBaseAdmin:

  public List<HRegionInfo> getTableRegions(final TableName tableName)

From HRegion:

  public static HDFSBlocksDistribution computeHDFSBlocksDistribution(
      final Configuration conf, final HTableDescriptor tableDescriptor,
      final HRegionInfo regionInfo) throws IOException

FYI
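
Putting the two together, a minimal sketch against HBase 1.1.2 (summing
getUniqueBlocksTotalWeight across regions is an assumption about how to
aggregate; that weight counts each HDFS block once, so the result is not
inflated by the replication factor):

  import scala.collection.JavaConverters._

  import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
  import org.apache.hadoop.hbase.client.ConnectionFactory
  import org.apache.hadoop.hbase.regionserver.HRegion

  def estimateTableSize(tableNameStr: String): Long = {
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    try {
      val admin = connection.getAdmin
      val tableName = TableName.valueOf(tableNameStr)
      val descriptor = admin.getTableDescriptor(tableName)
      // Sum the HDFS block weight of every region's store files.
      admin.getTableRegions(tableName).asScala.map { regionInfo =>
        HRegion.computeHDFSBlocksDistribution(conf, descriptor, regionInfo)
          .getUniqueBlocksTotalWeight
      }.sum
    } finally {
      connection.close()
    }
  }

Note that this only sees flushed HFiles: rows still sitting in the
memstore are not counted, so the estimate can run low right after heavy
writes.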
