Posted to user@hbase.apache.org by Sachin Jain <sa...@gmail.com> on 2016/07/21 06:28:14 UTC
How to get size of HBase Table
*Context*
I am using Spark (1.5.1) with HBase (1.1.2) to dump the output of Spark
jobs into HBase, which is then available for lookups from the HBase
table. A custom BaseRelation extending HadoopFsRelation is used to read
from and write to HBase, via Spark's Data Source API (a DefaultSource
provider).
*Use Case*
Now, whenever I perform a join operation, Spark creates a logical plan and
decides which type of join to execute. As per SparkStrategies [0], it
checks the size of the HBase table: if it is below a threshold (10 MB by
default) it selects a broadcast hash join, otherwise a sort-merge join.
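For reference, the 10 MB figure corresponds to the default of
spark.sql.autoBroadcastJoinThreshold; a minimal, illustrative way to tune
it on an existing SQLContext (sqlContext here is assumed to be my Spark
1.5.1 SQLContext) would be:

    // Illustrative only: sqlContext is an existing org.apache.spark.sql.SQLContext.
    // 10485760 bytes (10 MB) is the default of spark.sql.autoBroadcastJoinThreshold;
    // relations whose estimated size is below it are candidates for a broadcast hash join.
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
        String.valueOf(10L * 1024 * 1024));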
*Problem Statement*
I want to know if there is an API or some approach to calculate the size of
an HBase table.
[0]:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L118
Thanks
-Sachin
Re: How to get size of HBase Table
Posted by Sachin Jain <sa...@gmail.com>.
Thanks Ted,
Do you see a good approach for calculating the size of an HBase table,
reporting it to Spark, and letting Spark decide which type of join to
perform?
As far as I have understood, from HBaseAdmin we get the table's regions,
and for each region we compute the HDFS blocks distribution of its HFiles
and sum up their weight [0].
[0]:
https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/HDFSBlocksDistribution.java
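Something along these lines is what I have in mind (an untested sketch:
the class name and "my_table" are placeholders, hbase-server must be on
the client classpath so that HRegion and HDFSBlocksDistribution are
visible, reading getUniqueBlocksTotalWeight() from the distribution is my
understanding of [0], and only flushed HFiles are counted, not MemStore
contents):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HDFSBlocksDistribution;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.regionserver.HRegion;

    public class HBaseTableSizeEstimator {

      // Sums the HDFS block weight of every region's store files. MemStore
      // data that has not been flushed yet is not included in the estimate.
      public static long estimateTableSizeInBytes(Configuration conf, TableName tableName)
          throws IOException {
        long totalWeight = 0L;
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
          HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
          for (HRegionInfo regionInfo : admin.getTableRegions(tableName)) {
            HDFSBlocksDistribution distribution =
                HRegion.computeHDFSBlocksDistribution(conf, descriptor, regionInfo);
            totalWeight += distribution.getUniqueBlocksTotalWeight();
          }
        }
        return totalWeight;
      }

      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // "my_table" is a placeholder; pass the real table name here.
        long sizeInBytes = estimateTableSizeInBytes(conf, TableName.valueOf("my_table"));
        System.out.println("Estimated table size (bytes): " + sizeInBytes);
      }
    }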
On Fri, Jul 22, 2016 at 4:24 AM, Ted Yu <yu...@gmail.com> wrote:
> Please take a look at the following methods:
>
> From HBaseAdmin:
>
> public List<HRegionInfo> getTableRegions(final TableName tableName)
>
> From HRegion:
>
> public static HDFSBlocksDistribution computeHDFSBlocksDistribution(
>     final Configuration conf, final HTableDescriptor tableDescriptor,
>     final HRegionInfo regionInfo) throws IOException
>
> FYI
Re: How to get size of HBase Table
Posted by Ted Yu <yu...@gmail.com>.
Please take a look at the following methods:
From HBaseAdmin:
public List<HRegionInfo> getTableRegions(final TableName tableName)
From HRegion:
public static HDFSBlocksDistribution computeHDFSBlocksDistribution(
    final Configuration conf, final HTableDescriptor tableDescriptor,
    final HRegionInfo regionInfo) throws IOException
FYI
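For what it's worth, a minimal, untested illustration of wiring those two
calls together for a single region ("my_table" is a placeholder, and
reading getUniqueBlocksTotalWeight() from the returned distribution is
one way to get a byte count):

    // Untested sketch: compute the HDFS block weight of one region of a table.
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      TableName tableName = TableName.valueOf("my_table");   // placeholder name
      HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
      HRegionInfo firstRegion = admin.getTableRegions(tableName).get(0);
      HDFSBlocksDistribution distribution =
          HRegion.computeHDFSBlocksDistribution(conf, descriptor, firstRegion);
      System.out.println("Region weight (bytes): " + distribution.getUniqueBlocksTotalWeight());
    }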