Posted to user@hbase.apache.org by Antonio Si <an...@gmail.com> on 2018/08/22 18:28:32 UTC

how to get rowkey with largest number of versions

Hi,

I am new to hbase. I am wondering how I could find out which rowkey has the
largest number of versions in a column family.

Any pointer would be very helpful.

Thanks.

Antonio.

Re: time out when running CellCounter

Posted by Ted Yu <yu...@gmail.com>.
Please also take a look at:
hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/RowCountEndpoint.java

You can reuse some code from CellCounter to enhance the above example
endpoint so that counting versions is done on the server side instead of
through MapReduce.

FYI


Re: time out when running CellCounter

Posted by Antonio Si <an...@gmail.com>.
Thanks Ted.

I tried passing "-Dhbase.client.scanner.timeout.period=1800000" when invoking
CellCounter, but it still times out after 600 sec.

Thanks.

Antonio.


Re: time out when running CellCounter

Posted by Ted Yu <yu...@gmail.com>.
It seems CellCounter doesn't have such a (command-line) option.

You can specify, e.g., a scan time range, scan max versions, start row, stop
row, etc., so that each individual run has a shorter runtime.

Cheers
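
One way to picture splitting the work into shorter runs: the sketch below turns a sorted list of boundary row keys into [start, stop) pairs, one pair per run. It is plain Java with no HBase dependency. The class name RowRangeSketch, the method toRanges, and the sample boundary keys are hypothetical, and row keys are modeled as strings rather than byte arrays. An empty string stands for an open start or stop row, as in HBase scans. How the start and stop rows are handed to CellCounter depends on the HBase version and is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: row keys are modeled as Strings, not HBase byte[] keys.
class RowRangeSketch {

    // Turns sorted boundary keys into [start, stop) pairs.
    // An empty string means "open" (beginning or end of the table).
    static List<String[]> toRanges(List<String> sortedBoundaries) {
        List<String[]> ranges = new ArrayList<>();
        String start = "";
        for (String boundary : sortedBoundaries) {
            ranges.add(new String[] {start, boundary});
            start = boundary;
        }
        // Last range runs from the final boundary to the end of the table.
        ranges.add(new String[] {start, ""});
        return ranges;
    }

    public static void main(String[] args) {
        // Hypothetical boundaries; real ones could come from region start keys.
        for (String[] r : toRanges(Arrays.asList("g", "n", "t"))) {
            System.out.println("start='" + r[0] + "' stop='" + r[1] + "'");
        }
    }
}
```

Each pair would then drive one shorter CellCounter run, and the per-run reports would be combined afterwards.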


time out when running CellCounter

Posted by Antonio Si <an...@gmail.com>.
Hi,

When I run org.apache.hadoop.hbase.mapreduce.CellCounter, I get "Timed
out after 600 secs". Is there a way to override the timeout value other
than changing it in hbase-site.xml and restarting HBase?

Any suggestions would be helpful.

Thank you.

Antonio.

Re: how to get rowkey with largest number of versions

Posted by Antonio Si <an...@gmail.com>.
Thanks for all the info. I will give it a try.


Re: how to get rowkey with largest number of versions

Posted by Ted Yu <yu...@gmail.com>.
Antonio:
Please take a look at CellCounter under the hbase-mapreduce module, which may
be of use to you:

 * 6. Total number of versions of each qualifier.


Please note that the max versions may fluctuate depending on when major
compaction kicks in.


FYI


Re: how to get rowkey with largest number of versions

Posted by Ankit Singhal <an...@gmail.com>.
I don't think there is any direct way.
You may need to do a raw scan of the full table and count the number of
versions of a column returned for each row to calculate the max. (You can
optimize this with a custom coprocessor that returns, from each region
server, the single row key with the most versions of a column; the client
then selects the max across all results.)
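
The counting logic described above can be sketched in plain Java. This is not HBase client or coprocessor code: each scanned cell version is modeled as a {row, qualifier} string pair, and the names VersionCountSketch, maxInRegion and maxAcrossRegions are hypothetical. The first method stands in for the per-region step (what a raw scan or a custom endpoint would compute), the second for the client-side merge.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: cells are modeled as {row, qualifier} pairs, not HBase Cells.
class VersionCountSketch {

    // Per-region step: count versions per (row, qualifier) and return the
    // row whose column has the most versions, or null if there are no cells.
    static Map.Entry<String, Integer> maxInRegion(List<String[]> cells) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String[] cell : cells) {
            counts.computeIfAbsent(cell[0], r -> new TreeMap<>())
                  .merge(cell[1], 1, Integer::sum);
        }
        Map.Entry<String, Integer> best = null;
        for (Map.Entry<String, Map<String, Integer>> row : counts.entrySet()) {
            for (int n : row.getValue().values()) {
                if (best == null || n > best.getValue()) {
                    best = new SimpleImmutableEntry<>(row.getKey(), n);
                }
            }
        }
        return best;
    }

    // Client step: pick the overall maximum from the per-region results.
    static Map.Entry<String, Integer> maxAcrossRegions(
            List<Map.Entry<String, Integer>> partials) {
        Map.Entry<String, Integer> best = null;
        for (Map.Entry<String, Integer> p : partials) {
            if (p != null && (best == null || p.getValue() > best.getValue())) {
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample data: one array entry per stored version.
        List<String[]> region1 = Arrays.asList(
            new String[] {"r1", "q1"}, new String[] {"r1", "q1"},
            new String[] {"r2", "q1"}, new String[] {"r2", "q1"},
            new String[] {"r2", "q1"}, new String[] {"r2", "q2"});
        List<String[]> region2 = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            region2.add(new String[] {"r9", "q1"});
        }
        Map.Entry<String, Integer> best = maxAcrossRegions(
            Arrays.asList(maxInRegion(region1), maxInRegion(region2)));
        System.out.println(best.getKey() + " -> " + best.getValue()); // r9 -> 5
    }
}
```

Returning only one (row key, count) pair per region keeps the data sent back to the client small, which is the point of the coprocessor variant.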
