Posted to user@hbase.apache.org by Ganesh Viswanathan <ga...@gmail.com> on 2017/02/03 20:34:25 UTC

Dropping a very large table - 75million rows

Hello,

I need to drop an old HBase table that is quite large. It has anywhere
between 2 million and 70 million datapoints; I turned off the row count
after it had run in the HBase shell for half a day. I have 4 other tables
that hold around 75 million rows in total and also take heavy PUT and GET
traffic.

What is the best practice for disabling and dropping such a large table in
HBase so that I have minimal impact on the rest of the cluster?
1) I hear there are ways to disable (and drop?) specific regions. Would
that work?
2) Should I scan and delete a few rows at a time until the size becomes
manageable, and then disable/drop the table?
  If so: what is a good number of rows to delete at a time, should I run a
major compaction after these row deletes on specific regions, and what is
a table size that has been validated as safe to drop without causing
issues on the larger cluster?


Thanks!
Ganesh
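For reference, the disable/drop sequence under discussion is two commands
in the HBase shell (the table name below is a placeholder):

```
hbase> disable 'old_metrics_table'
hbase> drop 'old_metrics_table'
```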

Re: Dropping a very large table - 75million rows

Posted by Ganesh Viswanathan <ga...@gmail.com>.
One additional question:

bq. "This wouldn't have affected your system's performance because the
locality for the table didn't change -- just the system-wide locality."

What does "locality for the table" mean, and how is that different from
system-wide locality?
Do you mean other tables (in the system) could have lower locality, or do
you mean the drop in locality for the table did not affect system-wide
reads/writes because of replication (I have an HDFS block replication
factor of 3)?



On Thu, Feb 9, 2017 at 2:17 PM, Ganesh Viswanathan <ga...@gmail.com> wrote:

> Thanks Ted/Josh.
>
> Ted-
> I store historical metrics on the locality of regions in each
> regionserver in the cluster. I noticed that the old table had many
> regions with low locality before the drop, while the other newer tables
> had very few cases of low locality. After the drop, the new table's
> regions showed a large drop in locality.
>
> Josh-
> I did go from 2200 regions before the drop to about 540 regions. So,
> relatively speaking, yes, it could be that the old table had more
> regions with overall higher locality (though it showed up as more
> regions in each regionserver with lower locality).
>
> At what point (in terms of put/get load and row count/storage size in
> HBase) does it make sense to dive into the data-locality settings and
> tune them so that compaction and locality changes are more predictable?
> Would looking at how to create regions (instead of auto-sharding)
> provide greater benefit?
>
> Thanks!
> Ganesh
>
>
> On Thu, Feb 9, 2017 at 11:51 AM, Josh Elser <el...@apache.org> wrote:
>
>> It could be that the table you dropped had very good locality while the
>> other tables had less. So, your overall locality went down (when the
>> "good" locality regions were no longer included). This wouldn't have
>> affected your system's performance because the locality for the table
>> didn't change -- just the system-wide locality.
>>
>>
>> Ted Yu wrote:
>>
>>> bq. The locality of regions for OTHER tables on the same regionserver
>>> also fell drastically
>>>
>>> Can you be a bit more specific on how you came to the above
>>> conclusion? Dropping one table shouldn't affect the locality of other
>>> tables - unless the number of regions on each server becomes
>>> unbalanced, which triggers balancer activity.
>>>
>>> Thanks
>>>
>>> On Thu, Feb 9, 2017 at 7:34 AM, Ganesh Viswanathan <ga...@gmail.com>
>>> wrote:
>>>
>>>> So here is what I observed.
>>>> Dropping this large table had an immediate effect on average
>>>> locality for the entire cluster. The locality of regions for OTHER
>>>> tables on the same regionserver also fell drastically. This was
>>>> unexpected (I thought only the locality of regions for the dropped
>>>> table would be impacted). Is this because of compaction? Does the
>>>> locality computation use the size of other regions on each
>>>> regionserver?
>>>>
>>>> The large drop in locality, however, did not cause latency issues on
>>>> reads/writes for the other tables. Why is that? Is it because I did
>>>> not try to hit all of the low-locality regions?
>>>>
>>>> (On another note, I was able to test and perform deletions on a
>>>> per-region basis, but that requires hbck -repair and it seemed more
>>>> invasive to the overall cluster health.)
>>>>
>>>> Thanks,
>>>> Ganesh
>>>>
>>>>
>>>> On Sat, Feb 4, 2017 at 11:20 AM, Josh Elser <el...@apache.org> wrote:
>>>>
>>>>> Ganesh,
>>>>>
>>>>> Just drop the table. You are worried about nothing.
>>>>>
>>>>> On Feb 3, 2017 16:51, "Ganesh Viswanathan" <ga...@gmail.com> wrote:
>>>>>
>>>>>> Hello Josh-
>>>>>>
>>>>>> I am trying to delete the entire table and recover the disk space.
>>>>>> I do not need to pick specific contents of the table (if that's
>>>>>> what you are asking with #2).
>>>>>> My question is: would disabling and dropping such a large table
>>>>>> affect data locality in a bad way, or slow down the cluster when
>>>>>> major compaction (or whatever cleans up the tombstoned rows)
>>>>>> happens? I also read in another post that it can spawn ZooKeeper
>>>>>> transactions and even lock the ZooKeeper nodes. Is there any
>>>>>> concern around ZooKeeper functionality when dropping large HBase
>>>>>> tables?
>>>>>>
>>>>>> Thanks again for taking the time to respond to my questions!
>>>>>>
>>>>>> Ganesh
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 3, 2017 at 1:12 PM, Josh Elser <el...@apache.org> wrote:
>>>>>>
>>>>>>> Ganesh -- I was trying to get at maybe there is a terminology
>>>>>>> issue here.
>>>>>>> If you disable+drop the table, this is an operation on the order
>>>>>>> of the number of Regions you have. The number of rows/entries is
>>>>>>> irrelevant. Closing and deleting a region is a relatively fast
>>>>>>> operation.
>>>>>>>
>>>>>>> Can you please confirm: are you trying to delete the entire
>>>>>>> table, or are you trying to delete the *contents* of a table?
>>>>>>>
>>>>>>> If it is the former, I stand by my "you're worried about nothing"
>>>>>>> comment :)
>>>>>>>
>>>>>>>
>>>>>>> Ganesh Viswanathan wrote:
>>>>>>>
>>>>>>>> Thanks Josh.
>>>>>>>>
>>>>>>>> Also, I realized I didn't give the full size of the table. It
>>>>>>>> takes in ~75 million rows per day and stores them for 15 days,
>>>>>>>> so around 1.125 billion rows total.
>>>>>>>>
>>>>>>>> On Fri, Feb 3, 2017 at 12:52 PM, Josh Elser <el...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I think you are worried about nothing, Ganesh.
>>>>>>>>>
>>>>>>>>> If you want to drop (delete) the entire table, just disable and
>>>>>>>>> drop it from the shell. This operation is not going to have a
>>>>>>>>> significant impact on your cluster (save a few flushes), and
>>>>>>>>> those would only happen if you have had recent writes to this
>>>>>>>>> table (which seems unlikely if you want to drop it).

Re: Dropping a very large table - 75million rows

Posted by Ted Yu <yu...@gmail.com>.
bq. After the drop, the new table's regions showed a large drop in locality.

Close to 1700 regions were dropped; please check the master log for the
period during which the table was dropped to see how many regions (of the
other tables) were moved.
Region movement can result in a drop in locality.

In StochasticLoadBalancer, there is a LocalityCostFunction with:

    private static final String LOCALITY_COST_KEY =
        "hbase.master.balancer.stochastic.localityCost";

    private static final float DEFAULT_LOCALITY_COST = 25;

The default weight is very low compared to the region-count weight (500).

If you want the balancer to favor locality when moving regions, please
give the above config a higher weight (on the same level as 500).
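For example, as an hbase-site.xml fragment (the value 500 mirrors the
default region-count weight mentioned above; treat it as a starting point,
not a validated setting):

```xml
<property>
  <!-- Weight of the locality cost function in the stochastic balancer.
       Default is 25; raising it makes locality-destroying moves costly. -->
  <name>hbase.master.balancer.stochastic.localityCost</name>
  <value>500</value>
</property>
```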


Cheers


Re: Dropping a very large table - 75million rows

Posted by Ganesh Viswanathan <ga...@gmail.com>.
Thanks Ted/Josh.

Ted-
I store historical metrics on the locality of regions in each regionserver
in the cluster. I noticed that the old table had many regions with low
locality before the drop while the other newer tables had very few cases of
low locality. After the drop, the new table's regions showed a large drop
in locality.

Josh-
I did go from 2200 regions before the drop to about 540 regions. So,
relatively speaking, yes, it could be that the old table had more regions
with overall higher locality (though it showed up as more regions in each
regionserver with lower locality).

At what point (in terms of put/get load and row count/storage size in
HBase) does it make sense to dive into the data-locality settings and
tune them so that compaction and locality changes are more predictable?
Would looking at how to create regions (instead of auto-sharding) provide
greater benefit?

Thanks!
Ganesh
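On the pre-splitting question: regions can be created up front at
table-creation time in the HBase shell, e.g. (table name, column family,
and split points below are placeholders):

```
hbase> create 'mytable', 'cf', SPLITS => ['row_a', 'row_m', 'row_t']
```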



Re: Dropping a very large table - 75million rows

Posted by Josh Elser <el...@apache.org>.
It could be that the table you dropped had very good locality while the
other tables had less. So, your overall locality went down (when the
"good" locality regions were no longer included). This wouldn't have
affected your system's performance because the locality for the table
didn't change -- just the system-wide locality.
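A toy computation (numbers invented for illustration; this is not HBase's
actual locality calculation) shows the effect: removing a set of
high-locality regions lowers the cluster-wide average even though each
surviving table's own ratio is untouched.

```python
# Locality of a region = fraction of its HFile bytes stored on the same
# host as its RegionServer. Each entry is (local_bytes, total_bytes).
dropped_table = [(95, 100), (95, 100)]   # the old table: high locality
other_tables = [(60, 100), (70, 100)]    # surviving tables: lower locality

def locality(regions):
    """Byte-weighted locality over a set of regions."""
    local = sum(l for l, _ in regions)
    total = sum(t for _, t in regions)
    return local / total

before = locality(dropped_table + other_tables)  # system-wide, before drop
after = locality(other_tables)                   # system-wide, after drop

print(f"system-wide before the drop: {before:.3f}")    # 0.800
print(f"system-wide after the drop:  {after:.3f}")     # 0.650
print(f"surviving tables, unchanged: {locality(other_tables):.3f}")
```

The system-wide number falls from 0.800 to 0.650, yet the surviving
tables' regions are exactly as local as before, which is why reads and
writes against them are unaffected.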


Re: Dropping a very large table - 75million rows

Posted by Ted Yu <yu...@gmail.com>.
bq. The locality of regions for OTHER tables on the same regionserver also
fell drastically

Can you be a bit more specific on how you came to the above conclusion?
Dropping one table shouldn't affect the locality of other tables - unless
the number of regions on each server becomes unbalanced, which triggers
balancer activity.

Thanks

On Thu, Feb 9, 2017 at 7:34 AM, Ganesh Viswanathan <ga...@gmail.com> wrote:

> So here is what I observed.
> Dropping this large table had an immediate effect on average locality for
> the entire cluster. The locality of regions for OTHER tables on the same
> regionserver also fell drastically in the cluster. This was unexpected (I
> only thought locality of regions for the dropped table would be impacted).
> Is this because of compaction? Does the locality computation use the size
> of other regions on each regionserver?
>
> The large drop in locality, however, did not cause latency issues on read
> writes for the other tables. Why is that? Is it because I did not try to
> hit all low locality regions?
>
> (On another note, I was able to test and perform deletions on per region
> basis, but that requires hbck -repair and it seemed more invasive on the
> entire cluster health.)
>
> Thanks,
> Ganesh

Re: Dropping a very large table - 75million rows

Posted by Ganesh Viswanathan <ga...@gmail.com>.
So here is what I observed.
Dropping this large table had an immediate effect on average locality for
the entire cluster. The locality of regions for OTHER tables on the same
regionservers also fell drastically. This was unexpected: I thought only
the locality of the dropped table's regions would be affected. Is this
because of compaction? Does the locality computation use the size of
other regions on each regionserver?

The large drop in locality, however, did not cause latency issues on
reads or writes for the other tables. Why is that? Is it because I did
not happen to hit all of the low-locality regions?
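On the computation question above: HDFS locality for a region is the
fraction of its HFile bytes stored on the same host as the serving
RegionServer, and a cluster-wide figure is typically a size-weighted
average over all regions. A minimal Python sketch with made-up numbers
(table names and sizes here are purely illustrative, not from this
cluster):

```python
def weighted_locality(regions):
    """Size-weighted average locality over a set of regions:
    sum(size * locality) / sum(size)."""
    total = sum(r["size_bytes"] for r in regions)
    if total == 0:
        return 1.0  # no data: treat as fully local
    return sum(r["size_bytes"] * r["locality"] for r in regions) / total

# Illustrative numbers: one big low-locality table, one small healthy one.
regions = [
    {"table": "old_table",  "size_bytes": 8000, "locality": 0.2},
    {"table": "live_table", "size_bytes": 2000, "locality": 1.0},
]

cluster_avg = weighted_locality(regions)  # 0.36
live_only = weighted_locality(
    [r for r in regions if r["table"] == "live_table"])  # 1.0
```

Under this model, dropping old_table moves the cluster-wide average from
0.36 to 1.0 while live_table's own number never changes; by itself it
does not explain per-region locality of other tables falling, which
would require blocks or regions to actually move (e.g. via balancing or
compaction).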

(On another note, I was able to test and perform deletions on a
per-region basis, but that requires hbck -repair, and it seemed more
invasive to overall cluster health.)

Thanks,
Ganesh


On Sat, Feb 4, 2017 at 11:20 AM Josh Elser <el...@apache.org> wrote:

> Ganesh,
>
> Just drop the table. You are worried about nothing.

Re: Dropping a very large table - 75million rows

Posted by Josh Elser <el...@apache.org>.
Ganesh,

Just drop the table. You are worried about nothing.

On Feb 3, 2017 16:51, "Ganesh Viswanathan" <ga...@gmail.com> wrote:

> Hello Josh-
>
> I am trying to delete the entire table and recover the disk space. I do not
> need to pick specific contents of the table (if thats what you are asking
> with #2).
> My question is would disabling and dropping such a large table affect data
> locality in a bad way, or slow down the cluster when major_compaction (or
> whatever cleans up the tombstoned rows) happens. I also read from another
> post that it can spawn zookeeper transactions and even lock the zookeeper
> nodes. Is there any concern around zookeeper functionality when dropping
> large HBase tables.
>
> Thanks again for taking the time to respond to my questions!
>
> Ganesh

Re: Dropping a very large table - 75million rows

Posted by Ganesh Viswanathan <ga...@gmail.com>.
Hello Josh-

I am trying to delete the entire table and recover the disk space. I do
not need to pick out specific contents of the table (if that's what you
are asking with #2).
My question is: would disabling and dropping such a large table affect
data locality in a bad way, or slow down the cluster when major
compaction (or whatever cleans up the tombstoned rows) happens? I also
read in another post that it can spawn ZooKeeper transactions and even
lock ZooKeeper nodes. Is there any concern around ZooKeeper
functionality when dropping large HBase tables?

Thanks again for taking the time to respond to my questions!

Ganesh



On Fri, Feb 3, 2017 at 1:12 PM, Josh Elser <el...@apache.org> wrote:

> Ganesh -- I was trying to get at maybe there is a terminology issue here.
> If you disable+drop the table, this is an operation on the order of Regions
> you have. The number of rows/entries is irrelevant. Closing and deleting a
> region is a relatively fast operation.
>
> Can you please confirm: are you trying to delete the entire table or are
> you trying to delete the *contents* of a table?
>
> If it is the former, I stand by my "you're worried about nothing" comment
> :)

Re: Dropping a very large table - 75million rows

Posted by Josh Elser <el...@apache.org>.
Ganesh -- I was trying to get at whether there is a terminology issue
here. If you disable+drop the table, the operation is on the order of
the number of Regions you have. The number of rows/entries is
irrelevant. Closing and deleting a region is a relatively fast operation.

Can you please confirm: are you trying to delete the entire table or are 
you trying to delete the *contents* of a table?

If it is the former, I stand by my "you're worried about nothing" comment :)

Ganesh Viswanathan wrote:
> Thanks Josh.
>
> Also, I realized I didnt give the full size of the table. It takes in
> ~75million rows per minute and stores for 15days. So around 1.125billion
> rows total.

Re: Dropping a very large table - 75million rows

Posted by Ganesh Viswanathan <ga...@gmail.com>.
Thanks Josh.

Also, I realized I didn't give the full size of the table. It takes in
~75 million rows per day and stores them for 15 days, so around 1.125
billion rows total.

On Fri, Feb 3, 2017 at 12:52 PM, Josh Elser <el...@apache.org> wrote:

> I think you are worried about nothing, Ganesh.
>
> If you want to drop (delete) the entire table, just disable and drop it
> from the shell. This operation is not going to have a significant impact on
> your cluster (save a few flush'es). This would only happen if you have had
> recent writes to this table (which seems unlikely if you want to drop it).

Re: Dropping a very large table - 75million rows

Posted by Josh Elser <el...@apache.org>.
I think you are worried about nothing, Ganesh.

If you want to drop (delete) the entire table, just disable and drop it
from the shell. This operation is not going to have a significant impact
on your cluster (save a few flushes, and those would only happen if you
have had recent writes to this table, which seems unlikely if you want
to drop it).
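(For reference, the whole operation is two HBase shell commands;
'old_table' is a placeholder name, and this requires a running cluster:)

```
$ hbase shell
hbase(main):001:0> disable 'old_table'
hbase(main):002:0> drop 'old_table'
```

disable closes the table's regions (flushing any pending writes), and
drop removes the table; the underlying HFiles are then moved to the
archive directory and cleaned up in the background, so the disk space
comes back without a cluster-wide compaction.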

Ganesh Viswanathan wrote:
> Hello,
>
> I need to drop an old HBase table that is quite large. It has anywhere
> between 2million and 70million datapoints. I turned off the count after it
> ran on the HBase shell for half a day. I have 4 other tables that have
> around 75million rows in total and also take heavy PUT and GET traffic.
>
> What is the best practice for disabling and dropping such a large table in
> HBase so that I have minimal impact on the rest of the cluster?
> 1) I hear there are ways to disable (and drop?) specific regions? Would
> that work?
> 2) Should I scan and delete a few rows at a time until the size becomes
> manageable and then disable/drop the table?
>    If so, what is a good number of rows to delete at a time, should I run a
> major compaction after these row deletes on specific regions, and what is a
> good sized table that can be easily dropped (and has been validated)
> without causing issues on the larger cluster.
>
>
> Thanks!
> Ganesh
>