Posted to user@accumulo.apache.org by Hai Pham <ht...@tigermail.auburn.edu> on 2015/07/31 00:52:23 UTC

How to control Minor Compaction by programming

Hi,


Please share with me: is there any way to initiate or control minor compactions programmatically (not from the shell)? My situation is that when I ingest a large amount of data using the BatchWriter, minor compactions are triggered uncontrollably. The flush() method of BatchWriter does not seem meant for this purpose.

I also tried playing around with the parameters in the documentation, but that did not seem to help much.
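For context, the closest thing to this in the 1.6 client API appears to be TableOperations.flush(), which forces a minor compaction of a table's in-memory map. A minimal sketch, where the instance name, ZooKeeper hosts, credentials, and table name are placeholders (it needs a running cluster, so it is an illustration only):

```java
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class FlushExample {
    public static void main(String[] args) throws Exception {
        // "myinstance", "zk1:2181", "user"/"secret" and "mytable" are placeholders.
        Connector conn = new ZooKeeperInstance("myinstance", "zk1:2181")
            .getConnector("user", new PasswordToken("secret"));
        // Flush the table's in-memory map to files, i.e. request a minor
        // compaction. Null start/end rows cover the whole table; the last
        // argument makes the call block until the flush completes.
        conn.tableOperations().flush("mytable", null, null, true);
    }
}
```

Passing a start and end row instead of nulls restricts the flush to the tablets in that range.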


Also, can you please explain the numbers 0, 1.0, 2.0, ... in the web monitoring charts denoting the level of Minor Compaction and Major Compaction?


Thank you!

Hai Pham




Re: How to control Minor Compaction by programming

Posted by Hai Pham <ht...@tigermail.auburn.edu>.
Thanks Josh and everyone. You all helped greatly.
Hai 
________________________________________
From: Josh Elser <jo...@gmail.com>
Sent: Friday, July 31, 2015 3:09 PM
To: user@accumulo.apache.org
Subject: Re: How to control Minor Compaction by programming

You may benefit from reading the following section:

http://accumulo.apache.org/1.6/accumulo_user_manual.html#_administration_configuration

Specifically the formula for choosing a new size for
tserver.memory.maps.max.
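
As an illustration of that advice (the values here are placeholders, not recommendations), the relevant properties are set in accumulo-site.xml:

```xml
<!-- accumulo-site.xml: illustrative values only -->
<property>
  <name>tserver.memory.maps.max</name>
  <value>2G</value>
</property>
<property>
  <!-- with native maps enabled, the in-memory map lives off the JVM heap -->
  <name>tserver.memory.maps.native.enabled</name>
  <value>true</value>
</property>
```

If native maps are disabled, tserver.memory.maps.max must instead fit inside the tablet server's Java heap.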

Hai Pham wrote:
> Hi John,
>
>
> Per your advice, I will test other options for the number of splits.
>
>
> My native map size is 1G (the default). I am also trying to increase
> it. Maybe if the problem reproduces, I will have more information to
> provide. Thank you!
>
>
> Hai
>
>
> ------------------------------------------------------------------------
> *From:* John Vines <vi...@apache.org>
> *Sent:* Friday, July 31, 2015 11:12 AM
> *To:* user@accumulo.apache.org
> *Subject:* Re: How to control Minor Compaction by programming
> If you have only 4/8 tablets for 4 tservers, you're not really
> parallelizing well.
>
> That doesn't explain a 5 minute hold time though, that is strange. How
> large is your in memory map size?
>
>
> On Fri, Jul 31, 2015 at 11:53 AM Hai Pham <htp0005@tigermail.auburn.edu
> <ma...@tigermail.auburn.edu>> wrote:
>
>     Hi Josh and John,
>
>
>     Correct. Since one of my constraints was time, I tested with both
>     WAL flush and WAL disabled, and the lost-data case happened in
>     WAL-disabled mode - my mistake for not having described that.
>
>
>     I have 1 master + 16 Hadoop slaves under Accumulo, all CentOS 6.5
>     physical boxes with at least 500GB of disk and 24GB of RAM each,
>     but the network is only 1Gb. DFS replication = 3 by default. I
>     tested with 4 and 8 splits; the hold time problem happened more
>     often with 4 splits. And you are right, changing the flushing
>     scheme remediated the problem.
>
>
>     Thank you a lot!
>
>     Hai
>
>     ------------------------------------------------------------------------
>     *From:* John Vines <vines@apache.org <ma...@apache.org>>
>     *Sent:* Friday, July 31, 2015 10:29 AM
>     *To:* user@accumulo.apache.org <ma...@accumulo.apache.org>
>
>     *Subject:* Re: How to control Minor Compaction by programming
>     Data could be lost if walogs were disabled or configured to use a
>     poor flushing mechanism.
>
>     However, I'm also concerned about the hold times from a single
>     ingest being enough to bring down a server. What's the environment
>     you're running in? Are these virtualized or real servers? How many
>     splits did you make? How many disks per node do you have? And are
>     you using default hdfs replication?
>
>     On Fri, Jul 31, 2015 at 11:11 AM Josh Elser <josh.elser@gmail.com
>     <ma...@gmail.com>> wrote:
>
>
>         Hai Pham wrote:
>          > Hi Keith,
>          >
>          >
>          > I have 4 tablet servers + 1 master. I also did a pre-split before
>          > ingesting and it increased the speed a lot.
>          >
>          >
>          > And you're right, when I created too many ingest threads,
>          > many of them were queued in thread pools and the hold time
>          > increased. In some intense ingests, there was a case where a
>          > tablet server was killed by the master because its hold time
>          > exceeded 5 min. In this situation, all tablets were stuck.
>          > Only after that server died did ingest resume at a
>          > comparable speed. But the entries on the dead server's
>          > tablets were all gone and lost from the table.
>
>         You're saying that you lost data? If a server dies, all of the
>         tablets
>         that were hosted there are reassigned to other servers. This is
>         done in
>         a manner that guarantees that there is no data lost in this
>         transition.
>         If you actually lost data, this would be a critical bug, but I would
>         certainly hope you just didn't realize that the data was
>         automatically
>         being hosted by another server.
>
>          > I have had no way to fix this except regulating the number
>          > of ingest threads and the ingest rate to be more friendly
>          > to Accumulo itself.
>          >
>          >
>          > Another mystery to me: I pre-split to, e.g., 8 tablets, but
>          > along with the ingest the tablet count increases (e.g. to
>          > 10, 14 or more). Any idea?
>
>         Yep, Accumulo will naturally split tablets when they exceed a
>         certain
>         size (1GB by default for normal tables). Unless you increase the
>         property table.split.threshold, as you ingest more data, you will
>         observe more tablets.
>
>         Given enough time, Accumulo will naturally split your table enough.
>         Pre-splitting quickly gets you to a good level of performance
>         right away.
>
>          >
>          > Hai
>          >
>         ------------------------------------------------------------------------
>          > *From:* Keith Turner <keith@deenlo.com <ma...@deenlo.com>>
>          > *Sent:* Friday, July 31, 2015 8:39 AM
>          > *To:* user@accumulo.apache.org <ma...@accumulo.apache.org>
>          > *Subject:* Re: How to control Minor Compaction by programming
>          > How many tablets do you have? Entire tablets are minor
>         compacted at
>          > once. If you have 1 tablet per tablet server, then minor
>         compactions
>          > will have a lot of work to do at once. While this work is
>         being done,
>          > the tablet servers memory may fill up, leading to writes
>         being held.
>          >
>          > If you have 10 tablets per tablet server, then tablets can be
>         compacted
>          > in parallel w/ less work to do at any given point in time.
>         This can
>          > avoid memory filling up and writes being held.
>          >
>          > In short, its possible that adding good split points to the
>         table (and
>          > therefore creating more tablets) may help w/ this issue.
>          >
>          > Also, are you seeing hold times?
>          >
>          > On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham
>         <htp0005@tigermail.auburn.edu <ma...@tigermail.auburn.edu>
>          > <mailto:htp0005@tigermail.auburn.edu
>         <ma...@tigermail.auburn.edu>>> wrote:
>          >
>          > Hey William, Josh and David,
>          >
>          > Thanks for explaining, I might not have been clear: I used
>         the web
>          > interface with port 50095 to monitor the real-time charts
>         (ingest,
>          > scan, load average, minor compaction, major compaction, ...).
>          >
>          > Nonetheless, as I witnessed: when I ingested about 100k
>          > entries -> minor compaction happened -> ingest was stuck ->
>          > the level of minor compaction on the charts was only about
>          > 1.0, 2.0 and at most 3.0, while more than 20k entries were
>          > forced out of memory (I knew this by watching the number of
>          > in-memory entries for the table being ingested into) ->
>          > then when the minor compaction ended, ingest resumed,
>          > somewhat faster.
>          >
>          > Thus I presume the levels 1.0, 2.0, 3.0 do not represent
>          > the number of files being minor-compacted from memory?
>          >
>          > Hai
>          > ________________________________________
>          > From: Josh Elser <josh.elser@gmail.com
>         <ma...@gmail.com> <mailto:josh.elser@gmail.com
>         <ma...@gmail.com>>>
>          > Sent: Thursday, July 30, 2015 7:12 PM
>          > To: user@accumulo.apache.org
>         <ma...@accumulo.apache.org>
>         <mailto:user@accumulo.apache.org <ma...@accumulo.apache.org>>
>          > Subject: Re: How to control Minor Compaction by programming
>          >
>          > >
>          > > Also, can you please explain the number 0, 1.0, 2.0, ... in
>          > charts (web
>          > > monitoring) denoting the level of Minor Compaction and Major
>          > Compaction?
>          >
>          > On the monitor, the number of compactions are of the form:
>          >
>          > active (queued)
>          >
>          > e.g. 4 (2), would mean that 4 are running and 2 are queued.
>          >
>          > >
>          > >
>          > > Thank you!
>          > >
>          > > Hai Pham
>          > >
>          > >
>          > >
>          > >
>          >
>          >
>
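
The pre-splitting and split-threshold advice quoted above can be sketched against the 1.6 client API. The hexSplits helper and all names below are illustrative assumptions, not code from the thread:

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class SplitPoints {
    // Illustrative helper: evenly spaced hex split points, assuming row
    // keys begin with a uniformly distributed hex-encoded byte.
    public static SortedSet<String> hexSplits(int numTablets) {
        SortedSet<String> splits = new TreeSet<>();
        for (int i = 1; i < numTablets; i++) {
            // spread numTablets - 1 boundaries across the 256 byte values
            splits.add(String.format("%02x", i * 256 / numTablets));
        }
        return splits;
    }

    // Applying them is cluster-dependent, so only sketched in comments:
    //   SortedSet<Text> points = new TreeSet<>();
    //   for (String s : hexSplits(8)) points.add(new Text(s));
    //   connector.tableOperations().addSplits("mytable", points);
    // Raising table.split.threshold above its 1G default postpones
    // further automatic splitting:
    //   connector.tableOperations().setProperty("mytable",
    //       "table.split.threshold", "4G");
}
```

With 8 tablets requested, hexSplits(8) yields the 7 boundaries 20, 40, 60, 80, a0, c0, e0.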

Re: How to control Minor Compaction by programming

Posted by Josh Elser <jo...@gmail.com>.
You may benefit from reading the following section:

http://accumulo.apache.org/1.6/accumulo_user_manual.html#_administration_configuration

Specifically the formula for choosing a new size for 
tserver.memory.maps.max.

Hai Pham wrote:
> Hi John,
>
>
> For your advice, I will test with other options for number of splits.
>
>
> My native map size is 1G (default). I am also trying increasing it.
> Maybe if the problem is reproduced, I will have more information to
> provide. Thank you!
>
>
> Hai
>
>
> ------------------------------------------------------------------------
> *From:* John Vines <vi...@apache.org>
> *Sent:* Friday, July 31, 2015 11:12 AM
> *To:* user@accumulo.apache.org
> *Subject:* Re: How to control Minor Compaction by programming
> If you have only 4/8 tablets for 4 tservers, you're not really
> parallelizing well.
>
> That doesn't explain a 5 minute hold time though, that is strange. How
> large is your in memory map size?
>
>
> On Fri, Jul 31, 2015 at 11:53 AM Hai Pham <htp0005@tigermail.auburn.edu
> <ma...@tigermail.auburn.edu>> wrote:
>
>     Hi Josh and John,
>
>
>     Correct. Since one of my constraint was the time, I tested with wal
>     flush and wal disabled and the the lost data case happened in wal
>     disabled mode - my mistake for not having described.
>
>
>     I have 1 master + 16 hadoop slaves under Accumulo, all are Centos
>     6.5 physical boxes times at least 500GB, 24G RAM each, but the
>     network is only 1G. DFS replication = 3 by default. I tested with 4
>     and 8 splits, the hold time problem was likely happen more often in
>     4 splits. And you are right, changing flushing scheme got the
>     problem remediated.
>
>
>     Thank you a lot!
>
>     Hai
>
>     ------------------------------------------------------------------------
>     *From:* John Vines <vines@apache.org <ma...@apache.org>>
>     *Sent:* Friday, July 31, 2015 10:29 AM
>     *To:* user@accumulo.apache.org <ma...@accumulo.apache.org>
>
>     *Subject:* Re: How to control Minor Compaction by programming
>     Data could be lost if walogs were disabled or configured to use a
>     poor flushing mechanism.
>
>     However, I'm also concerned about the hold times from a single
>     ingest being enough to bring down a server. What's the environment
>     you're running in? Are these virtualized or real servers? How many
>     splits did you make. How many disks per node do you have? And are
>     you using default hdfs replication?
>
>     On Fri, Jul 31, 2015 at 11:11 AM Josh Elser <josh.elser@gmail.com
>     <ma...@gmail.com>> wrote:
>
>
>         Hai Pham wrote:
>          > Hi Keith,
>          >
>          >
>          > I have 4 tablet servers + 1 master. I also did a pre-split before
>          > ingesting and it increased the speed a lot.
>          >
>          >
>          > And you're right, when I created too many ingest threads,
>         many of them
>          > were on the queue of thread pools and the hold time will
>         increases. In
>          > some intense ingest, there was a case when a tablet was
>         killed by master
>          > for the hold time exceeded 5 min. In this situation, all
>         Tablets were in
>          > stuck. Only after that one is dead, the ingest was back with the
>          > comparable speed. But the entries in dead tablet were all
>         gone and lost
>          > to the table.
>
>         You're saying that you lost data? If a server dies, all of the
>         tablets
>         that were hosted there are reassigned to other servers. This is
>         done in
>         a manner that guarantees that there is no data lost in this
>         transition.
>         If you actually lost data, this would be a critical bug, but I would
>         certainly hope you just didn't realize that the data was
>         automatically
>         being hosted by another server.
>
>          > I have had no idea to repair this except for regulating the
>         number of
>          > ingest threads and speed to make it more friendly to the
>         terminal of
>          > Accumulo itself.
>          >
>          >
>          > Another myth to me is that when I did a pre-split to, e.g. 8
>         tablets.
>          > But along with the ingest operation, the tablet number
>         increases (e.g.
>          > 10, 14 or bigger). Any idea?
>
>         Yep, Accumulo will naturally split tablets when they exceed a
>         certain
>         size (1GB by default for normal tables). Unless you increase the
>         property table.split.threshold, as you ingest more data, you will
>         observe more tablets.
>
>         Given enough time, Accumulo will naturally split your table enough.
>         Pre-splitting quickly gets you to a good level of performance
>         right away.
>
>          >
>          > Hai
>          >
>         ------------------------------------------------------------------------
>          > *From:* Keith Turner <keith@deenlo.com <ma...@deenlo.com>>
>          > *Sent:* Friday, July 31, 2015 8:39 AM
>          > *To:* user@accumulo.apache.org <ma...@accumulo.apache.org>
>          > *Subject:* Re: How to control Minor Compaction by programming
>          > How many tablets do you have? Entire tablets are minor
>         compacted at
>          > once. If you have 1 tablet per tablet server, then minor
>         compactions
>          > will have a lot of work to do at once. While this work is
>         being done,
>          > the tablet servers memory may fill up, leading to writes
>         being held.
>          >
>          > If you have 10 tablets per tablet server, then tablets can be
>         compacted
>          > in parallel w/ less work to do at any given point in time.
>         This can
>          > avoid memory filling up and writes being held.
>          >
>          > In short, its possible that adding good split points to the
>         table (and
>          > therefore creating more tablets) may help w/ this issue.
>          >
>          > Also, are you seeing hold times?
>          >
>          > On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham
>         <htp0005@tigermail.auburn.edu <ma...@tigermail.auburn.edu>
>          > <mailto:htp0005@tigermail.auburn.edu
>         <ma...@tigermail.auburn.edu>>> wrote:
>          >
>          > Hey William, Josh and David,
>          >
>          > Thanks for explaining, I might not have been clear: I used
>         the web
>          > interface with port 50095 to monitor the real-time charts
>         (ingest,
>          > scan, load average, minor compaction, major compaction, ...).
>          >
>          > Nonetheless, as I witnessed, when I ingested about 100k
>         entries ->
>          > then minor compaction happened -> ingest was stuck -> the
>         level of
>          > minor compaction on the charts was just about 1.0, 2.0 and
>         max 3.0
>          > while about >20k entries were forced out of memory (I knew
>         this by
>          > looking at the number of entries in memory w.r.t the table being
>          > ingested to) -> then when minor compaction ended, ingest resumed,
>          > somewhat faster.
>          >
>          > Thus I presume the level 1.0, 2.0, 3.0 is not representative for
>          > number of files being minor-compacted from memory?
>          >
>          > Hai
>          > ________________________________________
>          > From: Josh Elser <josh.elser@gmail.com
>         <ma...@gmail.com> <mailto:josh.elser@gmail.com
>         <ma...@gmail.com>>>
>          > Sent: Thursday, July 30, 2015 7:12 PM
>          > To: user@accumulo.apache.org
>         <ma...@accumulo.apache.org>
>         <mailto:user@accumulo.apache.org <ma...@accumulo.apache.org>>
>          > Subject: Re: How to control Minor Compaction by programming
>          >
>          > >
>          > > Also, can you please explain the number 0, 1.0, 2.0, ... in
>          > charts (web
>          > > monitoring) denoting the level of Minor Compaction and Major
>          > Compaction?
>          >
>          > On the monitor, the number of compactions are of the form:
>          >
>          > active (queued)
>          >
>          > e.g. 4 (2), would mean that 4 are running and 2 are queued.
>          >
>          > >
>          > >
>          > > Thank you!
>          > >
>          > > Hai Pham
>          > >
>          > >
>          > >
>          > >
>          >
>          >
>

Re: How to control Minor Compaction by programming

Posted by Hai Pham <ht...@tigermail.auburn.edu>.
Hi John,


For your advice, I will test with other options for number of splits.


My native map size is 1G (default). I am also trying increasing it. Maybe if the problem is reproduced, I will have more information to provide. Thank you!


Hai

________________________________
From: John Vines <vi...@apache.org>
Sent: Friday, July 31, 2015 11:12 AM
To: user@accumulo.apache.org
Subject: Re: How to control Minor Compaction by programming

If you have only 4/8 tablets for 4 tservers, you're not really parallelizing well.

That doesn't explain a 5 minute hold time though, that is strange. How large is your in memory map size?


On Fri, Jul 31, 2015 at 11:53 AM Hai Pham <ht...@tigermail.auburn.edu>> wrote:

Hi Josh and John,


Correct. Since one of my constraint was the time, I tested with wal flush and wal disabled and the the lost data case happened in wal disabled mode - my mistake for not having described.


I have 1 master + 16 hadoop slaves under Accumulo, all are Centos 6.5 physical boxes times at least 500GB, 24G RAM each, but the network is only 1G. DFS replication = 3 by default. I tested with 4  and 8 splits, the hold time problem was likely happen more often in 4 splits. And you are right, changing flushing scheme got the problem remediated.


Thank you a lot!

Hai

________________________________
From: John Vines <vi...@apache.org>>
Sent: Friday, July 31, 2015 10:29 AM
To: user@accumulo.apache.org<ma...@accumulo.apache.org>

Subject: Re: How to control Minor Compaction by programming
Data could be lost if walogs were disabled or configured to use a poor flushing mechanism.

However, I'm also concerned about the hold times from a single ingest being enough to bring down a server. What's the environment you're running in? Are these virtualized or real servers? How many splits did you make. How many disks per node do you have? And are you using default hdfs replication?

On Fri, Jul 31, 2015 at 11:11 AM Josh Elser <jo...@gmail.com>> wrote:

Hai Pham wrote:
> Hi Keith,
>
>
> I have 4 tablet servers + 1 master. I also did a pre-split before
> ingesting and it increased the speed a lot.
>
>
> And you're right, when I created too many ingest threads, many of them
> were on the queue of thread pools and the hold time will increases. In
> some intense ingest, there was a case when a tablet was killed by master
> for the hold time exceeded 5 min. In this situation, all Tablets were in
> stuck. Only after that one is dead, the ingest was back with the
> comparable speed. But the entries in dead tablet were all gone and lost
> to the table.

You're saying that you lost data? If a server dies, all of the tablets
that were hosted there are reassigned to other servers. This is done in
a manner that guarantees that there is no data lost in this transition.
If you actually lost data, this would be a critical bug, but I would
certainly hope you just didn't realize that the data was automatically
being hosted by another server.

> I have had no idea to repair this except for regulating the number of
> ingest threads and speed to make it more friendly to the terminal of
> Accumulo itself.
>
>
> Another myth to me is that when I did a pre-split to, e.g. 8 tablets.
> But along with the ingest operation, the tablet number increases (e.g.
> 10, 14 or bigger). Any idea?

Yep, Accumulo will naturally split tablets when they exceed a certain
size (1GB by default for normal tables). Unless you increase the
property table.split.threshold, as you ingest more data, you will
observe more tablets.

Given enough time, Accumulo will naturally split your table enough.
Pre-splitting quickly gets you to a good level of performance right away.

>
> Hai
> ------------------------------------------------------------------------
> *From:* Keith Turner <ke...@deenlo.com>>
> *Sent:* Friday, July 31, 2015 8:39 AM
> *To:* user@accumulo.apache.org<ma...@accumulo.apache.org>
> *Subject:* Re: How to control Minor Compaction by programming
> How many tablets do you have? Entire tablets are minor compacted at
> once. If you have 1 tablet per tablet server, then minor compactions
> will have a lot of work to do at once. While this work is being done,
> the tablet servers memory may fill up, leading to writes being held.
>
> If you have 10 tablets per tablet server, then tablets can be compacted
> in parallel w/ less work to do at any given point in time. This can
> avoid memory filling up and writes being held.
>
> In short, its possible that adding good split points to the table (and
> therefore creating more tablets) may help w/ this issue.
>
> Also, are you seeing hold times?
>
> On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham <ht...@tigermail.auburn.edu>
> <ma...@tigermail.auburn.edu>>> wrote:
>
>     Hey William, Josh and David,
>
>     Thanks for explaining, I might not have been clear: I used the web
>     interface with port 50095 to monitor the real-time charts (ingest,
>     scan, load average, minor compaction, major compaction, ...).
>
>     Nonetheless, as I witnessed, when I ingested about 100k entries ->
>     then minor compaction happened -> ingest was stuck -> the level of
>     minor compaction on the charts was just about 1.0, 2.0 and max 3.0
>     while about >20k entries were forced out of memory (I knew this by
>     looking at the number of entries in memory w.r.t the table being
>     ingested to) -> then when minor compaction ended, ingest resumed,
>     somewhat faster.
>
>     Thus I presume the level 1.0, 2.0, 3.0 is not representative for
>     number of files being minor-compacted from memory?
>
>     Hai
>     ________________________________________
>     From: Josh Elser <jo...@gmail.com> <ma...@gmail.com>>>
>     Sent: Thursday, July 30, 2015 7:12 PM
>     To: user@accumulo.apache.org<ma...@accumulo.apache.org> <ma...@accumulo.apache.org>>
>     Subject: Re: How to control Minor Compaction by programming
>
>     >
>      > Also, can you please explain the number 0, 1.0, 2.0, ... in
>     charts (web
>      > monitoring) denoting the level of Minor Compaction and Major
>     Compaction?
>
>     On the monitor, the number of compactions are of the form:
>
>     active (queued)
>
>     e.g. 4 (2), would mean that 4 are running and 2 are queued.
>
>      >
>      >
>      > Thank you!
>      >
>      > Hai Pham
>      >
>      >
>      >
>      >
>
>

Re: How to control Minor Compaction by programming

Posted by John Vines <vi...@apache.org>.
If you have only 4/8 tablets for 4 tservers, you're not really
parallelizing well.

That doesn't explain a 5 minute hold time though, that is strange. How
large is your in memory map size?


On Fri, Jul 31, 2015 at 11:53 AM Hai Pham <ht...@tigermail.auburn.edu>
wrote:

> Hi Josh and John,
>
>
> Correct. Since one of my constraint was the time, I tested with wal flush
> and wal disabled and the the lost data case happened in wal disabled mode -
> my mistake for not having described.
>
>
> I have 1 master + 16 hadoop slaves under Accumulo, all are Centos
> 6.5 physical boxes times at least 500GB, 24G RAM each, but the network is
> only 1G. DFS replication = 3 by default. I tested with 4  and 8 splits, the
> hold time problem was likely happen more often in 4 splits. And you are
> right, changing flushing scheme got the problem remediated.
>
>
> Thank you a lot!
>
> Hai
> ------------------------------
> *From:* John Vines <vi...@apache.org>
> *Sent:* Friday, July 31, 2015 10:29 AM
> *To:* user@accumulo.apache.org
>
> *Subject:* Re: How to control Minor Compaction by programming
> Data could be lost if walogs were disabled or configured to use a poor
> flushing mechanism.
>
> However, I'm also concerned about the hold times from a single ingest
> being enough to bring down a server. What's the environment you're running
> in? Are these virtualized or real servers? How many splits did you make.
> How many disks per node do you have? And are you using default hdfs
> replication?
>
> On Fri, Jul 31, 2015 at 11:11 AM Josh Elser <jo...@gmail.com> wrote:
>
>>
>> Hai Pham wrote:
>> > Hi Keith,
>> >
>> >
>> > I have 4 tablet servers + 1 master. I also did a pre-split before
>> > ingesting and it increased the speed a lot.
>> >
>> >
>> > And you're right, when I created too many ingest threads, many of them
>> > were on the queue of thread pools and the hold time will increases. In
>> > some intense ingest, there was a case when a tablet was killed by master
>> > for the hold time exceeded 5 min. In this situation, all Tablets were in
>> > stuck. Only after that one is dead, the ingest was back with the
>> > comparable speed. But the entries in dead tablet were all gone and lost
>> > to the table.
>>
>> You're saying that you lost data? If a server dies, all of the tablets
>> that were hosted there are reassigned to other servers. This is done in
>> a manner that guarantees that there is no data lost in this transition.
>> If you actually lost data, this would be a critical bug, but I would
>> certainly hope you just didn't realize that the data was automatically
>> being hosted by another server.
>>
>> > I have had no idea to repair this except for regulating the number of
>> > ingest threads and speed to make it more friendly to the terminal of
>> > Accumulo itself.
>> >
>> >
>> > Another myth to me is that when I did a pre-split to, e.g. 8 tablets.
>> > But along with the ingest operation, the tablet number increases (e.g.
>> > 10, 14 or bigger). Any idea?
>>
>> Yep, Accumulo will naturally split tablets when they exceed a certain
>> size (1GB by default for normal tables). Unless you increase the
>> property table.split.threshold, as you ingest more data, you will
>> observe more tablets.
>>
>> Given enough time, Accumulo will naturally split your table enough.
>> Pre-splitting quickly gets you to a good level of performance right away.
>>
>> >
>> > Hai
>> > ------------------------------------------------------------------------
>> > *From:* Keith Turner <ke...@deenlo.com>
>> > *Sent:* Friday, July 31, 2015 8:39 AM
>> > *To:* user@accumulo.apache.org
>> > *Subject:* Re: How to control Minor Compaction by programming
>> > How many tablets do you have? Entire tablets are minor compacted at
>> > once. If you have 1 tablet per tablet server, then minor compactions
>> > will have a lot of work to do at once. While this work is being done,
>> > the tablet servers memory may fill up, leading to writes being held.
>> >
>> > If you have 10 tablets per tablet server, then tablets can be compacted
>> > in parallel w/ less work to do at any given point in time. This can
>> > avoid memory filling up and writes being held.
>> >
>> > In short, its possible that adding good split points to the table (and
>> > therefore creating more tablets) may help w/ this issue.
>> >
>> > Also, are you seeing hold times?
>> >
>> > On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham <
>> htp0005@tigermail.auburn.edu
>> > <ma...@tigermail.auburn.edu>> wrote:
>> >
>> >     Hey William, Josh and David,
>> >
>> >     Thanks for explaining, I might not have been clear: I used the web
>> >     interface with port 50095 to monitor the real-time charts (ingest,
>> >     scan, load average, minor compaction, major compaction, ...).
>> >
>> >     Nonetheless, as I witnessed, when I ingested about 100k entries ->
>> >     then minor compaction happened -> ingest was stuck -> the level of
>> >     minor compaction on the charts was just about 1.0, 2.0 and max 3.0
>> >     while about >20k entries were forced out of memory (I knew this by
>> >     looking at the number of entries in memory w.r.t the table being
>> >     ingested to) -> then when minor compaction ended, ingest resumed,
>> >     somewhat faster.
>> >
>> >     Thus I presume the level 1.0, 2.0, 3.0 is not representative for
>> >     number of files being minor-compacted from memory?
>> >
>> >     Hai
>> >     ________________________________________
>> >     From: Josh Elser <josh.elser@gmail.com <mailto:josh.elser@gmail.com
>> >>
>> >     Sent: Thursday, July 30, 2015 7:12 PM
>> >     To: user@accumulo.apache.org <ma...@accumulo.apache.org>
>> >     Subject: Re: How to control Minor Compaction by programming
>> >
>> >     >
>> >      > Also, can you please explain the number 0, 1.0, 2.0, ... in
>> >     charts (web
>> >      > monitoring) denoting the level of Minor Compaction and Major
>> >     Compaction?
>> >
>> >     On the monitor, the number of compactions are of the form:
>> >
>> >     active (queued)
>> >
>> >     e.g. 4 (2), would mean that 4 are running and 2 are queued.
>> >
>> >      >
>> >      >
>> >      > Thank you!
>> >      >
>> >      > Hai Pham
>> >      >
>> >      >
>> >      >
>> >      >
>> >
>> >
>>
>

Re: How to control Minor Compaction by programming

Posted by Hai Pham <ht...@tigermail.auburn.edu>.
Hi Josh and John,


Correct. Since one of my constraint was the time, I tested with wal flush and wal disabled and the the lost data case happened in wal disabled mode - my mistake for not having described.


I have 1 master + 16 hadoop slaves under Accumulo, all are Centos 6.5 physical boxes times at least 500GB, 24G RAM each, but the network is only 1G. DFS replication = 3 by default. I tested with 4  and 8 splits, the hold time problem was likely happen more often in 4 splits. And you are right, changing flushing scheme got the problem remediated.


Thank you a lot!

Hai

________________________________
From: John Vines <vi...@apache.org>
Sent: Friday, July 31, 2015 10:29 AM
To: user@accumulo.apache.org
Subject: Re: How to control Minor Compaction by programming

Data could be lost if walogs were disabled or configured to use a poor flushing mechanism.

However, I'm also concerned about the hold times from a single ingest being enough to bring down a server. What's the environment you're running in? Are these virtualized or real servers? How many splits did you make? How many disks per node do you have? And are you using default HDFS replication?

On Fri, Jul 31, 2015 at 11:11 AM Josh Elser <jo...@gmail.com>> wrote:

Hai Pham wrote:
> Hi Keith,
>
>
> I have 4 tablet servers + 1 master. I also did a pre-split before
> ingesting and it increased the speed a lot.
>
>
> And you're right, when I created too many ingest threads, many of them
> were on the queue of thread pools and the hold time will increases. In
> some intense ingest, there was a case when a tablet was killed by master
> for the hold time exceeded 5 min. In this situation, all Tablets were in
> stuck. Only after that one is dead, the ingest was back with the
> comparable speed. But the entries in dead tablet were all gone and lost
> to the table.

You're saying that you lost data? If a server dies, all of the tablets
that were hosted there are reassigned to other servers. This is done in
a manner that guarantees that there is no data lost in this transition.
If you actually lost data, this would be a critical bug, but I would
certainly hope you just didn't realize that the data was automatically
being hosted by another server.

> I have had no idea to repair this except for regulating the number of
> ingest threads and speed to make it more friendly to the terminal of
> Accumulo itself.
>
>
> Another myth to me is that when I did a pre-split to, e.g. 8 tablets.
> But along with the ingest operation, the tablet number increases (e.g.
> 10, 14 or bigger). Any idea?

Yep, Accumulo will naturally split tablets when they exceed a certain
size (1GB by default for normal tables). Unless you increase the
property table.split.threshold, as you ingest more data, you will
observe more tablets.

Given enough time, Accumulo will naturally split your table enough.
Pre-splitting quickly gets you to a good level of performance right away.

>
> Hai
> ------------------------------------------------------------------------
> *From:* Keith Turner <ke...@deenlo.com>>
> *Sent:* Friday, July 31, 2015 8:39 AM
> *To:* user@accumulo.apache.org<ma...@accumulo.apache.org>
> *Subject:* Re: How to control Minor Compaction by programming
> How many tablets do you have? Entire tablets are minor compacted at
> once. If you have 1 tablet per tablet server, then minor compactions
> will have a lot of work to do at once. While this work is being done,
> the tablet servers memory may fill up, leading to writes being held.
>
> If you have 10 tablets per tablet server, then tablets can be compacted
> in parallel w/ less work to do at any given point in time. This can
> avoid memory filling up and writes being held.
>
> In short, its possible that adding good split points to the table (and
> therefore creating more tablets) may help w/ this issue.
>
> Also, are you seeing hold times?
>
> On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham <ht...@tigermail.auburn.edu>
> <ma...@tigermail.auburn.edu>>> wrote:
>
>     Hey William, Josh and David,
>
>     Thanks for explaining, I might not have been clear: I used the web
>     interface with port 50095 to monitor the real-time charts (ingest,
>     scan, load average, minor compaction, major compaction, ...).
>
>     Nonetheless, as I witnessed, when I ingested about 100k entries ->
>     then minor compaction happened -> ingest was stuck -> the level of
>     minor compaction on the charts was just about 1.0, 2.0 and max 3.0
>     while about >20k entries were forced out of memory (I knew this by
>     looking at the number of entries in memory w.r.t the table being
>     ingested to) -> then when minor compaction ended, ingest resumed,
>     somewhat faster.
>
>     Thus I presume the level 1.0, 2.0, 3.0 is not representative for
>     number of files being minor-compacted from memory?
>
>     Hai
>     ________________________________________
>     From: Josh Elser <jo...@gmail.com> <ma...@gmail.com>>>
>     Sent: Thursday, July 30, 2015 7:12 PM
>     To: user@accumulo.apache.org<ma...@accumulo.apache.org> <ma...@accumulo.apache.org>>
>     Subject: Re: How to control Minor Compaction by programming
>
>     >
>      > Also, can you please explain the number 0, 1.0, 2.0, ... in
>     charts (web
>      > monitoring) denoting the level of Minor Compaction and Major
>     Compaction?
>
>     On the monitor, the number of compactions are of the form:
>
>     active (queued)
>
>     e.g. 4 (2), would mean that 4 are running and 2 are queued.
>
>      >
>      >
>      > Thank you!
>      >
>      > Hai Pham
>      >
>      >
>      >
>      >
>
>

Re: How to control Minor Compaction by programming

Posted by John Vines <vi...@apache.org>.
Data could be lost if walogs were disabled or configured to use a poor
flushing mechanism.

However, I'm also concerned about the hold times from a single ingest being
enough to bring down a server. What's the environment you're running in?
Are these virtualized or real servers? How many splits did you make? How
many disks per node do you have? And are you using default HDFS replication?
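For reference, the write-ahead log behavior John mentions is controlled per table. A minimal sketch of re-enabling it programmatically (this assumes an existing Connector named conn; "mytable" is a placeholder, and the call needs a live Accumulo instance to run against):

```java
// Keep the write-ahead log enabled so a tablet server death cannot
// lose data that was successfully written but not yet minor compacted.
conn.tableOperations().setProperty("mytable", "table.walog.enabled", "true");
```

With walogs enabled, mutations are durably logged before they are acknowledged, at some cost to raw ingest speed.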

On Fri, Jul 31, 2015 at 11:11 AM Josh Elser <jo...@gmail.com> wrote:

>
> Hai Pham wrote:
> > Hi Keith,
> >
> >
> > I have 4 tablet servers + 1 master. I also did a pre-split before
> > ingesting and it increased the speed a lot.
> >
> >
> > And you're right, when I created too many ingest threads, many of them
> > were on the queue of thread pools and the hold time will increases. In
> > some intense ingest, there was a case when a tablet was killed by master
> > for the hold time exceeded 5 min. In this situation, all Tablets were in
> > stuck. Only after that one is dead, the ingest was back with the
> > comparable speed. But the entries in dead tablet were all gone and lost
> > to the table.
>
> You're saying that you lost data? If a server dies, all of the tablets
> that were hosted there are reassigned to other servers. This is done in
> a manner that guarantees that there is no data lost in this transition.
> If you actually lost data, this would be a critical bug, but I would
> certainly hope you just didn't realize that the data was automatically
> being hosted by another server.
>
> > I have had no idea to repair this except for regulating the number of
> > ingest threads and speed to make it more friendly to the terminal of
> > Accumulo itself.
> >
> >
> > Another myth to me is that when I did a pre-split to, e.g. 8 tablets.
> > But along with the ingest operation, the tablet number increases (e.g.
> > 10, 14 or bigger). Any idea?
>
> Yep, Accumulo will naturally split tablets when they exceed a certain
> size (1GB by default for normal tables). Unless you increase the
> property table.split.threshold, as you ingest more data, you will
> observe more tablets.
>
> Given enough time, Accumulo will naturally split your table enough.
> Pre-splitting quickly gets you to a good level of performance right away.
>
> >
> > Hai
> > ------------------------------------------------------------------------
> > *From:* Keith Turner <ke...@deenlo.com>
> > *Sent:* Friday, July 31, 2015 8:39 AM
> > *To:* user@accumulo.apache.org
> > *Subject:* Re: How to control Minor Compaction by programming
> > How many tablets do you have? Entire tablets are minor compacted at
> > once. If you have 1 tablet per tablet server, then minor compactions
> > will have a lot of work to do at once. While this work is being done,
> > the tablet servers memory may fill up, leading to writes being held.
> >
> > If you have 10 tablets per tablet server, then tablets can be compacted
> > in parallel w/ less work to do at any given point in time. This can
> > avoid memory filling up and writes being held.
> >
> > In short, its possible that adding good split points to the table (and
> > therefore creating more tablets) may help w/ this issue.
> >
> > Also, are you seeing hold times?
> >
> > On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham <htp0005@tigermail.auburn.edu
> > <ma...@tigermail.auburn.edu>> wrote:
> >
> >     Hey William, Josh and David,
> >
> >     Thanks for explaining, I might not have been clear: I used the web
> >     interface with port 50095 to monitor the real-time charts (ingest,
> >     scan, load average, minor compaction, major compaction, ...).
> >
> >     Nonetheless, as I witnessed, when I ingested about 100k entries ->
> >     then minor compaction happened -> ingest was stuck -> the level of
> >     minor compaction on the charts was just about 1.0, 2.0 and max 3.0
> >     while about >20k entries were forced out of memory (I knew this by
> >     looking at the number of entries in memory w.r.t the table being
> >     ingested to) -> then when minor compaction ended, ingest resumed,
> >     somewhat faster.
> >
> >     Thus I presume the level 1.0, 2.0, 3.0 is not representative for
> >     number of files being minor-compacted from memory?
> >
> >     Hai
> >     ________________________________________
> >     From: Josh Elser <josh.elser@gmail.com <mailto:josh.elser@gmail.com
> >>
> >     Sent: Thursday, July 30, 2015 7:12 PM
> >     To: user@accumulo.apache.org <ma...@accumulo.apache.org>
> >     Subject: Re: How to control Minor Compaction by programming
> >
> >     >
> >      > Also, can you please explain the number 0, 1.0, 2.0, ... in
> >     charts (web
> >      > monitoring) denoting the level of Minor Compaction and Major
> >     Compaction?
> >
> >     On the monitor, the number of compactions are of the form:
> >
> >     active (queued)
> >
> >     e.g. 4 (2), would mean that 4 are running and 2 are queued.
> >
> >      >
> >      >
> >      > Thank you!
> >      >
> >      > Hai Pham
> >      >
> >      >
> >      >
> >      >
> >
> >
>

Re: How to control Minor Compaction by programming

Posted by Josh Elser <jo...@gmail.com>.
Hai Pham wrote:
> Hi Keith,
>
>
> I have 4 tablet servers + 1 master. I also did a pre-split before
> ingesting and it increased the speed a lot.
>
>
> And you're right, when I created too many ingest threads, many of them
> were on the queue of thread pools and the hold time will increases. In
> some intense ingest, there was a case when a tablet was killed by master
> for the hold time exceeded 5 min. In this situation, all Tablets were in
> stuck. Only after that one is dead, the ingest was back with the
> comparable speed. But the entries in dead tablet were all gone and lost
> to the table.

You're saying that you lost data? If a server dies, all of the tablets 
that were hosted there are reassigned to other servers. This is done in 
a manner that guarantees that there is no data lost in this transition. 
If you actually lost data, this would be a critical bug, but I would 
certainly hope you just didn't realize that the data was automatically 
being hosted by another server.

> I have had no idea to repair this except for regulating the number of
> ingest threads and speed to make it more friendly to the terminal of
> Accumulo itself.
>
>
> Another myth to me is that when I did a pre-split to, e.g. 8 tablets.
> But along with the ingest operation, the tablet number increases (e.g.
> 10, 14 or bigger). Any idea?

Yep, Accumulo will naturally split tablets when they exceed a certain 
size (1GB by default for normal tables). Unless you increase the 
property table.split.threshold, as you ingest more data, you will 
observe more tablets.

Given enough time, Accumulo will naturally split your table enough. 
Pre-splitting just gets you to a good level of performance right away.
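If you do want fewer automatic splits during a heavy ingest, the threshold can also be raised programmatically (a sketch, assuming an existing Connector named conn and a table "mytable"; "4G" is an arbitrary example value, and the call needs a live cluster):

```java
// Raise the split threshold from the 1GB default so tablets grow
// larger before Accumulo splits them automatically.
conn.tableOperations().setProperty("mytable", "table.split.threshold", "4G");
```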

>
> Hai
> ------------------------------------------------------------------------
> *From:* Keith Turner <ke...@deenlo.com>
> *Sent:* Friday, July 31, 2015 8:39 AM
> *To:* user@accumulo.apache.org
> *Subject:* Re: How to control Minor Compaction by programming
> How many tablets do you have? Entire tablets are minor compacted at
> once. If you have 1 tablet per tablet server, then minor compactions
> will have a lot of work to do at once. While this work is being done,
> the tablet servers memory may fill up, leading to writes being held.
>
> If you have 10 tablets per tablet server, then tablets can be compacted
> in parallel w/ less work to do at any given point in time. This can
> avoid memory filling up and writes being held.
>
> In short, its possible that adding good split points to the table (and
> therefore creating more tablets) may help w/ this issue.
>
> Also, are you seeing hold times?
>
> On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham <htp0005@tigermail.auburn.edu
> <ma...@tigermail.auburn.edu>> wrote:
>
>     Hey William, Josh and David,
>
>     Thanks for explaining, I might not have been clear: I used the web
>     interface with port 50095 to monitor the real-time charts (ingest,
>     scan, load average, minor compaction, major compaction, ...).
>
>     Nonetheless, as I witnessed, when I ingested about 100k entries ->
>     then minor compaction happened -> ingest was stuck -> the level of
>     minor compaction on the charts was just about 1.0, 2.0 and max 3.0
>     while about >20k entries were forced out of memory (I knew this by
>     looking at the number of entries in memory w.r.t the table being
>     ingested to) -> then when minor compaction ended, ingest resumed,
>     somewhat faster.
>
>     Thus I presume the level 1.0, 2.0, 3.0 is not representative for
>     number of files being minor-compacted from memory?
>
>     Hai
>     ________________________________________
>     From: Josh Elser <josh.elser@gmail.com <ma...@gmail.com>>
>     Sent: Thursday, July 30, 2015 7:12 PM
>     To: user@accumulo.apache.org <ma...@accumulo.apache.org>
>     Subject: Re: How to control Minor Compaction by programming
>
>     >
>      > Also, can you please explain the number 0, 1.0, 2.0, ... in
>     charts (web
>      > monitoring) denoting the level of Minor Compaction and Major
>     Compaction?
>
>     On the monitor, the number of compactions are of the form:
>
>     active (queued)
>
>     e.g. 4 (2), would mean that 4 are running and 2 are queued.
>
>      >
>      >
>      > Thank you!
>      >
>      > Hai Pham
>      >
>      >
>      >
>      >
>
>

Re: How to control Minor Compaction by programming

Posted by Hai Pham <ht...@tigermail.auburn.edu>.
Hi Keith,


I have 4 tablet servers + 1 master. I also did a pre-split before ingesting and it increased the speed a lot.


And you're right, when I created too many ingest threads, many of them sat in the thread-pool queue and the hold time increased. During some intense ingests, there was a case where a tablet server was killed by the master because its hold time exceeded 5 min. In this situation, all tablets were stuck. Only after that one died did the ingest come back at a comparable speed. But the entries on the dead tablet server were all gone and lost to the table.

I have had no idea how to repair this except by regulating the number of ingest threads and the speed to make them friendlier to Accumulo itself.


Another mystery to me: when I did a pre-split to, e.g., 8 tablets, the tablet count increased as the ingest proceeded (e.g. to 10, 14 or more). Any idea?

Hai
________________________________
From: Keith Turner <ke...@deenlo.com>
Sent: Friday, July 31, 2015 8:39 AM
To: user@accumulo.apache.org
Subject: Re: How to control Minor Compaction by programming

How many tablets do you have?  Entire tablets are minor compacted at once.  If you have 1 tablet per tablet server, then minor compactions will have a lot of work to do at once.  While this work is being done, the tablet server's memory may fill up, leading to writes being held.

If you have 10 tablets per tablet server, then tablets can be compacted in parallel w/ less work to do at any given point in time.    This can avoid memory filling up and writes being held.

In short, it's possible that adding good split points to the table (and therefore creating more tablets) may help w/ this issue.

Also, are you seeing hold times?

On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham <ht...@tigermail.auburn.edu>> wrote:
Hey William, Josh and David,

Thanks for explaining, I might not have been clear: I used the web interface with port 50095 to monitor the real-time charts (ingest, scan, load average, minor compaction, major compaction, ...).

Nonetheless, as I witnessed, when I ingested about 100k entries -> then minor compaction happened -> ingest was stuck -> the level of minor compaction on the charts was just about 1.0, 2.0 and max 3.0 while about >20k entries were forced out of memory (I knew this by looking at the number of entries in memory w.r.t the table being ingested to) -> then when minor compaction ended, ingest resumed, somewhat faster.

Thus I presume the level 1.0, 2.0, 3.0 is not representative for number of files being minor-compacted from memory?

Hai
________________________________________
From: Josh Elser <jo...@gmail.com>>
Sent: Thursday, July 30, 2015 7:12 PM
To: user@accumulo.apache.org<ma...@accumulo.apache.org>
Subject: Re: How to control Minor Compaction by programming

>
> Also, can you please explain the number 0, 1.0, 2.0, ... in charts (web
> monitoring) denoting the level of Minor Compaction and Major Compaction?

On the monitor, the number of compactions are of the form:

active (queued)

e.g. 4 (2), would mean that 4 are running and 2 are queued.

>
>
> Thank you!
>
> Hai Pham
>
>
>
>


Re: How to control Minor Compaction by programming

Posted by Keith Turner <ke...@deenlo.com>.
How many tablets do you have?  Entire tablets are minor compacted at once.
If you have 1 tablet per tablet server, then minor compactions will have a
lot of work to do at once.  While this work is being done, the tablet
server's memory may fill up, leading to writes being held.

If you have 10 tablets per tablet server, then tablets can be compacted in
parallel w/ less work to do at any given point in time.    This can avoid
memory filling up and writes being held.

In short, it's possible that adding good split points to the table (and
therefore creating more tablets) may help w/ this issue.

Also, are you seeing hold times?
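As a sketch of what adding split points looks like in code (this assumes an existing Connector named conn, a table "mytable", and rows beginning with lowercase letters -- the single-letter split points are illustrative only; pick boundaries that match your actual row-key distribution):

```java
import java.util.TreeSet;
import org.apache.hadoop.io.Text;

// Pre-split the table so each tablet server hosts several tablets,
// letting minor compactions run in parallel on smaller units of work.
TreeSet<Text> splits = new TreeSet<Text>();
for (char c = 'b'; c <= 'y'; c++) {
  splits.add(new Text(String.valueOf(c)));
}
conn.tableOperations().addSplits("mytable", splits);
```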

On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham <ht...@tigermail.auburn.edu>
wrote:

> Hey William, Josh and David,
>
> Thanks for explaining, I might not have been clear: I used the web
> interface with port 50095 to monitor the real-time charts (ingest, scan,
> load average, minor compaction, major compaction, ...).
>
> Nonetheless, as I witnessed, when I ingested about 100k entries -> then
> minor compaction happened -> ingest was stuck -> the level of minor
> compaction on the charts was just about 1.0, 2.0 and max 3.0 while about
> >20k entries were forced out of memory (I knew this by looking at the
> number of entries in memory w.r.t the table being ingested to) -> then when
> minor compaction ended, ingest resumed, somewhat faster.
>
> Thus I presume the level 1.0, 2.0, 3.0 is not representative for number of
> files being minor-compacted from memory?
>
> Hai
> ________________________________________
> From: Josh Elser <jo...@gmail.com>
> Sent: Thursday, July 30, 2015 7:12 PM
> To: user@accumulo.apache.org
> Subject: Re: How to control Minor Compaction by programming
>
> >
> > Also, can you please explain the number 0, 1.0, 2.0, ... in charts (web
> > monitoring) denoting the level of Minor Compaction and Major Compaction?
>
> On the monitor, the number of compactions are of the form:
>
> active (queued)
>
> e.g. 4 (2), would mean that 4 are running and 2 are queued.
>
> >
> >
> > Thank you!
> >
> > Hai Pham
> >
> >
> >
> >
>

Re: How to control Minor Compaction by programming

Posted by Josh Elser <jo...@gmail.com>.
No, it's just a count of the minor compactions being run.

Hai Pham wrote:
> Hey William, Josh and David,
>
> Thanks for explaining, I might not have been clear: I used the web interface with port 50095 to monitor the real-time charts (ingest, scan, load average, minor compaction, major compaction, ...).
>
> Nonetheless, as I witnessed, when I ingested about 100k entries -> then minor compaction happened -> ingest was stuck -> the level of minor compaction on the charts was just about 1.0, 2.0 and max 3.0 while about >20k entries were forced out of memory (I knew this by looking at the number of entries in memory w.r.t the table being ingested to) -> then when minor compaction ended, ingest resumed, somewhat faster.
>
> Thus I presume the level 1.0, 2.0, 3.0 is not representative for number of files being minor-compacted from memory?
>
> Hai
> ________________________________________
> From: Josh Elser<jo...@gmail.com>
> Sent: Thursday, July 30, 2015 7:12 PM
> To: user@accumulo.apache.org
> Subject: Re: How to control Minor Compaction by programming
>
>> Also, can you please explain the number 0, 1.0, 2.0, ... in charts (web
>> monitoring) denoting the level of Minor Compaction and Major Compaction?
>
> On the monitor, the number of compactions are of the form:
>
> active (queued)
>
> e.g. 4 (2), would mean that 4 are running and 2 are queued.
>
>>
>> Thank you!
>>
>> Hai Pham
>>
>>
>>
>>

Re: How to control Minor Compaction by programming

Posted by Hai Pham <ht...@tigermail.auburn.edu>.
Hey William, Josh and David, 

Thanks for explaining, I might not have been clear: I used the web interface with port 50095 to monitor the real-time charts (ingest, scan, load average, minor compaction, major compaction, ...). 

Nonetheless, as I witnessed, when I ingested about 100k entries -> then minor compaction happened -> ingest was stuck -> the level of minor compaction on the charts was just about 1.0, 2.0 and max 3.0 while about >20k entries were forced out of memory (I knew this by looking at the number of entries in memory w.r.t the table being ingested to) -> then when minor compaction ended, ingest resumed, somewhat faster. 

Thus I presume the levels 1.0, 2.0, 3.0 do not represent the number of files being minor-compacted from memory? 

Hai 
________________________________________
From: Josh Elser <jo...@gmail.com>
Sent: Thursday, July 30, 2015 7:12 PM
To: user@accumulo.apache.org
Subject: Re: How to control Minor Compaction by programming

>
> Also, can you please explain the number 0, 1.0, 2.0, ... in charts (web
> monitoring) denoting the level of Minor Compaction and Major Compaction?

On the monitor, the number of compactions are of the form:

active (queued)

e.g. 4 (2), would mean that 4 are running and 2 are queued.

>
>
> Thank you!
>
> Hai Pham
>
>
>
>

Re: How to control Minor Compaction by programming

Posted by Josh Elser <jo...@gmail.com>.
>
> Also, can you please explain the number 0, 1.0, 2.0, ... in charts (web
> monitoring) denoting the level of Minor Compaction and Major Compaction?

On the monitor, the number of compactions are of the form:

active (queued)

e.g. 4 (2), would mean that 4 are running and 2 are queued.

>
>
> Thank you!
>
> Hai Pham
>
>
>
>

Re: How to control Minor Compaction by programming

Posted by Hai Pham <ht...@tigermail.auburn.edu>.
Hi,


Yes, in fact I really want to avoid minor compactions as much as possible, because during a long ingest any minor compaction largely blocks the ingest speed.


But since memory is limited, compaction is unavoidable; thus my desire is to control it as much as possible and adapt the code accordingly.


Thanks,

Hai

________________________________
From: dlmarion@comcast.net <dl...@comcast.net>
Sent: Thursday, July 30, 2015 7:12 PM
To: user@accumulo.apache.org
Subject: RE: How to control Minor Compaction by programming


It sounds like you want to try to avoid minor compactions during your data ingest. Is that correct?



From: William Slacum [mailto:wslacum@gmail.com]
Sent: Thursday, July 30, 2015 8:10 PM
To: user@accumulo.apache.org
Subject: Re: How to control Minor Compaction by programming



See http://accumulo.apache.org/1.5/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#flush%28java.lang.String,%20org.apache.hadoop.io.Text,%20org.apache.hadoop.io.Text,%20boolean%29 for minor compacting (aka "flushing") a table via the API.



On Thu, Jul 30, 2015 at 5:52 PM, Hai Pham <ht...@tigermail.auburn.edu>> wrote:

Hi,



Please share with me is there any way that we can init / control the Minor Compaction by programming (not from the shell). My situation is when I ingest a large data using the BatchWriter, the minor compaction is triggered uncontrollably. The flush() command in BatchWriter seems not for this purpose.

I also tried to play around with parameters in documentation but seems not much helpful.



Also, can you please explain the number 0, 1.0, 2.0, ... in charts (web monitoring) denoting the level of Minor Compaction and Major Compaction?



Thank you!

Hai Pham









RE: How to control Minor Compaction by programming

Posted by dl...@comcast.net.
It sounds like you want to try to avoid minor compactions during your data ingest. Is that correct?

 

From: William Slacum [mailto:wslacum@gmail.com] 
Sent: Thursday, July 30, 2015 8:10 PM
To: user@accumulo.apache.org
Subject: Re: How to control Minor Compaction by programming

 

See http://accumulo.apache.org/1.5/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#flush%28java.lang.String,%20org.apache.hadoop.io.Text,%20org.apache.hadoop.io.Text,%20boolean%29 for minor compacting (aka "flushing") a table via the API.

 

On Thu, Jul 30, 2015 at 5:52 PM, Hai Pham <ht...@tigermail.auburn.edu> wrote:

Hi, 

 

Please share with me whether there is any way to initiate / control minor compaction programmatically (not from the shell). My situation is that when I ingest a large dataset using the BatchWriter, minor compaction is triggered uncontrollably. The flush() command in BatchWriter seems not to be for this purpose. 

I also tried to play around with parameters from the documentation, but that did not seem to help much. 

 

Also, can you please explain the number 0, 1.0, 2.0, ... in charts (web monitoring) denoting the level of Minor Compaction and Major Compaction? 

 

Thank you! 

Hai Pham 

 

 

 

 


Re: How to control Minor Compaction by programming

Posted by Hai Pham <ht...@tigermail.auburn.edu>.
Hi William,


That's probably what I have been looking for, so I will try it soon. Thank you!


Hai

________________________________
From: William Slacum <ws...@gmail.com>
Sent: Thursday, July 30, 2015 7:11 PM
To: user@accumulo.apache.org
Subject: Re: How to control Minor Compaction by programming

Swap out 1.5 in the previous link for the version you're probably using.

Which charts are you looking at for the compactions? Usually it's just the number of compactions currently running for the system.

On Thu, Jul 30, 2015 at 7:10 PM, William Slacum <ws...@gmail.com>> wrote:
See http://accumulo.apache.org/1.5/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#flush%28java.lang.String,%20org.apache.hadoop.io.Text,%20org.apache.hadoop.io.Text,%20boolean%29 for minor compacting (aka "flushing") a table via the API.


On Thu, Jul 30, 2015 at 5:52 PM, Hai Pham <ht...@tigermail.auburn.edu>> wrote:

Hi,


Please share with me whether there is any way to initiate / control minor compaction programmatically (not from the shell). My situation is that when I ingest a large dataset using the BatchWriter, minor compaction is triggered uncontrollably. The flush() command in BatchWriter seems not to be for this purpose.

I also tried to play around with parameters from the documentation, but that did not seem to help much.


Also, can you please explain the number 0, 1.0, 2.0, ... in charts (web monitoring) denoting the level of Minor Compaction and Major Compaction?


Thank you!

Hai Pham






Re: How to control Minor Compaction by programming

Posted by William Slacum <ws...@gmail.com>.
Swap out 1.5 in the previous link for the version you're probably using.

Which charts are you looking at for the compactions? Usually it's just the
number of compactions currently running for the system.

On Thu, Jul 30, 2015 at 7:10 PM, William Slacum <ws...@gmail.com> wrote:

> See
> http://accumulo.apache.org/1.5/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#flush%28java.lang.String,%20org.apache.hadoop.io.Text,%20org.apache.hadoop.io.Text,%20boolean%29
> for minor compacting (aka "flushing") a table via the API.
>
>
> On Thu, Jul 30, 2015 at 5:52 PM, Hai Pham <ht...@tigermail.auburn.edu>
> wrote:
>
>> Hi,
>>
>>
>> Please share with me is there any way that we can init / control the
>> Minor Compaction by programming (not from the shell). My situation is when
>> I ingest a large data using the BatchWriter, the minor compaction is
>> triggered uncontrollably. The flush() command in BatchWriter seems not for
>> this purpose.
>>
>> I also tried to play around with parameters in documentation but seems
>> not much helpful.
>>
>>
>> Also, can you please explain the number 0, 1.0, 2.0, ... in charts (web
>> monitoring) denoting the level of Minor Compaction and Major Compaction?
>>
>>
>> Thank you!
>>
>> Hai Pham
>>
>>
>>
>>
>>
>

Re: How to control Minor Compaction by programming

Posted by William Slacum <ws...@gmail.com>.
See
http://accumulo.apache.org/1.5/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#flush%28java.lang.String,%20org.apache.hadoop.io.Text,%20org.apache.hadoop.io.Text,%20boolean%29
for minor compacting (aka "flushing") a table via the API.
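To make that concrete, a minimal sketch of the call (written against the 1.5/1.6 client API linked above; the instance name, zookeepers, credentials, and table name are placeholders, and this needs a running Accumulo instance):

```java
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class FlushExample {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details -- substitute your own.
    Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
        .getConnector("root", new PasswordToken("secret"));

    // Flush (minor compact) the whole table. Passing null for the start
    // and end rows covers every tablet; the final 'true' makes the call
    // block until the flush completes.
    conn.tableOperations().flush("mytable", null, null, true);
  }
}
```

Passing Text start/end rows instead of null limits the flush to the tablets covering that row range.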


On Thu, Jul 30, 2015 at 5:52 PM, Hai Pham <ht...@tigermail.auburn.edu>
wrote:

> Hi,
>
>
> Please share with me is there any way that we can init / control the Minor
> Compaction by programming (not from the shell). My situation is when I
> ingest a large data using the BatchWriter, the minor compaction is
> triggered uncontrollably. The flush() command in BatchWriter seems not for
> this purpose.
>
> I also tried to play around with parameters in documentation but seems not
> much helpful.
>
>
> Also, can you please explain the number 0, 1.0, 2.0, ... in charts (web
> monitoring) denoting the level of Minor Compaction and Major Compaction?
>
>
> Thank you!
>
> Hai Pham
>
>
>
>
>

Re: How to control Minor Compaction by programming

Posted by David Medinets <da...@gmail.com>.
Just in case you didn't know, you can control the number of threads used
for compacting. See
https://accumulo.apache.org/1.4/user_manual/Table_Configuration.html, then
search for "background threads"
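Those thread counts can also be set through the API rather than the shell. A sketch, assuming a Connector named conn with admin rights and a live cluster; the property names below are the minor/major compaction concurrency settings from that era of Accumulo, and the values are examples only:

```java
// These are system-wide settings, so they go through
// instanceOperations() rather than tableOperations().
conn.instanceOperations().setProperty("tserver.compaction.minor.concurrent.max", "8");
conn.instanceOperations().setProperty("tserver.compaction.major.concurrent.max", "4");
```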

On Thu, Jul 30, 2015 at 6:52 PM, Hai Pham <ht...@tigermail.auburn.edu>
wrote:

> Hi,
>
>
> Please share with me is there any way that we can init / control the Minor
> Compaction by programming (not from the shell). My situation is when I
> ingest a large data using the BatchWriter, the minor compaction is
> triggered uncontrollably. The flush() command in BatchWriter seems not for
> this purpose.
>
> I also tried to play around with parameters in documentation but seems not
> much helpful.
>
>
> Also, can you please explain the number 0, 1.0, 2.0, ... in charts (web
> monitoring) denoting the level of Minor Compaction and Major Compaction?
>
>
> Thank you!
>
> Hai Pham
>
>
>
>
>