Posted to user@accumulo.apache.org by David Medinets <da...@gmail.com> on 2012/01/05 16:14:15 UTC

Does it make sense to add '-bt true' to createtable command?

While I love the flexibility of setting configurations using the shell
command, it seems like bloom tables are useful enough to warrant
special handling. If this option makes sense, I'll create a JIRA
ticket.

I also think that the following section of README.bloom should be
expanded with the actual commands, unless you're trying to make the
reader think? It's not obvious what the table names should be, or
even whether three tables are needed instead of one. Another JIRA ticket?

 * Insert 1 million entries using  RandomBatchWriter with a seed of 7
 * Flush the table using the shell
 * Insert 1 million entries using  RandomBatchWriter with a seed of 8
 * Flush the table using the shell
 * Insert 1 million entries using  RandomBatchWriter with a seed of 9
 * Flush the table using the shell

Accumulo does have an auto-flush feature, doesn't it? Why flush after
each insert instead of once at the end?

Re: Does it make sense to add '-bt true' to createtable command?

Posted by Adam Fuchs <ad...@ugov.gov>.
I think the confusion here might be that there are two different operations
called "flush". One is the flush of the BatchWriter's local buffer, and the
other is the flush of the TabletServer's in-memory map (AKA minor
compaction). This example refers to the latter. There are also auto-flushes
in both cases, but the flush in this case is effectively forcing the
minor-compaction operation with a known quantity of data.
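
To make the distinction concrete, here is a toy model of the two flushes. It is only a sketch: ToyBatchWriter and ToyTabletServer are hypothetical stand-ins written for this illustration, not Accumulo classes, and they ignore auto-flush, tablets, and write-ahead logging. The point is that flushing the writer only moves entries to the server's memory, while the table flush (minor compaction) is what produces a file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tablet server: an in-memory map plus a list of files.
class ToyTabletServer {
    final List<String> inMemoryMap = new ArrayList<>();
    final List<List<String>> files = new ArrayList<>();

    // Entries sent by a client land in memory first.
    void receive(List<String> batch) {
        inMemoryMap.addAll(batch);
    }

    // "Flush the table" (minor compaction): write the in-memory map out as one new file.
    void minorCompact() {
        if (inMemoryMap.isEmpty()) {
            return;
        }
        files.add(new ArrayList<>(inMemoryMap));
        inMemoryMap.clear();
    }
}

// Hypothetical stand-in for a client-side writer with a local buffer.
class ToyBatchWriter {
    private final List<String> localBuffer = new ArrayList<>();
    private final ToyTabletServer server;

    ToyBatchWriter(ToyTabletServer server) {
        this.server = server;
    }

    void add(String entry) {
        localBuffer.add(entry);
    }

    // Client-side flush: send buffered entries to the server. No file is created here.
    void flush() {
        server.receive(new ArrayList<>(localBuffer));
        localBuffer.clear();
    }

    int buffered() {
        return localBuffer.size();
    }
}
```

In this model, calling flush() on the writer three times and minorCompact() once would still leave a single file, which is why the README asks for a table flush after each insert.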

Adam



Re: Does it make sense to add '-bt true' to createtable command?

Posted by Keith Turner <ke...@deenlo.com>.
The flushes after each insert are there for a specific purpose: to
ensure that the data written with different seeds ends up in different
files. This shows that at scan time the bloom filter lets you skip
seeking two of the three files.
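
A rough sketch of that effect, using a toy bloom filter rather than Accumulo's implementation. Everything here is assumed for illustration: ToyBloomFilter, ToyBloomDemo, the hash mixing constants, and the filter sizing are all made up, and the keys are plain longs from java.util.Random instead of real rows. Each "file" gets its own filter built from one seed, and a lookup only needs to seek the files whose filter says the key might be present.

```java
import java.util.BitSet;
import java.util.Random;

// Toy bloom filter over long keys (not Accumulo's implementation).
class ToyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    ToyBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Double hashing: derive the i-th bit index from two mixed values of the key.
    private int index(long key, int i) {
        long h1 = key * 0x9E3779B97F4A7C15L;
        long h2 = ((key ^ (key >>> 32)) * 0xC2B2AE3D27D4EB4FL) | 1L;
        return (int) Math.floorMod(h1 + i * h2, (long) size);
    }

    void add(long key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(key, i));
        }
    }

    // False means "definitely absent"; true means "possibly present".
    boolean mightContain(long key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}

class ToyBloomDemo {
    // One "file" per seed, mirroring one flush per RandomBatchWriter run.
    static ToyBloomFilter fileFor(long seed, int entries) {
        ToyBloomFilter filter = new ToyBloomFilter(entries * 16, 4);
        Random random = new Random(seed);
        for (int n = 0; n < entries; n++) {
            filter.add(random.nextLong());
        }
        return filter;
    }

    // Number of files a lookup would still have to seek for this key.
    static int filesToSeek(ToyBloomFilter[] files, long key) {
        int count = 0;
        for (ToyBloomFilter file : files) {
            if (file.mightContain(key)) {
                count++;
            }
        }
        return count;
    }
}
```

Looking up a key generated with seed 7 must hit the seed-7 file, since a bloom filter has no false negatives. The other two files are usually skipped, although a false positive can occasionally force an extra seek. Without the flushes in between, all three seeds would share one file and there would be nothing to skip.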


Re: Does it make sense to add '-bt true' to createtable command?

Posted by Keith Turner <ke...@deenlo.com>.
Adding an option to the createtable shell command to enable bloom
filters seems ok to me.

I do not think we should add another createTable method to
TableOperations. Just have the shell set the bloom configs by making
additional API calls after it creates the table.
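
Roughly what that could look like, as a sketch only. ToyTableOps is a hypothetical two-method interface standing in for TableOperations so the example runs without Accumulo on the classpath, InMemoryTableOps and ToyCreateTableCommand are made up for this illustration, and the property name table.bloom.enabled is my assumption about which bloom config the shell would set.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the two table operations the shell command would need.
interface ToyTableOps {
    void create(String table);

    void setProperty(String table, String property, String value);
}

// In-memory implementation so the sketch can run on its own.
class InMemoryTableOps implements ToyTableOps {
    final Map<String, Map<String, String>> tables = new HashMap<>();

    public void create(String table) {
        if (tables.containsKey(table)) {
            throw new IllegalStateException("table exists: " + table);
        }
        tables.put(table, new HashMap<>());
    }

    public void setProperty(String table, String property, String value) {
        Map<String, String> props = tables.get(table);
        if (props == null) {
            throw new IllegalStateException("no such table: " + table);
        }
        props.put(property, value);
    }
}

// Sketch of the shell side: no new createTable overload, just a follow-up call.
class ToyCreateTableCommand {
    static void run(ToyTableOps ops, String table, boolean enableBloom) {
        ops.create(table);
        if (enableBloom) {
            // Assumed property name; the shell sets it after the table exists.
            ops.setProperty(table, "table.bloom.enabled", "true");
        }
    }
}
```

The client API stays the same size this way, and the new option becomes sugar over calls a user can already make by hand.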
