Posted to dev@accumulo.apache.org by David Medinets <da...@gmail.com> on 2012/02/02 04:15:22 UTC

Re: 1.4 example & performance information

On Mon, Jan 30, 2012 at 3:26 PM, Eric Newton <er...@gmail.com> wrote:
> Feedback is welcome.

> From http://incubator.apache.org/accumulo/example/wikisearch.html
>
> The sample application takes advantage of 3 unique capabilities of Accumulo:
>    Extensible iterators that operate within the distributed region servers of the key-value store
>    Custom aggregators which can efficiently condense information during the various life-cycles of the log-structured merge tree
>    Custom load balancing, which ensures that a table is evenly distributed on all region servers

The following questions occurred to me as I read the above text. I've
assumed the mindset of someone reading this page without knowing much
about Accumulo. Perhaps they reached the page via a web search.

What is a region server?
Why are they distributed in the first point but not the third?
Is there some kind of storage besides key-value?
Where is an "aggregator" defined?
Where are the "various life-cycles" defined?
Why are the life-cycles "various"? Do they change?
What is a log-structured merge tree and why should the reader care?

"The region servers then used the installed aggregators" - should this
read "then use"?

"The query “octopus” and “big” will be performed on all the servers,
but only those partitions for which the low-cardinality term “octopus”
can be found by using the aggregated reverse index information." -
This sentence seems incomplete. What happens to "those partitions
which ... can be found"?

"These extensions become part of the region servers iterator stack" -
does it make sense to reword this as "These extensions become part of
the Iterator stack on each region server"?