Posted to dev@accumulo.apache.org by Christopher Tubbs <ct...@gmail.com> on 2012/11/28 23:55:52 UTC

RFile configuration preferences

It seems RFile prefers Accumulo properties set in the Hadoop configuration
object over the Accumulo per-table configuration in ZooKeeper.

See RFileOperations.openWriter(...).
The affected configuration properties are:

table.file.replication
table.file.blocksize
table.file.compress.blocksize
table.file.compress.blocksize.index
table.file.compress.type

Furthermore, when they appear in Hadoop configuration, they cannot contain
the Accumulo shortcuts for specifying byte sizes (like "1G").
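
To illustrate the byte-size point, here is a minimal sketch in plain Java. It does not use Accumulo or Hadoop classes: `ByteSizeSketch`, `parseBytes`, and `parsesAsPlainLong` are hypothetical names, and the suffix set (K, M, G as powers of 1024) is an assumption based on the "1G" example. It shows that a shortcut value needs suffix-aware parsing and is rejected by a plain numeric parse.

```java
import java.util.Locale;

public class ByteSizeSketch {

    /**
     * Parses values such as "4096" or "1G" into a byte count.
     * Assumption: K, M and G are powers of 1024; this is not Accumulo's own parser.
     */
    static long parseBytes(String value) {
        String v = value.trim().toUpperCase(Locale.ROOT);
        if (v.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier;
        switch (v.charAt(v.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            default:  return Long.parseLong(v); // no suffix: plain number of bytes
        }
        // Strip the suffix, parse the number, and scale it.
        return Long.parseLong(v.substring(0, v.length() - 1)) * multiplier;
    }

    /** Returns true only if the value is a plain decimal long, with no suffix. */
    static boolean parsesAsPlainLong(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("1G"));        // prints 1073741824
        System.out.println(parsesAsPlainLong("1G")); // prints false
    }
}
```

A reader that treats the stored value as a plain number therefore cannot accept "1G", which matches the restriction described above.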

Is this a bug or a feature? It could be a feature, particularly in
AccumuloFileOutputFormat, where one can specify the property in Hadoop. It
could also be a bug if one of these properties shows up in the Hadoop
configuration files, especially since we don't prefix these configuration
properties with something unique, like "accumulo."
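
The lookup order described in this message can be modeled with two plain maps. This is a sketch under stated assumptions, not the actual RFileOperations.openWriter(...) code: `PrecedenceSketch` and `resolve` are hypothetical names, and the property values are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class PrecedenceSketch {

    /** Returns the Hadoop-held value when present, otherwise the per-table value. */
    static String resolve(String key,
                          Map<String, String> hadoopConf,
                          Map<String, String> tableConf) {
        String fromHadoop = hadoopConf.get(key);
        return fromHadoop != null ? fromHadoop : tableConf.get(key);
    }

    public static void main(String[] args) {
        // Stand-in for the per-table configuration (illustrative values).
        Map<String, String> tableConf = new HashMap<>();
        tableConf.put("table.file.compress.type", "gz");
        tableConf.put("table.file.replication", "3");

        // Stand-in for a Hadoop configuration that happens to carry the same key.
        Map<String, String> hadoopConf = new HashMap<>();
        hadoopConf.put("table.file.compress.type", "none");

        // The Hadoop-held value wins because it is consulted first.
        System.out.println(resolve("table.file.compress.type", hadoopConf, tableConf)); // prints none
        // With no Hadoop-held value, the per-table setting is used.
        System.out.println(resolve("table.file.replication", hadoopConf, tableConf));   // prints 3
    }
}
```

Because the keys carry no unique prefix, nothing in this lookup distinguishes a deliberate job-level override from a value that merely happens to be present in the Hadoop configuration.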

Thoughts?

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

Re: RFile configuration preferences

Posted by Christopher Tubbs <ct...@gmail.com>.
Looking more carefully at the history, it appears this is the result of
ACCUMULO-467. I think I can get more consistent, predictable behavior if I
wrap the AccumuloFileOutputFormat configuration options for RFile in an
AccumuloConfiguration instance, so that from RFileOperations' perspective, it
looks as though they could just as easily have come from a per-table
ZooKeeper config.
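
A rough sketch of the wrapping idea, in plain Java. Everything here is a hypothetical stand-in: `ConfigView` plays the role of an AccumuloConfiguration-like interface, `OverlayConfig` layers the output format's options over defaults, and `describeWriterSettings` stands for writer-side code. None of these names come from Accumulo, and the values are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class WrapperSketch {

    /** Hypothetical stand-in for an Accumulo-style configuration view. */
    interface ConfigView {
        String get(String key);
    }

    /** Presents job-supplied options layered over a default configuration view. */
    static final class OverlayConfig implements ConfigView {
        private final Map<String, String> overrides;
        private final ConfigView defaults;

        OverlayConfig(Map<String, String> overrides, ConfigView defaults) {
            this.overrides = new HashMap<>(overrides); // defensive copy
            this.defaults = defaults;
        }

        @Override
        public String get(String key) {
            String value = overrides.get(key);
            return value != null ? value : defaults.get(key);
        }
    }

    /** Hypothetical writer-side code: it reads from a ConfigView only. */
    static String describeWriterSettings(ConfigView conf) {
        return "compress=" + conf.get("table.file.compress.type")
                + ", replication=" + conf.get("table.file.replication");
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("table.file.compress.type", "gz");
        defaults.put("table.file.replication", "0");

        Map<String, String> jobOptions = new HashMap<>();
        jobOptions.put("table.file.replication", "2");

        ConfigView view = new OverlayConfig(jobOptions, defaults::get);
        System.out.println(describeWriterSettings(view)); // prints compress=gz, replication=2
    }
}
```

With this shape, the writer has a single source of configuration, and any job-level override is applied explicitly where the wrapper is built rather than by a second lookup inside the writer.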


--
Christopher L Tubbs II
http://gravatar.com/ctubbsii



On Wed, Nov 28, 2012 at 6:50 PM, Eric Newton <er...@gmail.com> wrote:

> Sounds to me like an ancient holdover from the days of MapFile.
>
> If we can change it easily, I'm all for that.
>
> -Eric

Re: RFile configuration preferences

Posted by Eric Newton <er...@gmail.com>.
Sounds to me like an ancient holdover from the days of MapFile.

If we can change it easily, I'm all for that.

-Eric


