Posted to dev@accumulo.apache.org by karthick rn <ka...@gmail.com> on 2019/11/19 23:45:30 UTC

Multiple instance volumes

Hi,

When provisioning multiple volumes, for example HDFS and Azure Data Lake
Storage, it would be good to choose the volume on which system tables such
as the metadata, root, and replication tables are created. Currently,
Accumulo places these tables randomly across the volumes, and the only way
to control this is to run Accumulo init with only the first volume, so that
all system tables are created there, and then add the other volumes.

I have also tried running 'config -t accumulo.metadata -s
table.custom.volume.preferred=hdfs://accucluster/accumulo' in an attempt to
move the metadata table to a preferred volume, in this case HDFS, but I
don’t see the metadata table under HDFS. Also, ‘config -f
table.custom.volume.preferred’ does not show anything!

In short, I was wondering if there is any provision to "move" these system
tables across volumes, or is that a non-goal by design?

Many thanks

Regards,
Karthick

Karthick: Try this command to see your config

Posted by Jeffrey Zeiberg <jz...@gmail.com>.
config -t accumulo.metadata -f table.custom.volume.preferred

On Thu, Nov 21, 2019 at 7:32 PM Christopher <ct...@apache.org> wrote:

> There's a TODO in the code for that issue, too. The issue just needs
> somebody to work on it. It might be a simple matter of ensuring the
> VolumeChooserEnvironment created in the init code has an appropriate
> service environment object which contains the site configuration (but
> not the system config from zookeeper).

Re: Multiple instance volumes

Posted by Christopher <ct...@apache.org>.
There's a TODO in the code for that issue, too. The issue just needs
somebody to work on it. It might be a simple matter of ensuring the
VolumeChooserEnvironment created in the init code has an appropriate
service environment object which contains the site configuration (but
not the system config from zookeeper).
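
To make the idea concrete, here is a toy Python sketch of a preferred-volume
lookup that reads only a site configuration, as init would have to. It is not
Accumulo code. The function name is made up, the second volume URI is
hypothetical, and the scoped property names
(general.custom.volume.preferred.<scope>) are an assumption about how the
chooser is configured. Falling back to all volumes when nothing is set is a
choice made for this sketch only.

```python
from typing import Dict, List


def preferred_volumes(scope: str, site_config: Dict[str, str],
                      all_volumes: List[str]) -> List[str]:
    """Return the configured volumes that are preferred for a scope.

    Only `site_config` is consulted, mirroring the situation at init time
    when no system configuration exists in ZooKeeper yet.
    """
    # Assumed property naming: a per-scope key with a default fallback.
    scoped_key = f"general.custom.volume.preferred.{scope}"
    default_key = "general.custom.volume.preferred.default"
    raw = site_config.get(scoped_key) or site_config.get(default_key)
    if not raw:
        # Nothing configured: every volume is a candidate (sketch choice).
        return list(all_volumes)
    wanted = [v.strip() for v in raw.split(",") if v.strip()]
    chosen = [v for v in all_volumes if v in wanted]
    if not chosen:
        raise ValueError(f"no configured volume matches {raw!r}")
    return chosen


site = {"general.custom.volume.preferred.init": "hdfs://accucluster/accumulo"}
volumes = ["hdfs://accucluster/accumulo", "abfs://example/accumulo"]  # second URI is hypothetical
print(preferred_volumes("init", site, volumes))
```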

On Tue, Nov 19, 2019 at 8:17 PM Billie Rinaldi <bi...@apache.org> wrote:
>
> I think it would be a good idea to have the PreferredVolumeChooser select
> volumes during init. Looks like there is already an issue open for this:
> https://github.com/apache/accumulo/issues/1373
>
> I imagine you could move a table to a different volume by changing the
> preferred volume configuration and then compacting the table. But as Keith
> mentions in the issue, it would be easier if this weren't necessary.
>
> Billie

Re: Multiple instance volumes

Posted by Billie Rinaldi <bi...@apache.org>.
I think it would be a good idea to have the PreferredVolumeChooser select
volumes during init. Looks like there is already an issue open for this:
https://github.com/apache/accumulo/issues/1373

I imagine you could move a table to a different volume by changing the
preferred volume configuration and then compacting the table. But as Keith
mentions in the issue, it would be easier if this weren't necessary.

Billie

Re: Multiple instance volumes

Posted by Keith Turner <ke...@deenlo.com>.
On Tue, Nov 19, 2019 at 6:45 PM karthick rn
<ka...@gmail.com> wrote:
>
> Hi,
>
> When provisioning multiple volumes, for example HDFS and Azure Data Lake
> Storage, it would be good to choose the volume on which system tables such
> as the metadata, root, and replication tables are created. Currently,
> Accumulo places these tables randomly across the volumes, and the only way
> to control this is to run Accumulo init with only the first volume, so that
> all system tables are created there, and then add the other volumes.
>
> I have also tried running 'config -t accumulo.metadata -s
> table.custom.volume.preferred=hdfs://accucluster/accumulo' in an attempt to
> move the metadata table to a preferred volume, in this case HDFS, but I
> don’t see the metadata table under HDFS. Also, ‘config -f
> table.custom.volume.preferred’ does not show anything!
>
> In short, I was wondering if there is any provision to "move" these system
> tables across volumes, or is that a non-goal by design?

Accumulo volume choices are currently sticky.  Each tablet chooses a
volume where it will put files and it remembers that choice (stored in
the metadata table under the srv:dir column).  After that choice is
made, the tablet will always put files on that volume. I don't think
changing the volume chooser config and then compacting will fix this.
I think the best option is to run init carefully as you mentioned
and/or look into fixing #1373.
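
A toy Python model of the sticky behaviour described above may help. It is
not Accumulo code: the Tablet class, the srv_dir field (standing in for the
srv:dir entry), and the volume names are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tablet:
    """Toy tablet; `srv_dir` stands in for the srv:dir metadata entry."""
    name: str
    srv_dir: Optional[str] = None
    files: List[str] = field(default_factory=list)


def new_file_sticky(tablet: Tablet, preferred_volume: str, filename: str) -> str:
    """Create a file path for the tablet, honouring a sticky volume choice."""
    if tablet.srv_dir is None:
        # The first write picks a volume and records the full directory.
        tablet.srv_dir = f"{preferred_volume}/tables/{tablet.name}"
    # Every later write reuses the recorded directory, whatever the config says.
    path = f"{tablet.srv_dir}/{filename}"
    tablet.files.append(path)
    return path


tablet = Tablet("t1")
first = new_file_sticky(tablet, "volA", "F1.rf")
# The preferred volume changes to volB, then a compaction writes a new file.
second = new_file_sticky(tablet, "volB", "A2.rf")
print(first)
print(second)
```

In this model the second file still lands under volA, which is why changing
the preference and compacting would not move anything.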

For 2.1.0 this is fixed by #1389[1] and choices are no longer sticky.
So in 2.1, you should be able to change config and then compact.
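
As a rough illustration of a non-sticky choice, the toy Python sketch below
picks the volume each time a file is created, while the tablet only keeps a
directory name. This is a simplified model for illustration, not the actual
2.1 implementation; the function, table id, and directory name are invented.

```python
def new_file_per_write(dir_name: str, preferred_volume: str,
                       table_id: str, filename: str) -> str:
    """Build a file path where the volume is chosen per file, not per tablet."""
    return f"{preferred_volume}/tables/{table_id}/{dir_name}/{filename}"


# Same tablet directory name, different preferred volume at each write.
before = new_file_per_write("t-0001", "volA", "2", "F1.rf")
# After the config change, a compaction writes its output to the new volume.
after = new_file_per_write("t-0001", "volB", "2", "A2.rf")
print(before)
print(after)
```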

Another option for 2.0 and earlier would be to perform surgery on
Accumulo's metadata, but this would be tricky for the root tablet.
The following is speculation on what needs to be done.  For the
metadata table, you could change the srv:dir entries stored in the
root tablet (then restart and compact metadata table).  For the root
tablet itself, you would need to change its dir entry in zookeeper AND
move its files to the location specified in zookeeper (doing all of
this while Accumulo is down).  Accumulo has an admin utility to do
this type of thing (it was removed in #1389), but I looked at that
utility and it's really only suited for user tables and not the
metadata or root table.  We could possibly make that utility support
the metadata and root tables by teaching it how to do the surgery I
mentioned.
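
The string rewrite at the heart of that surgery could look like the Python
sketch below. It only transforms a directory value and does not touch the
metadata table, ZooKeeper, or any files. The function name, the sample
directory value, and the abfs:// URI are hypothetical, and the whole thing is
as speculative as the procedure above.

```python
def rewrite_volume(dir_entry: str, old_volume: str, new_volume: str) -> str:
    """Swap the volume prefix of a directory entry; leave others untouched."""
    old = old_volume.rstrip("/")
    new = new_volume.rstrip("/")
    # Match only on a whole path component so that a volume named
    # ".../accumulo2" is not mistaken for ".../accumulo".
    if dir_entry == old or dir_entry.startswith(old + "/"):
        return new + dir_entry[len(old):]
    return dir_entry


old_vol = "abfs://other-volume/accumulo"  # hypothetical source volume
new_vol = "hdfs://accucluster/accumulo"
entry = old_vol + "/tables/!0/default_tablet"  # made-up sample value
print(rewrite_volume(entry, old_vol, new_vol))
```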

[1]: https://github.com/apache/accumulo/pull/1389
