Posted to dev@geode.apache.org by Nick Reich <nr...@pivotal.io> on 2017/08/22 18:32:58 UTC

Adding parallel import/export of snapshots to gfsh

Team,

I am working on exposing the parallel export/import of snapshots through
gfsh and would appreciate input on the best approach to adding to /
updating the existing interface.

Currently, ExportDataCommand and ImportDataCommand take a region name, a
member to run the command on, and a file location (that must end in .gfd).
Parallel import and export require a directory location instead of a single
file name (as there can be multiple files, each of which needs a unique
name). It is possible to add a parallel flag and have the meaning of the
"file" parameter be different depending on that flag, but that seems overly
confusing to me. I am instead leaning towards creating new commands (e.g.
ParallelExportDataCommand) that have a "directory" parameter to replace
"file", but is otherwise identical in usage to the existing commands.

Do others have different views or approaches to suggest?

Re: Adding parallel import/export of snapshots to gfsh

Posted by Anilkumar Gingade <ag...@pivotal.io>.
>> One other idea that hasn't been mentioned is making parallel the only way

My vote is to support both options; we could make parallel the default, but
having an option to take a snapshot on one node may be useful for use cases
where:
- It is easier to manage the snapshot at one file location, e.g. in a large
cluster environment.
- File-level security needs to be controlled.
- A single node can be used for the snapshot without impacting other nodes
(disk I/O ops).

-Anil.



On Tue, Aug 22, 2017 at 3:42 PM, Nick Reich <nr...@pivotal.io> wrote:

> With minimal code change, it is possible to enable the use of --dir for both
> standard and parallel export/import, allowing --file to function only for
> standard exports (and optionally, make it deprecated in favor of the --dir
> option).
>
> While I am not inherently opposed to forcing all Partitioned Region
> snapshots to be parallel, that seems to be a significant enough change to
> current functionality (from one file on one member to multiple files on
> multiple members) that I am hesitant to do so without united community
> backing for that approach.
>
> On Tue, Aug 22, 2017 at 2:24 PM, Michael Stolz <ms...@pivotal.io> wrote:
>
> > One other idea that hasn't been mentioned is making parallel the only way
> > for Partitioned Regions, and having --file serve the purpose of defining
> > both a path and a filename pattern where the bucket ID or whatever we're
> > using gets automatically inserted before the .gfd extension.
> >
> > No need for a new option (--parallel).
> > No need for a new option (--path).
> >
> > In fact, no need for a change to gfsh command at all.
> >
> >
> > --
> > Mike Stolz
> > Principal Engineer, GemFire Product Manager
> > Mobile: +1-631-835-4771
> >
> > On Tue, Aug 22, 2017 at 2:15 PM, Nick Reich <nr...@pivotal.io> wrote:
> >
> > > Parallel export will write the data to files on the bucket primary for
> > each
> > > bucket, distributing the work (and therefore files) to all the members.
> > > That would be a big enough deviation from the current behavior (single
> > file
> > > on single machine), that I think it makes it worth having the
> additional
> > > options (but I agree: less options is generally better).
> > >
> > > On Tue, Aug 22, 2017 at 1:59 PM, Jacob Barrett <jb...@pivotal.io>
> > > wrote:
> > >
> > > > On Tue, Aug 22, 2017 at 1:49 PM Nick Reich <nr...@pivotal.io>
> wrote:
> > > >
> > > > > The idea of deprecating —file in favor of path is interesting. I
> > wonder
> > > > if
> > > > > instead of making them mutually exclusive to start, having —path be
> > > able
> > > > to
> > > > > support both modes from the start would be better? That way —file
> > could
> > > > > still be used for the existing mode, but —path could be used
> instead
> > > (and
> > > > > override —file is both given?): that would provide a clear path
> > forward
> > > > for
> > > > > how the command should be used, while fully supporting existing
> > > > workflows.
> > > > >
> > > >
> > > > This is what I meant by deprecating. Maybe even providing a message
> > that
> > > if
> > > > --file is set that it is deprecated for --path.
> > > >
> > > >
> > > > > We need to continue to support both modes, as only Partitioned
> > Regions
> > > > can
> > > > > make use of parallel export (it is parallelized on a bucket level).
> > > > >
> > > >
> > > > Ok, so why not just make parallel the only mode for partitioned. Then
> > you
> > > > remove the need for --parallel and --path would work for any region,
> > > > non-partitioned would create a single file at that path and
> partitioned
> > > > would create several? I am all for less options. ;)
> > > >
> > > > -Jake
> > > >
> > >
> >
>

Re: Adding parallel import/export of snapshots to gfsh

Posted by Nick Reich <nr...@pivotal.io>.
With minimal code change, it is possible to enable the use of --dir for both
standard and parallel export/import, allowing --file to function only for
standard exports (and optionally, make it deprecated in favor of the --dir
option).

While I am not inherently opposed to forcing all Partitioned Region snapshots
to be parallel, that seems to be a significant enough change to current
functionality (from one file on one member to multiple files on multiple
members) that I am hesitant to do so without united community backing for
that approach.

On Tue, Aug 22, 2017 at 2:24 PM, Michael Stolz <ms...@pivotal.io> wrote:

> One other idea that hasn't been mentioned is making parallel the only way
> for Partitioned Regions, and having --file serve the purpose of defining
> both a path and a filename pattern where the bucket ID or whatever we're
> using gets automatically inserted before the .gfd extension.
>
> No need for a new option (--parallel).
> No need for a new option (--path).
>
> In fact, no need for a change to gfsh command at all.
>
>
> --
> Mike Stolz
> Principal Engineer, GemFire Product Manager
> Mobile: +1-631-835-4771
>
> On Tue, Aug 22, 2017 at 2:15 PM, Nick Reich <nr...@pivotal.io> wrote:
>
> > Parallel export will write the data to files on the bucket primary for
> each
> > bucket, distributing the work (and therefore files) to all the members.
> > That would be a big enough deviation from the current behavior (single
> file
> > on single machine), that I think it makes it worth having the additional
> > options (but I agree: less options is generally better).
> >
> > On Tue, Aug 22, 2017 at 1:59 PM, Jacob Barrett <jb...@pivotal.io>
> > wrote:
> >
> > > On Tue, Aug 22, 2017 at 1:49 PM Nick Reich <nr...@pivotal.io> wrote:
> > >
> > > > The idea of deprecating —file in favor of path is interesting. I
> wonder
> > > if
> > > > instead of making them mutually exclusive to start, having —path be
> > able
> > > to
> > > > support both modes from the start would be better? That way —file
> could
> > > > still be used for the existing mode, but —path could be used instead
> > (and
> > > > override —file is both given?): that would provide a clear path
> forward
> > > for
> > > > how the command should be used, while fully supporting existing
> > > workflows.
> > > >
> > >
> > > This is what I meant by deprecating. Maybe even providing a message
> that
> > if
> > > --file is set that it is deprecated for --path.
> > >
> > >
> > > > We need to continue to support both modes, as only Partitioned
> Regions
> > > can
> > > > make use of parallel export (it is parallelized on a bucket level).
> > > >
> > >
> > > Ok, so why not just make parallel the only mode for partitioned. Then
> you
> > > remove the need for --parallel and --path would work for any region,
> > > non-partitioned would create a single file at that path and partitioned
> > > would create several? I am all for less options. ;)
> > >
> > > -Jake
> > >
> >
>

Re: Adding parallel import/export of snapshots to gfsh

Posted by Michael Stolz <ms...@pivotal.io>.
One other idea that hasn't been mentioned is making parallel the only way
for Partitioned Regions, and having --file serve the purpose of defining
both a path and a filename pattern where the bucket ID or whatever we're
using gets automatically inserted before the .gfd extension.

No need for a new option (--parallel).
No need for a new option (--path).

In fact, no need for a change to gfsh command at all.
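
As a rough illustration of the filename-pattern idea above, here is a minimal Python sketch. It only shows string handling; the function name `bucket_file_name` and the `-<bucketId>` separator are hypothetical, since the thread does not settle on what identifier gets inserted or how.

```python
from pathlib import PurePosixPath


def bucket_file_name(file_option: str, bucket_id: int) -> str:
    """Derive a per-bucket file name from a single --file value.

    Hypothetical naming scheme: the bucket ID is inserted just before
    the .gfd extension, e.g. orders.gfd -> orders-7.gfd.
    """
    path = PurePosixPath(file_option)
    # The existing commands require the file location to end in .gfd.
    if path.suffix != ".gfd":
        raise ValueError("file name must end in .gfd")
    return str(path.with_name(f"{path.stem}-{bucket_id}.gfd"))
```

With this scheme, a user would still pass one --file value, and each bucket's data would land in a uniquely named file next to it.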


--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: +1-631-835-4771

On Tue, Aug 22, 2017 at 2:15 PM, Nick Reich <nr...@pivotal.io> wrote:

> Parallel export will write the data to files on the bucket primary for each
> bucket, distributing the work (and therefore files) to all the members.
> That would be a big enough deviation from the current behavior (single file
> on single machine), that I think it makes it worth having the additional
> options (but I agree: less options is generally better).
>
> On Tue, Aug 22, 2017 at 1:59 PM, Jacob Barrett <jb...@pivotal.io>
> wrote:
>
> > On Tue, Aug 22, 2017 at 1:49 PM Nick Reich <nr...@pivotal.io> wrote:
> >
> > > The idea of deprecating —file in favor of path is interesting. I wonder
> > if
> > > instead of making them mutually exclusive to start, having —path be
> able
> > to
> > > support both modes from the start would be better? That way —file could
> > > still be used for the existing mode, but —path could be used instead
> (and
> > > override —file is both given?): that would provide a clear path forward
> > for
> > > how the command should be used, while fully supporting existing
> > workflows.
> > >
> >
> > This is what I meant by deprecating. Maybe even providing a message that
> if
> > --file is set that it is deprecated for --path.
> >
> >
> > > We need to continue to support both modes, as only Partitioned Regions
> > can
> > > make use of parallel export (it is parallelized on a bucket level).
> > >
> >
> > Ok, so why not just make parallel the only mode for partitioned. Then you
> > remove the need for --parallel and --path would work for any region,
> > non-partitioned would create a single file at that path and partitioned
> > would create several? I am all for less options. ;)
> >
> > -Jake
> >
>

Re: Adding parallel import/export of snapshots to gfsh

Posted by Nick Reich <nr...@pivotal.io>.
Parallel export will write the data to files on the bucket primary for each
bucket, distributing the work (and therefore files) to all the members.
That would be a big enough deviation from the current behavior (single file
on single machine) that I think it is worth having the additional
options (but I agree: fewer options is generally better).
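
To make the distribution of work concrete, here is a small Python sketch of how buckets might be grouped by the member hosting their primary copy. The member names and the `buckets_by_primary` helper are made up for illustration; this is not the Geode implementation, just a picture of why the output ends up spread over several machines.

```python
from collections import defaultdict
from typing import Dict, List


def buckets_by_primary(primary_of: Dict[int, str]) -> Dict[str, List[int]]:
    """Group bucket IDs by the member that hosts each bucket's primary.

    primary_of maps bucket ID -> member name (both hypothetical here).
    Each member would then write files only for the buckets listed under it.
    """
    work: Dict[str, List[int]] = defaultdict(list)
    for bucket_id, member in sorted(primary_of.items()):
        work[member].append(bucket_id)
    return dict(work)
```

For example, with primaries spread over three servers, every server appears in the result, so every server ends up with files on its own disk.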

On Tue, Aug 22, 2017 at 1:59 PM, Jacob Barrett <jb...@pivotal.io> wrote:

> On Tue, Aug 22, 2017 at 1:49 PM Nick Reich <nr...@pivotal.io> wrote:
>
> > The idea of deprecating —file in favor of path is interesting. I wonder
> if
> > instead of making them mutually exclusive to start, having —path be able
> to
> > support both modes from the start would be better? That way —file could
> > still be used for the existing mode, but —path could be used instead (and
> > override —file is both given?): that would provide a clear path forward
> for
> > how the command should be used, while fully supporting existing
> workflows.
> >
>
> This is what I meant by deprecating. Maybe even providing a message that if
> --file is set that it is deprecated for --path.
>
>
> > We need to continue to support both modes, as only Partitioned Regions
> can
> > make use of parallel export (it is parallelized on a bucket level).
> >
>
> Ok, so why not just make parallel the only mode for partitioned. Then you
> remove the need for --parallel and --path would work for any region,
> non-partitioned would create a single file at that path and partitioned
> would create several? I am all for less options. ;)
>
> -Jake
>

Re: Adding parallel import/export of snapshots to gfsh

Posted by Dan Smith <ds...@pivotal.io>.
I don't really like the idea of adding a separate command. It really is the
same command - you just want to have the parallel flag interact with other
options. A separate command would be more confusing for users, and more of
a maintenance issue as we add more options to export.

Having a --path that could be a file or directory depending on whether your
export is parallel or serial also seems unintuitive. Kirk's idea of
mutually exclusive options seems more reasonable. Or better yet, just add
--dir and make it work the same way for both serial and parallel exports:
we generate the files and put them in that directory.

-Dan

On Tue, Aug 22, 2017 at 1:59 PM, Jacob Barrett <jb...@pivotal.io> wrote:

> On Tue, Aug 22, 2017 at 1:49 PM Nick Reich <nr...@pivotal.io> wrote:
>
> > The idea of deprecating —file in favor of path is interesting. I wonder
> if
> > instead of making them mutually exclusive to start, having —path be able
> to
> > support both modes from the start would be better? That way —file could
> > still be used for the existing mode, but —path could be used instead (and
> > override —file is both given?): that would provide a clear path forward
> for
> > how the command should be used, while fully supporting existing
> workflows.
> >
>
> This is what I meant by deprecating. Maybe even providing a message that if
> --file is set that it is deprecated for --path.
>
>
> > We need to continue to support both modes, as only Partitioned Regions
> can
> > make use of parallel export (it is parallelized on a bucket level).
> >
>
> Ok, so why not just make parallel the only mode for partitioned. Then you
> remove the need for --parallel and --path would work for any region,
> non-partitioned would create a single file at that path and partitioned
> would create several? I am all for less options. ;)
>
> -Jake
>

Re: Adding parallel import/export of snapshots to gfsh

Posted by Jacob Barrett <jb...@pivotal.io>.
On Tue, Aug 22, 2017 at 1:49 PM Nick Reich <nr...@pivotal.io> wrote:

> The idea of deprecating --file in favor of --path is interesting. I wonder if
> instead of making them mutually exclusive to start, having --path be able to
> support both modes from the start would be better? That way --file could
> still be used for the existing mode, but --path could be used instead (and
> override --file if both are given?): that would provide a clear path forward
> for how the command should be used, while fully supporting existing workflows.
>

This is what I meant by deprecating. Maybe even provide a message, when
--file is set, that it is deprecated in favor of --path.


> We need to continue to support both modes, as only Partitioned Regions can
> make use of parallel export (it is parallelized on a bucket level).
>

Ok, so why not just make parallel the only mode for partitioned regions? Then
you remove the need for --parallel, and --path would work for any region:
non-partitioned would create a single file at that path and partitioned
would create several. I am all for fewer options. ;)

-Jake

Re: Adding parallel import/export of snapshots to gfsh

Posted by Nick Reich <nr...@pivotal.io>.
I thought about a mutually exclusive --file and --dir, but in that case, --file
is required for standard and --path required for parallel export, which I
think could be better than overloading --file, but still has potential for
confusion.

The idea of deprecating --file in favor of --path is interesting. I wonder if
instead of making them mutually exclusive to start, having --path be able to
support both modes from the start would be better? That way --file could
still be used for the existing mode, but --path could be used instead (and
override --file if both are given?): that would provide a clear path forward
for how the command should be used, while fully supporting existing workflows.


We need to continue to support both modes, as only Partitioned Regions can
make use of parallel export (it is parallelized on a bucket level).
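
A minimal Python sketch of the option handling floated above, under stated assumptions: --path overrides --file when both are given, --file stays valid only for the existing (serial) mode, and a deprecation warning is attached whenever --file is used. The `resolve_target` function and its error messages are hypothetical and do not describe actual gfsh behavior.

```python
from typing import List, NamedTuple, Optional


class ExportTarget(NamedTuple):
    location: str        # file or directory the export writes to
    is_directory: bool   # True when location is treated as a directory
    warnings: List[str]  # messages to show the user


def resolve_target(file: Optional[str], path: Optional[str],
                   parallel: bool) -> ExportTarget:
    """Resolve --file/--path into one target (hypothetical semantics)."""
    warnings: List[str] = []
    if file is not None:
        warnings.append("--file is deprecated; use --path instead")
    if path is not None:
        # Assumption: --path wins when both options are given, and it is
        # a directory for parallel exports, a single file otherwise.
        return ExportTarget(path, parallel, warnings)
    if file is None:
        raise ValueError("one of --file or --path is required")
    if parallel:
        raise ValueError("--parallel requires --path")
    if not file.endswith(".gfd"):
        raise ValueError("--file must end in .gfd")
    return ExportTarget(file, False, warnings)
```

Existing scripts that pass only --file keep working (with a warning), while new usage can move to --path for either mode.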

On Tue, Aug 22, 2017 at 12:55 PM, Jacob Barrett <jb...@pivotal.io> wrote:

> How about deprecating --file and replacing it with --path? In the transition,
> make them mutually exclusive, with --path required for --parallel.
>
> Any reason to not just make all export parallel rather than supporting two
> different modes?
>
> -Jake
>
>
> Sent from my iPhone
>
> > On Aug 22, 2017, at 12:27 PM, Kenneth Howe <kh...@pivotal.io> wrote:
> >
> > I agrees that overloading the “file” option seems like a bad idea. As an
> alternative to separate commands, what about mutually exclusive options,
> ‘—file’ and ‘—dir’?
> >
> > If you go for implementing the new functionality as a separate command,
> I would suggest calling the gfsh commands: “export data-parallel” and
> “import data-parallel"
> >
> >> On Aug 22, 2017, at 11:32 AM, Nick Reich <nr...@pivotal.io> wrote:
> >>
> >> Team,
> >>
> >> I am working on exposing the parallel export/import of snapshots through
> >> gfsh and would appreciate input on the best approach to adding to /
> >> updating the existing interface.
> >>
> >> Currently, ExportDataCommand and ImportDataCommand take a region name, a
> >> member to run the command on, and a file location (that must end in
> .gfd).
> >> Parallel import and export require a directory location instead of a
> single
> >> file name (as there can be multiple files and need for uniquely named
> >> files). It is possible to add a parallel flag and have the meaning of
> the
> >> "file" parameter be different depending on that flag, but that seems
> overly
> >> confusing to me. I am instead leaning towards creating new commands
> (e.g.
> >> ParallelExportDataCommand) that has a "directory" parameter to replace
> >> "file", but is otherwise identical in usage to the existing commands.
> >>
> >> Do others have different views or approaches to suggest?
> >
>

Re: Adding parallel import/export of snapshots to gfsh

Posted by Jacob Barrett <jb...@pivotal.io>.
How about deprecating --file and replacing it with --path? In the transition, make them mutually exclusive, with --path required for --parallel.

Any reason to not just make all export parallel rather than supporting two different modes?

-Jake

> On Aug 22, 2017, at 12:27 PM, Kenneth Howe <kh...@pivotal.io> wrote:
> 
> I agree that overloading the "file" option seems like a bad idea. As an alternative to separate commands, what about mutually exclusive options, '--file' and '--dir'?
> 
> If you go for implementing the new functionality as a separate command, I would suggest calling the gfsh commands "export data-parallel" and "import data-parallel".
> 
>> On Aug 22, 2017, at 11:32 AM, Nick Reich <nr...@pivotal.io> wrote:
>> 
>> Team,
>> 
>> I am working on exposing the parallel export/import of snapshots through
>> gfsh and would appreciate input on the best approach to adding to /
>> updating the existing interface.
>> 
>> Currently, ExportDataCommand and ImportDataCommand take a region name, a
>> member to run the command on, and a file location (that must end in .gfd).
>> Parallel import and export require a directory location instead of a single
>> file name (as there can be multiple files and need for uniquely named
>> files). It is possible to add a parallel flag and have the meaning of the
>> "file" parameter be different depending on that flag, but that seems overly
>> confusing to me. I am instead leaning towards creating new commands (e.g.
>> ParallelExportDataCommand) that has a "directory" parameter to replace
>> "file", but is otherwise identical in usage to the existing commands.
>> 
>> Do others have different views or approaches to suggest?
> 

Re: Adding parallel import/export of snapshots to gfsh

Posted by Kenneth Howe <kh...@pivotal.io>.
I agree that overloading the "file" option seems like a bad idea. As an alternative to separate commands, what about mutually exclusive options, '--file' and '--dir'?

If you go for implementing the new functionality as a separate command, I would suggest calling the gfsh commands "export data-parallel" and "import data-parallel".

> On Aug 22, 2017, at 11:32 AM, Nick Reich <nr...@pivotal.io> wrote:
> 
> Team,
> 
> I am working on exposing the parallel export/import of snapshots through
> gfsh and would appreciate input on the best approach to adding to /
> updating the existing interface.
> 
> Currently, ExportDataCommand and ImportDataCommand take a region name, a
> member to run the command on, and a file location (that must end in .gfd).
> Parallel import and export require a directory location instead of a single
> file name (as there can be multiple files and need for uniquely named
> files). It is possible to add a parallel flag and have the meaning of the
> "file" parameter be different depending on that flag, but that seems overly
> confusing to me. I am instead leaning towards creating new commands (e.g.
> ParallelExportDataCommand) that has a "directory" parameter to replace
> "file", but is otherwise identical in usage to the existing commands.
> 
> Do others have different views or approaches to suggest?