Posted to dev@ignite.apache.org by Dmitriy Govorukhin <dm...@gmail.com> on 2018/06/30 19:37:55 UTC

data extractor

Igniters,

I am working on IGNITE-7644
<https://issues.apache.org/jira/browse/IGNITE-7644> (export all key-value
data from a persisted partition).
It will be a command-line tool for extracting data from an Ignite partition
file without the need to start a node.
The main motivation is to have a lifebuoy in case a file gets damaged for
some reason.

I suggest a simple API and two commands for the first implementation:

-c
--CRC [srcPath] - check the CRC of all pages (or of pages of a given type)
in a partition

-e
--extract [srcPath] [outPath] - dump all surviving data from a partition to
another file in a raw key/value pair format
(requires a graceful node stop; this will no longer be necessary once
--restore is implemented)
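
As a rough illustration of what the --CRC command could do, here is a minimal Python sketch. It assumes fixed-size pages with a 4-byte CRC32 field that is zeroed while the checksum is computed. The page size, the field offset, the byte order and the use of zlib's CRC32 are all assumptions made for this example; the real Ignite page layout and checksum routine are not described in this thread.

```python
import struct
import zlib
from typing import List

PAGE_SIZE = 4096  # assumed page size, for illustration only
CRC_OFFSET = 4    # hypothetical offset of the 4-byte CRC field inside a page


def page_crc(page: bytes) -> int:
    """CRC32 of a page, computed with the CRC field itself zeroed out."""
    zeroed = page[:CRC_OFFSET] + b"\x00" * 4 + page[CRC_OFFSET + 4:]
    return zlib.crc32(zeroed) & 0xFFFFFFFF


def seal_page(page: bytes) -> bytes:
    """Return a copy of the page with a freshly computed CRC stored in it."""
    crc = struct.pack("<I", page_crc(page))
    return page[:CRC_OFFSET] + crc + page[CRC_OFFSET + 4:]


def check_crc(data: bytes) -> List[int]:
    """Return the indexes of pages whose stored CRC does not match.

    A trailing chunk shorter than PAGE_SIZE is ignored.
    """
    bad = []
    for idx in range(len(data) // PAGE_SIZE):
        page = data[idx * PAGE_SIZE:(idx + 1) * PAGE_SIZE]
        (stored,) = struct.unpack_from("<I", page, CRC_OFFSET)
        if stored != page_crc(page):
            bad.append(idx)
    return bad
```

Calling check_crc() on the bytes of a partition file would return the indexes of the damaged pages, which the tool could then report or skip during extraction.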

See the attachment for the output file format. The format does not contain
any index, but it is very simple and
flexible for future work with raw key/value data.
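
The attachment is not part of this archive, so the following is only a guess at what an index-free raw key/value format might look like. The sketch assumes a hypothetical layout in which each key and each value is stored as a 4-byte big-endian length followed by the payload bytes; it is not the actual proposed format.

```python
import io
import struct
from typing import BinaryIO, Iterator, Optional, Tuple


def write_pair(out: BinaryIO, key: bytes, value: bytes) -> None:
    """Append one record: length-prefixed key, then length-prefixed value."""
    for chunk in (key, value):
        out.write(struct.pack(">I", len(chunk)))
        out.write(chunk)


def _read_chunk(src: BinaryIO) -> Optional[bytes]:
    """Read one length-prefixed chunk, or return None at a clean end of file."""
    header = src.read(4)
    if not header:
        return None
    if len(header) < 4:
        raise ValueError("truncated length prefix")
    (size,) = struct.unpack(">I", header)
    chunk = src.read(size)
    if len(chunk) != size:
        raise ValueError("truncated record")
    return chunk


def read_pairs(src: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, value) pairs sequentially; no index is needed."""
    while True:
        key = _read_chunk(src)
        if key is None:
            return
        value = _read_chunk(src)
        if value is None:
            raise ValueError("key without a value")
        yield key, value
```

Because the records are self-delimiting, a reader can stream the file from start to end, which is the trade-off of having no index: simple sequential access, no random lookups.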

Future features:
-u
--upload - reload raw key/value pairs into a node

-s
--status - check the current status of the node's files: whether binary
recovery is needed or not (e.g. the node crashed in the middle of a checkpoint)

-r
--restore - restore binary consistency (finish the checkpoint; requires the
WAL file for recovery)
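
To make the shape of the command line concrete, here is a small Python argparse sketch of the two first-implementation commands. The program name "data-extractor" is made up for this example, and the future commands are left out because their parameters are not defined yet.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two commands of the first implementation."""
    # "data-extractor" is a hypothetical program name.
    parser = argparse.ArgumentParser(prog="data-extractor")
    # Exactly one command must be given per run.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument(
        "-c", "--CRC", metavar="srcPath",
        help="check the CRC of the pages in a partition")
    group.add_argument(
        "-e", "--extract", nargs=2, metavar=("srcPath", "outPath"),
        help="dump raw key/value pairs from a partition to another file")
    return parser
```

With this parser, "-e part-0.bin out.raw" yields args.extract == ["part-0.bin", "out.raw"], and "--CRC part-0.bin" yields args.CRC == "part-0.bin".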

Let's start a discussion, any comments are welcome.

Re: data extractor

Posted by Dmitriy Govorukhin <dm...@gmail.com>.
Alexey,

1. The utility will extract raw payload bytes. If you want to build binary
objects or Java class instances, you will need the binary/marshaller metadata.
If two grids have different metadata, you should move the metadata along
with the dumped data in order to construct binary objects on the other grid.
Do you have any ideas on how we can improve this approach?

2. I do not think I understood your idea. Could you explain in more
detail how you want to use the utility for checkpoint statistics?

3. In the first implementation, I prefer a simple *file path* approach: you
can pass as a parameter a path to a partition file, to a cache/group
directory, or to the root directory of caches/groups.

4. I have not had time to work out how we will upload data to another grid.
Any ideas are welcome.
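
For point 3, the file path approach could be sketched in Python roughly as follows. The function name and the "part-*.bin" file name pattern are assumptions for this example, not something fixed in this thread.

```python
from pathlib import Path
from typing import List

# Assumed naming pattern for partition files inside a cache/group directory.
PARTITION_GLOB = "part-*.bin"


def resolve_partition_files(src: str) -> List[Path]:
    """Expand a path parameter into the list of partition files to process.

    A file path is returned as is. A directory (a single cache/group
    directory or the root of all of them) is searched recursively.
    """
    path = Path(src)
    if path.is_file():
        return [path]
    if path.is_dir():
        return sorted(path.rglob(PARTITION_GLOB))
    raise FileNotFoundError(src)
```

The same parameter then works at all three levels: one partition file, one cache/group directory, or the root directory.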


On Mon, Jul 2, 2018 at 5:34 PM Alexey Goncharuk <al...@gmail.com>
wrote:

> Dmitriy,
>
> A few questions regarding the use cases for the utility:
> 1) Would I be able to read the extracted data from the dumped file without
> Ignite node binary/marshaller metadata? In other words, will I be able to
> move only the dumped file to another grid or will I need to move the
> metadata as well?
> 2) Are you planning to add a public API version of this utility as a part
> of Ignite? For example, if I am planning to run some statistics on
> checkpointed data, will I be able to get some sort of an iterator to
> process this data?
> 3) How will a user choose which caches (cache groups) to process? Will the
> user need to provide a cache name or a cache ID (or either of them)? Will
> the utility be able to extract a single cache's data from a cache group?
> 4) I think the upload part of the utility is missing some input parameters
> - for example, what cluster to connect to, what caches to upload to, etc.

Re: data extractor

Posted by Alexey Goncharuk <al...@gmail.com>.
Dmitriy,

A few questions regarding the use cases for the utility:
1) Would I be able to read the extracted data from the dumped file without
Ignite node binary/marshaller metadata? In other words, will I be able to
move only the dumped file to another grid or will I need to move the
metadata as well?
2) Are you planning to add a public API version of this utility as a part
of Ignite? For example, if I am planning to run some statistics on
checkpointed data, will I be able to get some sort of an iterator to
process this data?
3) How will a user choose which caches (cache groups) to process? Will the
user need to provide a cache name or a cache ID (or either of them)? Will
the utility be able to extract a single cache's data from a cache group?
4) I think the upload part of the utility is missing some input parameters
- for example, what cluster to connect to, what caches to upload to, etc.

Re: data extractor

Posted by Dmitriy Govorukhin <dm...@gmail.com>.
Nikolay,

I think we won't support extraction from an encrypted store in the first
implementation.
I guess we can support the encrypted store in the future. Or do you have a
reason why we should do it in the first one?


On Sun, Jul 1, 2018 at 11:48 AM Nikolay Izhikov <ni...@apache.org> wrote:

> Hello, Dmitriy.
>
> Should we support extraction of encrypted data?
>
> There will be 2 types of keys we should load to successfully extract data:
>
> * master key: keystore + password required.
> * cache keys: master key + access to metastore required.
>
> The TDE task is almost done; please take a look.
>
> ticket - https://issues.apache.org/jira/browse/IGNITE-8485
> prototype - https://github.com/apache/ignite/pull/4167
> spi -
> https://github.com/apache/ignite/pull/4167/files#diff-9a792ab0e6971f202d22d530af0ac933

Re: data extractor

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Dmitriy.

Should we support extraction of encrypted data?

There will be 2 types of keys we should load to successfully extract data:

* master key: keystore + password required.
* cache keys: master key + access to metastore required.

The TDE task is almost done; please take a look.

ticket - https://issues.apache.org/jira/browse/IGNITE-8485
prototype - https://github.com/apache/ignite/pull/4167
spi - https://github.com/apache/ignite/pull/4167/files#diff-9a792ab0e6971f202d22d530af0ac933
