Posted to dev@ignite.apache.org by "Ivan V." <iv...@gridgain.com> on 2015/11/20 17:15:56 UTC

Igfs PURGE events: do we need them?

Hi, dev,
I need opinions on the question discussed in
https://issues.apache.org/jira/browse/IGNITE-1679 (IGFS: Purge event is
inconsistent).
In short: in IGFS we have a "soft" delete that moves the deleted file or
folder to a special "TRASH" folder.
A special async worker walks through TRASH and removes the items permanently.
When an item is completely removed, an event of type
org.apache.ignite.events.EventType#EVT_IGFS_FILE_PURGED is fired.
But such events are currently fired only for files, and only if the file
was deleted directly, not as part of a folder sub-tree. This behavior is
clearly inconsistent, so we should either get rid of PURGE events
altogether or make them consistent.
In the latter case it would be good to have an answer to the question: what
are the real use cases in which we may need the purge events? (For now they
seem to be used in tests only.)
If we don't have such real use cases, are there any objections to getting rid
of the purge events altogether?
Thanks in advance.
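
To make the inconsistency concrete, here is a minimal Java sketch of the
behavior described in this message. It is a toy model, not Ignite code:
SoftDeleteModel, Node, delete, purge and the eventsForNestedFiles flag are
all names made up for illustration, and the real worker is asynchronous
while this one runs inline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of soft delete plus purge events; none of these names are Ignite APIs. */
class SoftDeleteModel {
    /** A file when children == null, otherwise a folder. */
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, List<Node> children) {
            this.name = name;
            this.children = children;
        }
    }

    private final List<Node> trash = new ArrayList<>();
    final List<String> purgeEvents = new ArrayList<>();

    /** Soft delete: the item is only moved to TRASH, nothing is freed yet. */
    void delete(Node item) {
        trash.add(item);
    }

    /** Stand-in for the worker: permanently removes everything found in TRASH. */
    void purge(boolean eventsForNestedFiles) {
        for (Node item : trash)
            purgeItem(item, true, eventsForNestedFiles);
        trash.clear();
    }

    private void purgeItem(Node item, boolean deletedDirectly, boolean eventsForNestedFiles) {
        if (item.children == null) {
            // Behavior described in the message: nested files produce no event.
            if (deletedDirectly || eventsForNestedFiles)
                purgeEvents.add("PURGED " + item.name);
            return;
        }
        for (Node child : item.children)
            purgeItem(child, false, eventsForNestedFiles);
    }

    /** Deletes one file directly and two files as part of a folder, then purges. */
    static List<String> demo(boolean eventsForNestedFiles) {
        SoftDeleteModel fs = new SoftDeleteModel();
        fs.delete(new Node("a.txt", null));
        fs.delete(new Node("dir", Arrays.asList(new Node("b.txt", null), new Node("c.txt", null))));
        fs.purge(eventsForNestedFiles);
        return fs.purgeEvents;
    }

    public static void main(String[] args) {
        System.out.println("inconsistent: " + demo(false));
        System.out.println("consistent:   " + demo(true));
    }
}
```

With the flag off, only a.txt produces an event, although three files were
freed; with the flag on, every purged file is reported.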

Re: Igfs PURGE events: do we need them?

Posted by "Ivan V." <iv...@gridgain.com>.
Hi, Dmitriy,
it is not difficult to support the events properly: we just need to store
the last path of each file, or make sure it is guaranteed to be computable
when the file is already in TRASH (this is needed because each PURGE event
must contain the path of the file that is purged).
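
A rough sketch of the "store the last path" option, under the assumption
that the path is recorded once on the top-level trashed item and the paths
of nested files are derived from it at purge time. TrashPathModel, Item,
TrashRecord and purgedPaths are hypothetical names, not existing IGFS
classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model: a trashed item remembers the path it had before deletion. */
class TrashPathModel {
    /** A file when children == null, otherwise a folder keyed by child name. */
    static final class Item {
        final Map<String, Item> children;

        Item(Map<String, Item> children) {
            this.children = children;
        }
    }

    /** What is kept in TRASH: the item plus its last known path. */
    static final class TrashRecord {
        final String lastPath;
        final Item item;

        TrashRecord(String lastPath, Item item) {
            this.lastPath = lastPath;
            this.item = item;
        }
    }

    /** Paths that PURGE events would carry for every file under the record. */
    static List<String> purgedPaths(TrashRecord rec) {
        List<String> out = new ArrayList<>();
        collect(rec.lastPath, rec.item, out);
        return out;
    }

    private static void collect(String path, Item item, List<String> out) {
        if (item.children == null) {
            out.add(path);
            return;
        }
        // Child paths are derived from the stored parent path and the child name.
        for (Map.Entry<String, Item> child : item.children.entrySet())
            collect(path + "/" + child.getKey(), child.getValue(), out);
    }

    static List<String> demo() {
        Map<String, Item> logs = new LinkedHashMap<>();
        logs.put("a.log", new Item(null));
        logs.put("b.log", new Item(null));
        return purgedPaths(new TrashRecord("/data/logs", new Item(logs)));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```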

On Fri, Nov 20, 2015 at 10:45 PM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> On Fri, Nov 20, 2015 at 11:08 AM, Ivan V. <iv...@gridgain.com>
> wrote:
>
> > Hi, Dmitriy,
> > to wait for memory freeing we have method
> > org.apache.ignite.internal.processors.igfs.IgfsEx#awaitDeletesAsync()
> > which returns a Future that can be awaited (with a timeout or without).
> > Also during recent fix https://issues.apache.org/jira/browse/IGNITE-1510
> > we introduced new method IgfsEx#clear(IgfsPath) that deletes the specified
> > path and waits for the garbage data cleanup.
> > These methods have more or less convenient usage pattern.
> > But it is much more difficult to use PURGE events in practice. E.g. how
> > to know how many events to expect, and how to track what events have
> > arrived, and what have not?
> >
>
> Ivan, I see your point. There are 2 ways to resolve it, we either deprecate
> the event, or we support it properly. How difficult, in your opinion, it
> would be to support this even properly.

Re: Igfs PURGE events: do we need them?

Posted by Dmitriy Setrakyan <ds...@apache.org>.
On Fri, Nov 20, 2015 at 11:08 AM, Ivan V. <iv...@gridgain.com> wrote:

> Hi, Dmitriy,
> to wait for memory freeing we have method
> org.apache.ignite.internal.processors.igfs.IgfsEx#awaitDeletesAsync()
> which returns a Future that can be awaited (with a timeout or without).
> Also during recent fix https://issues.apache.org/jira/browse/IGNITE-1510
> we introduced new method IgfsEx#clear(IgfsPath) that deletes the specified
> path and waits for the garbage data cleanup.
> These methods have more or less convenient usage pattern.
> But it is much more difficult to use PURGE events in practice. E.g. how to
> know how many events to expect, and how to track what events have arrived,
> and what have not?
>

Ivan, I see your point. There are 2 ways to resolve it: we either deprecate
the event, or we support it properly. How difficult, in your opinion, would
it be to support this event properly?



Re: Igfs PURGE events: do we need them?

Posted by "Ivan V." <iv...@gridgain.com>.
Hi, Dmitriy,
to wait for the memory to be freed we have the
method org.apache.ignite.internal.processors.igfs.IgfsEx#awaitDeletesAsync(),
which returns a Future that can be awaited (with or without a timeout).
Also, during the recent fix https://issues.apache.org/jira/browse/IGNITE-1510 we
introduced a new method IgfsEx#clear(IgfsPath) that deletes the specified
path and waits for the garbage data cleanup.
These methods have a reasonably convenient usage pattern.
But it is much more difficult to use PURGE events in practice. E.g. how do
you know how many events to expect, and how do you track which events have
arrived and which have not?
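
The difference in usage can be sketched with plain java.util.concurrent
classes. This does not call IgfsEx at all: purgeAsync is a hypothetical
stand-in for the purge worker, and the only thing assumed about the real
method is what is said above, namely that it returns a Future that can be
awaited with a timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy comparison; purgeAsync is a stand-in, not an Ignite API. */
class AwaitVersusEvents {
    /** Pretends to purge 'files' items, calling onPurged once per item. */
    static CompletableFuture<Void> purgeAsync(int files, Runnable onPurged) {
        return CompletableFuture.runAsync(() -> {
            for (int i = 0; i < files; i++)
                onPurged.run();
        });
    }

    /** Future style: one object to wait on, no need to know the item count. */
    static boolean awaitByFuture(int files, long timeoutMs) {
        try {
            purgeAsync(files, () -> { }).get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        }
        catch (TimeoutException e) {
            return false;
        }
        catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Event style: the caller has to guess how many events will arrive. */
    static boolean awaitByEvents(int expectedEvents, int files, long timeoutMs) {
        CountDownLatch latch = new CountDownLatch(expectedEvents);
        purgeAsync(files, latch::countDown);
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("future:              " + awaitByFuture(3, 5000));
        System.out.println("events, right guess: " + awaitByEvents(3, 3, 5000));
        System.out.println("events, wrong guess: " + awaitByEvents(5, 3, 200));
    }
}
```

In the last call the listener expects 5 events but only 3 files are purged,
so the wait can only end by timeout. That is the bookkeeping problem with
events: the caller must know the count in advance.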

On Fri, Nov 20, 2015 at 9:10 PM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Ivan,
>
> The importance of the PURGE event has to do with notification about freeing
> memory, otherwise occupied by a deleted file.
>
> How hard do you think would be making the PURGE behavior consistent between
> directory and file deletions?
>
> D

Re: Igfs PURGE events: do we need them?

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Ivan,

The importance of the PURGE event has to do with notification that the
memory otherwise occupied by a deleted file has been freed.

How hard do you think it would be to make the PURGE behavior consistent
between directory and file deletions?

D

On Fri, Nov 20, 2015 at 8:15 AM, Ivan V. <iv...@gridgain.com> wrote:

> Hi, dev,
> need opinions on the question discussed in
> https://issues.apache.org/jira/browse/IGNITE-1679  (IGFS: Purge event is
> inconsistent).
> In short: in Igfs we have "soft" delete that moves the deleted file or
> folder to special "TRASH" folder.
> Special async worker walks inside TRASH and removes the items permanently.
> When an item is completely removed, an event of type
> org.apache.ignite.events.EventType#EVT_IGFS_FILE_PURGED  is fired.
> But such events are now fired only for files, and only in case if such file
> was deleted itself, but not a part of a folder sub-tree. It's quite obvious
> that such behavior is not quite consistent, so we should either get rid of
> PURGE events at all, or make them consistent.
> In the latter case it would be good to have answer to the question: what
> are real  use cases when we may need the purge events ? (Now they seem to
> be used in tests only).
> If we don't have such real use cases, are there any objections to get rid
> of the purge events at all?
> Thanks in advance.
>

Re: Igfs PURGE events: do we need them?

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Cos,

I agree with Ivan that there should be no architectural problems with our
"trash" concept. Essentially, when a delete is performed in DUAL mode, we do
two things:
1) Propagate the removal to the secondary file system;
2) Logically move the affected path to "trash" in IGFS.

Once an item is in trash, no one else will be able to operate on it or its
children. Moreover, a "trashed" item is decoupled from the secondary file
system (in contrast to normal entities).
We had some consistency problems where parts of a "trashed" entry could be
resurrected, but they are now fixed.
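
The two steps can be modeled roughly as below. This is only a sketch under
simplifying assumptions (paths as strings, file systems as sets);
DualModeDeleteModel and its methods are hypothetical names and do not
correspond to IGFS internals.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a DUAL mode delete; not the actual IGFS implementation. */
class DualModeDeleteModel {
    final Set<String> secondary = new HashSet<>(); // paths in the secondary file system
    final Set<String> primary = new HashSet<>();   // paths visible in IGFS
    final Set<String> trash = new HashSet<>();     // IGFS-only, has no secondary counterpart

    void create(String path) {
        secondary.add(path);
        primary.add(path);
    }

    void delete(String path) {
        // 1) Propagate the removal to the secondary file system.
        secondary.remove(path);
        // 2) Logically move the path to trash: it leaves the visible namespace at once.
        if (primary.remove(path))
            trash.add(path);
    }

    /** Trashed items are invisible to normal operations. */
    boolean exists(String path) {
        return primary.contains(path);
    }

    /** Background cleanup touches only the trash, never the secondary file system. */
    void purge() {
        trash.clear();
    }

    public static void main(String[] args) {
        DualModeDeleteModel fs = new DualModeDeleteModel();
        fs.create("/a");
        fs.delete("/a");
        System.out.println("visible: " + fs.exists("/a") + ", in trash: " + fs.trash.contains("/a"));
        fs.purge();
        System.out.println("trash empty: " + fs.trash.isEmpty());
    }
}
```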

Vladimir.


On Thu, Nov 26, 2015 at 1:11 PM, Ivan V. <iv...@gridgain.com> wrote:

> Hi, Konstantin,
> currently we work on IGFS failovers and reliability, so IGFS should behave
> correctly in any expected use case.
> It's not quite clear to me what does "when an _optional_ secondary file
> system is plugged in" mean.
> Can you please explain this use case in more detail?

Re: Igfs PURGE events: do we need them?

Posted by "Ivan V." <iv...@gridgain.com>.
Hi, Konstantin,
we are currently working on IGFS failover and reliability, so IGFS should
behave correctly in any expected use case.
It's not quite clear to me what "when an _optional_ secondary file
system is plugged in" means.
Can you please explain this use case in more detail?


On Thu, Nov 26, 2015 at 5:36 AM, Konstantin Boudnik <co...@apache.org> wrote:

> On Wed, Nov 25, 2015 at 04:54PM, Ivan V. wrote:
> > Hi, Konstantin,
> > "TRASH" (the name comes from
> > org.apache.ignite.internal.processors.igfs.IgfsFileInfo#TRASH_ID Java
> > constant) notion is only applicable to primary (IGFS) file system. This
> > is a special "folder" that does not have file system path. When IGFS is
> > running over a secondary Fs, TRASH also exists in the primary IGFS, but
> > does not exist in the secondary Fs.
> > In secondary Fs deletion is performed just through the ordinary Fs API.
> > So, we *do not* employ any assumption regarding the TRASH existence and
> > behavior in the secondary Fs.
> >
> > As Dmitriy mentioned above, TRASH in primary Fs is needed for performance
> > reasons: with it we delete file with only 1 transaction in Meta cache: we
> > do not do any transactions in Data cache.
> > (Similar technique is frequently applied frequently in real Fs deletion,
> > like mv foo /tmp/ && rm -r /tmp/foo/ .)
>
> I understand. However, I am wary about the potentially funny and
> inconsistent cases when an _optional_ secondary file system is plugged in.
>
> Cos
>
> > Currently we have fix of https://issues.apache.org/jira/browse/IGNITE-1679
> > that makes PURGE events enabled for all files.
> > I still not quite realize how this functionality will be used by
> > customers, but now it is repaired: once merged, you will be able to use it.

Re: Igfs PURGE events: do we need them?

Posted by Konstantin Boudnik <co...@apache.org>.
On Wed, Nov 25, 2015 at 04:54PM, Ivan V. wrote:
> Hi, Konstantin,
> "TRASH" (the name comes from
> org.apache.ignite.internal.processors.igfs.IgfsFileInfo#TRASH_ID Java
> constant) notion is only applicable to primary (IGFS) file system. This is
> a special "folder" that does not have file system path. When IGFS is
> running over a secondary Fs, TRASH also exists in the primary IGFS, but
> does not exist in the secondary Fs.
> In secondary Fs deletion is performed just through the ordinary Fs API. So,
> we *do not* employ any assumption regarding the TRASH existence and
> behavior in the secondary Fs.
> 
> As Dmitriy mentioned above, TRASH in primary Fs is needed for performance
> reasons: with it we delete file with only 1 transaction in Meta cache: we
> do not do any transactions in Data cache.
> (Similar technique is frequently applied frequently in real Fs deletion,
> like mv foo /tmp/ && rm -r /tmp/foo/ .)

I understand. However, I am wary about the potentially funny and inconsistent
cases when an _optional_ secondary file system is plugged in.

Cos

> Currently we have fix of https://issues.apache.org/jira/browse/IGNITE-1679
> that makes PURGE events enabled for all files.
> I still not quite realize how this functionality will be used by customers,
> but now it is repaired: once merged, you will be able to use it.

Re: Igfs PURGE events: do we need them?

Posted by "Ivan V." <iv...@gridgain.com>.
Hi, Konstantin,
the "TRASH" notion (the name comes from the
org.apache.ignite.internal.processors.igfs.IgfsFileInfo#TRASH_ID Java
constant) is only applicable to the primary (IGFS) file system. It is
a special "folder" that does not have a file system path. When IGFS is
running over a secondary Fs, TRASH also exists in the primary IGFS, but
does not exist in the secondary Fs.
In the secondary Fs, deletion is performed just through the ordinary Fs API.
So we *do not* make any assumptions regarding the existence or
behavior of TRASH in the secondary Fs.

As Dmitriy mentioned above, TRASH in the primary Fs is needed for performance
reasons: with it we delete a file with only 1 transaction in the Meta cache; we
do not do any transactions in the Data cache.
(A similar technique is frequently applied in real Fs deletion,
like mv foo /tmp/ && rm -r /tmp/foo/ .)
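
For reference, the same move-then-remove pattern written against
java.nio.file. This is a generic local file system sketch, not IGFS code;
the class and method names and the temporary directory layout are made up
for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Generic "mv foo trash/ && rm -r trash/foo" sketch on a local file system. */
class MoveThenDelete {
    /** Fast, user-visible step: a single rename into the trash directory. */
    static Path softDelete(Path target, Path trashDir) throws IOException {
        return Files.move(target, trashDir.resolve(target.getFileName().toString()));
    }

    /** Slow step that can run later in the background: recursive removal. */
    static void purge(Path trashed) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(trashed)) {
            // Reverse order so that children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths)
            Files.delete(p);
    }

    /** Returns {gone from namespace, data still on disk, gone after purge}. */
    static boolean[] demo() {
        try {
            Path root = Files.createTempDirectory("softdelete");
            Path foo = Files.createDirectory(root.resolve("foo"));
            Files.write(foo.resolve("data.bin"), new byte[] {1, 2, 3});
            Path trash = Files.createDirectory(root.resolve("trash"));

            Path trashed = softDelete(foo, trash);
            boolean goneFromNamespace = !Files.exists(foo);
            boolean dataStillOnDisk = Files.exists(trashed.resolve("data.bin"));

            purge(trashed);
            boolean goneAfterPurge = !Files.exists(trashed);

            purge(root); // clean up the temporary directory itself
            return new boolean[] {goneFromNamespace, dataStillOnDisk, goneAfterPurge};
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("gone from namespace: " + r[0]);
        System.out.println("data still on disk:  " + r[1]);
        System.out.println("gone after purge:    " + r[2]);
    }
}
```

The caller only waits for the rename; the cost of removing the data is paid
later by whoever runs purge.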

We currently have a fix for https://issues.apache.org/jira/browse/IGNITE-1679
that enables PURGE events for all files.
I still don't quite see how this functionality will be used by customers,
but it is now repaired: once the fix is merged, you will be able to use it.

On Tue, Nov 24, 2015 at 2:52 AM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Cos,
>
> The main reason soft delete was added is performance. Without soft-delete,
> the delete operation would have to wait until a file is fully deleted from
> a folder, which may take time.
>
> As far as secondary FS handling it, IGFS does not require a secondary FS,
> so we should account for cases when IGFS is running stand-alone.
>
> Thoughts?
>
> D.

Re: Igfs PURGE events: do we need them?

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Cos,

The main reason soft delete was added is performance. Without soft delete,
the delete operation would have to wait until a file is fully deleted from
a folder, which may take time.

As for the secondary FS handling it: IGFS does not require a secondary FS,
so we should account for cases when IGFS is running stand-alone.

Thoughts?

D.

On Mon, Nov 23, 2015 at 11:00 PM, Konstantin Boudnik <co...@apache.org> wrote:

> Let me ask a different question: what's the point of having the concept of
> TRASH?
>
> Here's an example why I think the 'soft' delete would only complicate
> thing.
> Suppose IGFS is sitting on top of HDFS and both have 'Trash' enabled. Now,
> the file is getting soft-deleted from IGFS and is moved to TRASH folder.
> But
> in HDFS it is also a move to a place which doesn't have any special meaning
> for HDFS.
>
> Even worst, if IFGS TRASH is linked to HDFS .Trash. HDFS has it's own
> policy
> on how to clean that up, which is likely to be different from that on IGFS.
> Often enough, HDFS .Trash is simply disabled. This discrepancy is going to
> create a situation when a file should still be in TRASH, but the secondary
> FS
> has already purged it.
>
> And what if yet another secondary file system like S3 has yet another
> policy
> around their own trash, which they don't even have, I believe?
>
> Where I am going with this is pretty straight forward: let's drop the
> soft-delete support and let the secondary FS to deal with it. If there's no
> secondary FS configured - the content of deleted file will have to
> retrieved
> by other means.
>
> Thoughts?
>   Cos

Re: Igfs PURGE events: do we need them?

Posted by Konstantin Boudnik <co...@apache.org>.
Let me ask a different question: what's the point of having the concept of
TRASH?

Here's an example of why I think the 'soft' delete would only complicate things.
Suppose IGFS is sitting on top of HDFS and both have 'Trash' enabled. Now,
a file is soft-deleted from IGFS and is moved to the TRASH folder. But
in HDFS it is also a move, to a place which doesn't have any special meaning
for HDFS.

Even worse is the case where IGFS TRASH is linked to HDFS .Trash. HDFS has
its own policy on how to clean that up, which is likely to be different from
that of IGFS.
Often enough, HDFS .Trash is simply disabled. This discrepancy is going to
create a situation where a file should still be in TRASH, but the secondary FS
has already purged it.

And what if yet another secondary file system, like S3, has yet another policy
around its own trash, which I believe it doesn't even have?

Where I am going with this is pretty straightforward: let's drop the
soft-delete support and let the secondary FS deal with it. If there's no
secondary FS configured, the content of a deleted file will have to be
retrieved by other means.

Thoughts?
  Cos

On Fri, Nov 20, 2015 at 07:15PM, Ivan V. wrote:
> Hi, dev,
> need opinions on the question discussed in
> https://issues.apache.org/jira/browse/IGNITE-1679  (IGFS: Purge event is
> inconsistent).
> In short: in Igfs we have "soft" delete that moves the deleted file or
> folder to special "TRASH" folder.
> Special async worker walks inside TRASH and removes the items permanently.
> When an item is completely removed, an event of type
> org.apache.ignite.events.EventType#EVT_IGFS_FILE_PURGED  is fired.
> But such events are now fired only for files, and only in case if such file
> was deleted itself, but not a part of a folder sub-tree. It's quite obvious
> that such behavior is not quite consistent, so we should either get rid of
> PURGE events at all, or make them consistent.
> In the latter case it would be good to have answer to the question: what
> are real  use cases when we may need the purge events ? (Now they seem to
> be used in tests only).
> If we don't have such real use cases, are there any objections to get rid
> of the purge events at all?
> Thanks in advance.