Posted to user@ignite.apache.org by Juan Rodríguez Hortalá <ju...@gmail.com> on 2018/01/05 21:31:08 UTC

Force flush of IGFS to secondary file system in DUAL_ASYNC mode

Hi,

When using IGFS with a secondary file system, with write-behind configured
through the DUAL_ASYNC IgfsMode, is there any way to force a flush of the
data from the Ignite caches into the secondary file system? A possible
scenario is a temporary cluster with Ignite installed that uses IGFS in
DUAL_ASYNC mode to write to HDFS running on a permanent cluster, configured
as the secondary file system. To be able to shut down the temporary cluster,
we need to know that all the data has been flushed to HDFS; otherwise we
might lose data. From what I see in
http://apache-ignite-users.70518.x6.nabble.com/Flush-the-cache-into-the-persistence-store-manually-td5077.html
this wasn't available when that question was answered. The solution
proposed there seems to be to traverse the cache, writing each cached entry
to the underlying data store. For IGFS, though, I understand that is not so
straightforward, because the dataCache and metadataCache used by IGFS don't
store the HDFS files directly, but the pieces that result from splitting
them.

Is there any way to flush the data from IGFS into HDFS? If not, is there
any recommendation on how we could traverse the dataCache and
metadataCache used by IGFS to write the data into HDFS manually? If we do
that traversal, is there any way to prevent the async writes of IGFS and the
writes done by the traversal from interfering with each other or producing
duplicate writes?

Thanks a lot for your help!

Juan Rodriguez Hortala

Re: Force flush of IGFS to secondary file system in DUAL_ASYNC mode

Posted by Juan Rodríguez Hortalá <ju...@gmail.com>.
Hi Ilya,

Thanks a lot for the detailed answer. It's nice to know there is a clear
path to achieve that flush.

Greetings,

Juan

On Mon, Jan 8, 2018 at 4:33 AM, ilya.kasnacheev <il...@gmail.com>
wrote:

Re: Force flush of IGFS to secondary file system in DUAL_ASYNC mode

Posted by "ilya.kasnacheev" <il...@gmail.com>.
Hello!

After reviewing IGFS code, I think that you can do the following:

You should save the paths of all files that are being migrated, and then
call await(collectionWithAllFilePaths) on IgfsImpl. If there is a huge
number of files, I imagine you can do this in batches.

This performs the same synchronous wait that DUAL_SYNC would, just from a
different entry point. After await() returns, it is safe to close IgfsImpl
and shut down your cluster.
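The steps above can be sketched as follows. This is a minimal, hypothetical helper, not part of Ignite: PendingWriteTracker, recordWrite and flushThenShutdown are illustrative names. It records each path once its file has been written (recording only after the output stream is closed is an assumption about what await() waits for), hands the paths to a caller-supplied awaiter in fixed-size batches, and runs the shutdown step only after every batch has returned. The actual IGFS call is deliberately left to the awaiter, so the sketch compiles without Ignite on the classpath.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not an Ignite API): remembers every path written through
 * IGFS in DUAL_ASYNC mode so pending write-behind can be awaited before shutdown.
 */
class PendingWriteTracker {
    // Insertion-ordered and de-duplicated; synchronized because writers may be on several threads.
    private final Set<String> paths = Collections.synchronizedSet(new LinkedHashSet<>());

    /** Assumption: call this after the file's output stream has been closed. */
    public void recordWrite(String path) {
        paths.add(path);
    }

    /**
     * Splits the recorded paths into batches of at most batchSize and passes each
     * batch to awaiter, which is expected to block until those paths are flushed
     * (e.g. by calling IgfsImpl.await(...)). Runs shutdown only after all batches
     * have returned. Returns the number of batches awaited.
     */
    public int flushThenShutdown(int batchSize, Consumer<List<String>> awaiter, Runnable shutdown) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> snapshot;
        synchronized (paths) {
            snapshot = new ArrayList<>(paths);
        }
        int batches = 0;
        for (int from = 0; from < snapshot.size(); from += batchSize) {
            int to = Math.min(from + batchSize, snapshot.size());
            // Blocks until this batch of paths has reached the secondary file system.
            awaiter.accept(new ArrayList<>(snapshot.subList(from, to)));
            batches++;
        }
        // Only now is it safe to close IGFS and stop the node.
        shutdown.run();
        return batches;
    }
}
```

In a real deployment the awaiter would presumably convert each batch to IgfsPath objects and pass them to IgfsImpl.await(...); IgfsImpl is an internal class, so how you obtain that instance on the node is an assumption here. The shutdown step would then close IgfsImpl and stop the Ignite node.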

Note that I would like IgfsImpl.close(cancel: false) to have the same
behaviour, but it's NOT there yet. I have filed
https://issues.apache.org/jira/browse/IGNITE-7356 - do not hesitate to
comment.

Regards,

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/