Posted to user@ignite.apache.org by pragmaticbigdata <am...@gmail.com> on 2017/01/24 14:00:37 UTC

IGFS Questions

I have some questions about deploying IGFS as a cache layer, given that
Ignite can be deployed both as a key-value store and as a file system:

1. How does IGFS behave when deployed in standalone mode? I wanted to
confirm that there is no durability in this mode. Assuming I persist a
parquet file on IGFS, if the cluster goes down I lose the file, right?
2. Can I specify whether a file stored in IGFS is partitioned (with backup
nodes) or replicated?
3. IGFS can act as a cache layer over HDFS and the local file system. Can it
also act as a caching layer over an S3 store?
4. As with the key-value store, can I configure tiered storage in IGFS? That
is, given that IGFS is configured with the local file system as the
secondary store and an Ignite cluster of 3 server nodes with 5GB of memory
each, would the data spill over to the local disk if I try to load a 25GB
file into IGFS? If so, what configuration is needed?
5. Can I configure local SSD disks as the secondary store for IGFS?
6. I browsed through the documentation but didn't find the capability of
pinning and unpinning files in IGFS. I am looking for something similar to
what Alluxio provides
<http://www.alluxio.org/docs/master/en/Tiered-Storage-on-Alluxio.html#pinning-files>.
Can it be implemented?
7. Could you elaborate a bit on how IGFS Proxy Mode works? What is its
recommended use case?
8. With DUAL_ASYNC (write-behind mode), does Ignite provide the failover
guarantees that are lacking in the key-value store
(https://issues.apache.org/jira/browse/IGNITE-1897)?

Thanks in advance.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/IGFS-Questions-tp10217.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: IGFS Questions

Posted by vkulichenko <va...@gmail.com>.
pragmaticbigdata wrote
> I will spend some time understanding what this means, but by "Hadoop
> compliant implementation" are you hinting that HDFS needs to be running
> even if I have S3 as the secondary file system?

It can be any file system that has a connector implementing
org.apache.hadoop.fs.FileSystem, so HDFS itself does not have to be running.
The HDFS client is just one of many such file system implementations.


pragmaticbigdata wrote
> I think my question was misunderstood. Can IGFS overflow to local disk
> whenever data does not fit in memory?

If you enable swap space, evicted entries will be saved there. This provides
the overflow you're looking for.
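For scale, question 4's cluster has 3 x 5GB = 15GB of memory in total, which is less than the 25GB file even with no backups (with one backup every block is stored twice, so only about 7.5GB of distinct data fits). Some blocks therefore have to be evicted, and with swap enabled they overflow to disk instead of being dropped. The Java below is a hypothetical, self-contained toy model of that behaviour, not Ignite code: it limits memory by block count, evicts the least recently used block first, never evicts blocks whose path matches an exclude pattern (the "pinning" that IgfsPerBlockLruEvictionPolicy.setExcludePaths() provides), and moves evicted blocks into a swap map when swap is enabled. The class and method names are invented for illustration, and matching exclude paths as regular expressions is an assumption; check the IgfsPerBlockLruEvictionPolicy Javadoc and the swap configuration for your Ignite version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical toy model (not Ignite code): per-block LRU eviction with
 * "pinned" exclude paths and optional overflow of evicted blocks to swap.
 */
class BlockLruSketch {
    private final int maxBlocks;                      // stand-in for a memory limit
    private final List<Pattern> excludePaths = new ArrayList<Pattern>();
    private final boolean swapEnabled;
    // accessOrder = true: iteration starts at the least recently used block.
    private final LinkedHashMap<String, byte[]> memory = new LinkedHashMap<String, byte[]>(16, 0.75f, true);
    private final Map<String, byte[]> swap = new HashMap<String, byte[]>();

    BlockLruSketch(int maxBlocks, Collection<String> excludeRegexes, boolean swapEnabled) {
        this.maxBlocks = maxBlocks;
        for (String r : excludeRegexes) excludePaths.add(Pattern.compile(r));
        this.swapEnabled = swapEnabled;
    }

    private static String key(String path, int block) { return path + "#" + block; }

    /** A block is pinned if its file path matches any exclude pattern. */
    private boolean pinned(String key) {
        String path = key.substring(0, key.lastIndexOf('#'));
        for (Pattern p : excludePaths) {
            if (p.matcher(path).matches()) return true;
        }
        return false;
    }

    void put(String path, int block, byte[] data) {
        memory.put(key(path, block), data);
        evictIfNeeded();
    }

    /** Memory first, then swap (promoting the block back); null means "not cached". */
    byte[] get(String path, int block) {
        String k = key(path, block);
        byte[] data = memory.get(k);
        if (data == null && swapEnabled) {
            data = swap.remove(k);
            if (data != null) {
                memory.put(k, data);
                evictIfNeeded();
            }
        }
        // In a DUAL mode, a real miss should be read from the secondary file system.
        return data;
    }

    boolean inMemory(String path, int block) { return memory.containsKey(key(path, block)); }

    boolean inSwap(String path, int block) { return swap.containsKey(key(path, block)); }

    private void evictIfNeeded() {
        Iterator<Map.Entry<String, byte[]>> it = memory.entrySet().iterator();
        while (memory.size() > maxBlocks && it.hasNext()) {
            Map.Entry<String, byte[]> e = it.next();
            if (pinned(e.getKey())) continue;                    // pinned blocks are never evicted
            if (swapEnabled) swap.put(e.getKey(), e.getValue()); // overflow instead of dropping
            it.remove();
        }
        // If only pinned blocks remain, this sketch simply stays over the limit.
    }

    public static void main(String[] args) {
        BlockLruSketch store = new BlockLruSketch(2, Arrays.asList("/pinned/.*"), true);
        store.put("/pinned/dim.parquet", 0, new byte[] {1});
        store.put("/data/a.parquet", 0, new byte[] {2});
        store.put("/data/a.parquet", 1, new byte[] {3}); // third block: evict the LRU unpinned one
        System.out.println(store.inMemory("/pinned/dim.parquet", 0)); // true (pinned)
        System.out.println(store.inSwap("/data/a.parquet", 0));       // true (overflowed)
        System.out.println(store.get("/data/a.parquet", 0)[0]);       // 2 (read back from swap)
    }
}
```

Running main prints true, true and 2: the pinned block stays in memory, the oldest unpinned block overflows to swap, and reading it back promotes it into memory again, which in turn pushes the other unpinned block out to swap.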


pragmaticbigdata wrote
> OK. I misunderstood the local disk capability that was added as part
> of https://issues.apache.org/jira/browse/IGNITE-1926. I understood that
> IGFS could be backed by local disk stores, where each IGFS node would
> save the data loaded in memory on that node to that server's own disk.
> Could you please elaborate on the shared disk implementation?

I think Vladimir meant a regular shared file system here, i.e. one that all
nodes see as the same storage. The secondary file system can't be split
across nodes.
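Why the secondary store has to be shared can be shown with plain java.nio.file. In the hypothetical toy below (not Ignite code, and not how LocalIgfsSecondaryFileSystem is implemented), each node resolves IGFS paths against its own root directory. When both nodes use the same directory, standing in for a shared mount, a file written through one node is visible through the other; when each node uses its own local disk, it is not, so a node that has to read a path through to the secondary file system may not find data written via another node. The Node class and visibleAcrossNodes() are invented for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical illustration (not Ignite code): each toy "node" resolves IGFS
 * paths against its own root directory in the secondary file system.
 */
class SharedRootSketch {
    static final class Node {
        private final Path root;

        Node(Path root) { this.root = root; }

        private Path resolve(String igfsPath) {
            return root.resolve(igfsPath.substring(1)); // drop the leading '/'
        }

        void write(String igfsPath, String content) throws IOException {
            Path target = resolve(igfsPath);
            Files.createDirectories(target.getParent());
            Files.write(target, content.getBytes(StandardCharsets.UTF_8));
        }

        boolean exists(String igfsPath) { return Files.exists(resolve(igfsPath)); }
    }

    /** True if a file written through node A can be found through node B. */
    static boolean visibleAcrossNodes(Path rootA, Path rootB) throws IOException {
        Node a = new Node(rootA);
        Node b = new Node(rootB);
        a.write("/data/part-0.parquet", "block bytes");
        return b.exists("/data/part-0.parquet");
    }

    public static void main(String[] args) throws IOException {
        // One directory seen by both nodes stands in for a shared mount.
        Path shared = Files.createTempDirectory("shared-mount");
        System.out.println(visibleAcrossNodes(shared, shared)); // true

        // Two directories stand in for each server's own local disk.
        Path diskA = Files.createTempDirectory("nodeA-disk");
        Path diskB = Files.createTempDirectory("nodeB-disk");
        System.out.println(visibleAcrossNodes(diskA, diskB));   // false
    }
}
```

main prints true for the shared directory and false for the two per-node directories.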

-Val




Re: IGFS Questions

Posted by pragmaticbigdata <am...@gmail.com>.
Thanks for the replies. I have a few follow-up questions.


> Yes, as long as you have a Hadoop-compliant implementation of the S3 file
> system (e.g. org.apache.hadoop.fs.s3.S3FileSystem).

I will spend some time understanding what this means, but by "Hadoop
compliant implementation" are you hinting that HDFS needs to be running even
if I have S3 as the secondary file system?


> You can configure evictions from the data cache. Please refer to the
> org.apache.ignite.cache.eviction.igfs.IgfsPerBlockLruEvictionPolicy class.

I think my question was misunderstood. Can IGFS overflow to local disk
whenever data does not fit in memory?


> The underlying file system must be shared between all nodes in the
> cluster. If it is, then you can use
> org.apache.ignite.igfs.secondary.local.LocalIgfsSecondaryFileSystem.

OK. I misunderstood the local disk capability that was added as part of
https://issues.apache.org/jira/browse/IGNITE-1926. I understood that IGFS
could be backed by local disk stores, where each IGFS node would save the
data loaded in memory on that node to that server's own disk. Could you
please elaborate on the shared disk implementation?

Thanks.




Re: IGFS Questions

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Hi,

1. Durability depends on the IGFS mode. In PRIMARY mode there is no
durability. In the other modes, IGFS propagates writes to the underlying
file system (e.g. to HDFS).
2. Files stored in IGFS are always partitioned. You can specify the number
of backups and/or REPLICATED mode in the data cache configuration.
3. Yes, as long as you have a Hadoop-compliant implementation of the S3
file system (e.g. org.apache.hadoop.fs.s3.S3FileSystem).
4. You can configure evictions from the data cache. Please refer to the
org.apache.ignite.cache.eviction.igfs.IgfsPerBlockLruEvictionPolicy class.
5. The underlying file system must be shared between all nodes in the
cluster. If it is, then you can use
org.apache.ignite.igfs.secondary.local.LocalIgfsSecondaryFileSystem.
6. You can define pinned files before node start, but you cannot change
them at runtime. Please refer to the
IgfsPerBlockLruEvictionPolicy.setExcludePaths() method.
7. PROXY mode doesn't cache any data; it simply delegates to the secondary
file system. It is useful when you do not want to cache a certain part of
the data at all, for example, data that is accessed only once.
8. No. Write-behind speeds up user operations at the cost of consistency
guarantees.
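Answers 1, 7 and 8 come down to where each write goes. The toy below is a hypothetical plain-Java model, not Ignite's implementation or API: the enum reuses the real IgfsMode constant names, but the maps, the write-behind queue, flush() and crash() are invented for illustration. PRIMARY keeps data only in memory, DUAL_SYNC writes through to the secondary file system, DUAL_ASYNC queues the secondary write (write-behind), and PROXY bypasses the in-memory layer entirely. Caching data read from the secondary store in the DUAL modes is a simplifying assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model (not Ignite code) of where writes go in each IGFS mode.
 * Only the mode names come from Ignite's IgfsMode; everything else is invented.
 */
class IgfsModeSketch {
    enum Mode { PRIMARY, DUAL_SYNC, DUAL_ASYNC, PROXY }

    final Mode mode;
    final Map<String, String> memory = new HashMap<String, String>();      // in-memory IGFS layer
    final Map<String, String> secondary;                                   // e.g. HDFS; survives a crash
    final Deque<String[]> writeBehindQueue = new ArrayDeque<String[]>();   // pending DUAL_ASYNC writes

    IgfsModeSketch(Mode mode, Map<String, String> secondary) {
        this.mode = mode;
        this.secondary = secondary;
    }

    void write(String path, String data) {
        switch (mode) {
            case PRIMARY:    // memory only: no secondary file system involved
                memory.put(path, data);
                break;
            case DUAL_SYNC:  // write-through: secondary is updated before write() returns
                memory.put(path, data);
                secondary.put(path, data);
                break;
            case DUAL_ASYNC: // write-behind: secondary is updated later by flush()
                memory.put(path, data);
                writeBehindQueue.add(new String[] {path, data});
                break;
            case PROXY:      // no caching: delegate straight to the secondary file system
                secondary.put(path, data);
                break;
        }
    }

    String read(String path) {
        if (mode == Mode.PROXY) return secondary.get(path);
        String data = memory.get(path);
        if (data == null && mode != Mode.PRIMARY) {
            data = secondary.get(path);
            if (data != null) memory.put(path, data); // assumption: DUAL modes cache on read
        }
        return data;
    }

    /** Applies queued write-behind updates, as a background flusher eventually would. */
    void flush() {
        String[] w;
        while ((w = writeBehindQueue.poll()) != null) secondary.put(w[0], w[1]);
    }

    /** Simulates losing the whole cluster: memory and unflushed writes are gone. */
    void crash() {
        memory.clear();
        writeBehindQueue.clear();
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            IgfsModeSketch fs = new IgfsModeSketch(m, new HashMap<String, String>());
            fs.write("/data/f.parquet", "v1");
            fs.crash(); // the cluster goes down before any write-behind flush
            System.out.println(m + " -> " + fs.read("/data/f.parquet"));
        }
        // PRIMARY -> null, DUAL_SYNC -> v1, DUAL_ASYNC -> null, PROXY -> v1
    }
}
```

Only DUAL_SYNC and PROXY keep the file after the simulated crash. DUAL_ASYNC loses it because the write was still queued, which is the consistency trade-off described in answer 8; calling flush() before crash() would have preserved it.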

