Posted to dev@ignite.apache.org by Vladimir Ozerov <vo...@gridgain.com> on 2015/03/02 14:12:40 UTC

Ignite File System (re)design.

Hi all,

We have spent some time discussing the file system and Hadoop APIs. There
were two possible ways to improve the current, non-obvious API.
The first idea was to leave the API more or less the same, with only some
cosmetic changes, mainly class names.
The second idea was to remove all secondary file system configuration
parameters from IgfsConfiguration and move them to the Hadoop module. IGFS
would then be wired up with the Hadoop secondary file system through a
private interface not exposed to users.

I think the first solution is better because the secondary file system in
IGFS is currently a kind of extension point: a user is free to implement his
own secondary storage and use it in much the same way a cache store is used
with a cache. I do not see any sensible reason to remove this extension
point and hide it in the Hadoop module. Therefore, I designed the new API
using the first approach and put a draft into the branch ignite-386.
Please feel free to review and comment on it.

I'll also briefly go through the new design here:

Core module:
1) o.a.i.IgniteFileSystem - user-facing interface for working with our native
file system. Obtained via the Ignite.fileSystem() method (see the usage sketch
after this list).
Based on the "IgniteFs" and "Igfs" interfaces in the current implementation.

2) o.a.i.filesystem.SecondaryFileSystem - API for implementing secondary file
systems for IGFS.
Based on the "Igfs" interface in the current implementation.

Note that there is no longer a direct link between IgniteFileSystem and
SecondaryFileSystem, as these are completely different entities.

3) o.a.i.configuration.FileSystemConfiguration - configuration bean for
IgniteFileSystem. It has the setter
"setSecondaryFileSystem(SecondaryFileSystem)".

Hadoop module:
1) There are 4 map-reduce classes under the o.a.i.hadoop.mapreduce package.
Their packages mirror the corresponding packages in the Hadoop API, e.g.
org.apache.ignite.[hadoop.mapreduce.protocol.IgniteHadoopClientProtocol]
implements org.apache.[hadoop.mapreduce.protocol.ClientProtocol] (see the
mapred-site.xml sketch after this list).

2) Two file system implementations, both named "IgniteHadoopFileSystem", for
Hadoop v1 and v2.

3) IgniteHadoopSecondaryFileSystem - implementation of SecondaryFileSystem
from the core module, capable of delegating native IGFS calls to an
underlying Hadoop FileSystem.
It is named "IgfsHadoopFileSystemWrapper" in the current implementation.
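
One thing not shown in this thread is how clients point map-reduce at the new
protocol classes. Just as a sketch, client-side mapred-site.xml could look
roughly like the following; the "ignite" framework name and the jobtracker
address/port are assumptions on my part, not part of this proposal.

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>ignite</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>localhost:11211</value>
    </property>
</configuration>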

Let me give an example of how a user would configure it now.

1) Ignite configuration:
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="fileSystemConfiguration">
        <list>
            <bean
class="org.apache.ignite.configuration.FileSystemConfiguration">
                <!-- Delegate to real HDFS. -->
                <property name="secondaryFileSystem">
                    <bean
class="örg.apache.ignite.hadoop.fs.IgniteHadoopSecondaryFileSystem">
                        <constructor-arg value="hdfs://192.168.1.23"/>
                    </bean>
                </property>
            </bean>
        </list>
    </property>
</bean>
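
The same wiring can of course be done programmatically. A minimal Java sketch,
assuming a setName() setter on FileSystemConfiguration and a
setFileSystemConfiguration() setter on IgniteConfiguration that mirror the
Spring properties above:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.FileSystemConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.hadoop.fs.IgniteHadoopSecondaryFileSystem;

public class IgfsStartupSketch {
    public static void main(String[] args) {
        FileSystemConfiguration fsCfg = new FileSystemConfiguration();
        fsCfg.setName("igfs"); // Assumed naming setter.

        // Delegate to real HDFS, same as the <constructor-arg> above.
        fsCfg.setSecondaryFileSystem(
            new IgniteHadoopSecondaryFileSystem("hdfs://192.168.1.23"));

        IgniteConfiguration cfg = new IgniteConfiguration();
        // Assumed setter matching the "fileSystemConfiguration" property above.
        cfg.setFileSystemConfiguration(fsCfg);

        Ignite ignite = Ignition.start(cfg);
    }
}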

2) core-site.xml:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>igfs:///</value>
    </property>
    <property>
        <name>fs.igfs.impl</name>
        <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
    </property>
    <property>
        <name>fs.AbstractFileSystem.igfs.impl</name>
        <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
    </property>
</configuration>
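
With that in place, plain Hadoop client code should work against IGFS without
changes. A small sketch using only the standard Hadoop FileSystem API (the
path and contents are arbitrary):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IgfsClientSketch {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml from the classpath, so igfs:/// is the default FS.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        try (FSDataOutputStream out = fs.create(new Path("/tmp/hello.txt"))) {
            out.writeUTF("hello from igfs");
        }

        System.out.println("Default FS: " + fs.getUri());
    }
}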

Seems pretty clear and consistent to me.

Thoughts?

Vladimir.

Re: Ignite File System (re)design.

Posted by Konstantin Boudnik <co...@apache.org>.
On Mon, Mar 02, 2015 at 08:39PM, Dmitriy Setrakyan wrote:
> I agree. I would actually drop Hadoop 1 altogether, especially given that
> it does not have the Ignite MapReduce acceleration component.

Even better ;)

> 
> D.
> 
> On Mon, Mar 2, 2015 at 8:07 PM, Konstantin Boudnik <co...@apache.org> wrote:
> 
> > Seems pretty good. Also, considering that Hadoop 1.x is pretty much
> > dead meat, specifying the implementation class for Hadoop v1 should be optional.
> >
> > Cos
> >

Re: Ignite File System (re)design.

Posted by Dmitriy Setrakyan <ds...@apache.org>.
I agree. I would actually drop Hadoop 1 altogether, especially given that
it does not have the Ignite MapReduce acceleration component.

D.

On Mon, Mar 2, 2015 at 8:07 PM, Konstantin Boudnik <co...@apache.org> wrote:

> Seems pretty good. Also, considering that Hadoop 1.x is pretty much
> dead meat, specifying the implementation class for Hadoop v1 should be optional.
>
> Cos
>

Re: Ignite File System (re)design.

Posted by Konstantin Boudnik <co...@apache.org>.
Seems pretty good. Also, considering that Hadoop 1.x is pretty much
dead meat, specifying the implementation class for Hadoop v1 should be optional.

Cos


Re: Ignite File System (re)design.

Posted by Dmitriy Setrakyan <ds...@apache.org>.
I like it.
