Posted to dev@ignite.apache.org by Vladimir Ozerov <vo...@gridgain.com> on 2015/04/16 13:07:31 UTC

Integration with external platforms.

Hi,

I'd like to propose an idea of creating new Ignite component for
integration with other platforms such as .Net, Ruby, NodeJS, etc.

Earlier in GridGain we had thin TCP clients for Java and .Net. They had
limited features and not-so-good performance (e.g. due to the inability to
reliably map tasks to affinity nodes, etc.). For now the Java client is
open-source and is used only for internal purposes, while the .Net client was
fully reworked to use a JVM started in the same process instead of TCP and is
currently a GridGain enterprise feature.

But as we see growing interest in the product, it makes sense to expose some
native interfaces for easy integration with our product from any platform.

Let's discuss what the platform integration architecture should look like.

*1. JVM placement.*
One of the most important points is how the native platform will communicate
with the JVM hosting the started node. There are a number of approaches to consider:
- Start the JVM in the same process. This allows for fast communication
between the JVM and the native platform. The drawback of this approach is that
we can start only one JVM per process. As a result, this solution might not
work in some environments (especially development ones), e.g. app servers
where multiple native applications run in the same process and each
application wants to start a node with different JVM properties, or
multi-process environments where there is a coordinator process which spawns
child processes with a limited lifecycle on demand (Apache, IIS, NodeJS, etc.).
- Connect to the JVM using some IPC mechanism (shared memory, pipes). This
approach might be a bit slower than the first one due to IPC overhead, but is
still pretty fast. To implement it we will probably have to create some
intermediate management application which starts nodes in different
processes and provides handles for native applications to connect to them.
This approach is more flexible than the first one.
- Connect to the JVM using TCP. This will be the slowest option, but offers even
greater flexibility, as we will be able to transparently connect to nodes
even on other hosts. However, it raises some failover questions.

In summary, I think we should choose the "JVM in the same process" approach, as
we already have experience with it and it has proven to be functional and
performant, but we should create a careful abstraction (facade) for the node
communication logic, so that shmem/pipes/TCP approaches can be implemented
easily if needed without disturbing other components.

*2. Data transfer and serialization.*
Another important point is how to pass data between Java and non-Java
platforms. Obviously we will have to provide a common format for both
interacting platforms, so that data serialized on one side can be
deserialized on the other if needed.
For the JVM-in-the-same-process approach it makes sense to organize data transfer
over offheap memory. Earlier we experimented with more sophisticated
mechanisms like "pin a Java heap array in the native platform -> write directly
to that array -> unpin", but this approach has some serious problems (e.g.
JVM intrinsic methods hang while the array is pinned) while not providing a
significant performance benefit.
So I think data transfer over offheap will be enough, as it is a simple and
reliable solution with acceptable performance.
Also we must remember that platforms may have different
mechanisms for data transfer. E.g., sometimes we have to marshal an object to
bytes before passing it to Java, and sometimes we may just pass a pointer (e.g.
structs in C or .Net with a known layout). We should be able to
support all these cases.

In summary, I propose to use offheap as the default implementation, while
still leaving room to change this if needed. E.g., instead of passing an
offheap pointer + data length:

void invokeOtherPlatform(long dataPointer, int dataLen);

we should design it as:

void invokeOtherPlatform(long pointer);

where the pointer encodes all information required for the other platform to
read the data. E.g., it can be a pointer to a memory region where the first 4
bytes are the data length and the rest is the serialized object.
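The length-prefixed layout described above can be sketched as follows. This is an illustrative sketch, not Ignite code: a direct `ByteBuffer` stands in for a raw off-heap address, and the method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of the proposed single-pointer encoding: the first 4 bytes of the
 * off-heap region hold the data length, the rest holds the serialized object.
 */
public class OffheapEncodingDemo {
    static ByteBuffer write(byte[] data) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4 + data.length);
        buf.putInt(data.length); // 4-byte length prefix
        buf.put(data);           // serialized payload
        buf.flip();
        return buf;
    }

    static byte[] read(ByteBuffer buf) {
        int len = buf.getInt(); // read the 4-byte length first
        byte[] out = new byte[len];
        buf.get(out);           // then the payload
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer region = write("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(read(region), StandardCharsets.UTF_8));
    }
}
```

With this layout a single "pointer" is enough for the receiving side to recover the full payload.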

*3. Queries support*
Queries are one of the most demanded features of the product. But at the
moment they only work with Java objects, because Java
serialization is used to extract fields from them.
We will have to provide users a way to alter this somehow, so that objects from
native platforms are supported as well.
A good candidate for this is the IgniteCacheObjectProcessor interface, which is
responsible for object serialization.
We will have to investigate what should be done to let its implementation
(either the default or some custom one) work with objects from other platforms.

*4. Extensibility*
We will have a set of C/C++ interfaces exposing basic features (e.g. cache,
compute, queries, etc.).
But as we do not know in advance what implementors will want to do apart
from the regular Java methods, it makes sense to leave some extensibility
points. At first glance they may look as follows:

interface Cache {
    void get(void* inData, void* outData); // Regular cache operation.
    bool put(void* inData);                // Another regular cache operation.
    ...
    void invoke(int operationType, void* inData, void* outData); // Extensibility point.
}

In this example we define an "invoke" method where the user may pass virtually
anything. So, when some new functionality is required, the user implements it
in Java, injects it into Ignite somehow (e.g. through config), and
implements the counterpart in the native platform. But he WILL NOT have to
change any Ignite C interfaces or rebuild them.
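The Java side of such an extension point could be a simple dispatch by operation type. This is an illustrative sketch under assumptions: the class and method names are invented for the example, and registration is shown programmatically where the proposal suggests configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Sketch of the Java side of the proposed "invoke" extension point:
 * user-defined handlers are registered per operation type, so new
 * operations require no changes to the exposed C interfaces.
 */
public class InvokeDispatchDemo {
    private final Map<Integer, UnaryOperator<byte[]>> handlers = new HashMap<>();

    void register(int opType, UnaryOperator<byte[]> handler) {
        handlers.put(opType, handler);
    }

    byte[] invoke(int opType, byte[] inData) {
        UnaryOperator<byte[]> h = handlers.get(opType);
        if (h == null)
            throw new IllegalArgumentException("Unknown operation: " + opType);
        return h.apply(inData);
    }

    public static void main(String[] args) {
        InvokeDispatchDemo cache = new InvokeDispatchDemo();
        // A user plugs in a custom operation (type 100) without touching
        // or rebuilding any Ignite C interface.
        cache.register(100, in -> new String(in).toUpperCase().getBytes());
        System.out.println(new String(cache.invoke(100, "ping".getBytes())));
    }
}
```

The native side would only need the generic `invoke(operationType, inData, outData)` entry point shown above.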

*5. Configuration.*
Last, but not least: how to configure Ignite on other platforms. Currently
the only way to do that is Spring XML. This approach works well for Java
developers, but is not so good for others, because a developer who is not
familiar with Java/Spring will have to learn quite a few things about them.
E.g. try configuring a HashMap in Spring with an int key/value :-) Non-Java
developers will have a hard time doing this.
So we will probably have to let users use the native mechanisms of their
platforms for configuration. This is not really critical from a feature
perspective, but will significantly improve the user experience.

Please share your thoughts and ideas about this.

Vladimir.

Re: Integration with external platforms.

Posted by Branko Čibej <br...@apache.org>.
On 03.05.2015 21:32, Konstantin Boudnik wrote:
> P.S. And hopefully no one will seriously consider CORBA, pretty please ;)

Muhahahaha, why use Corba if you can go two levels of magnitude worse
with J2EE :)

-- Brane

Re: Integration with external platforms.

Posted by Konstantin Boudnik <co...@apache.org>.
On Sun, May 03, 2015 at 04:45AM, Branko Čibej wrote:
> Have you considered using the Apache Thrift IDL to define the
> interfaces? That avoids inventing yet another structural definition
> language, it's well-established and far more readable than XML, and
> binding generators for many scripting and compiled languages have
> already been written. IIRC you can use Thrift interfaces without the
> protocol baggage.

If an IDL is to be considered, then Thrift might be a decent option (no matter how
unpleasant my own experience with it was about 4 years back; that could have been
Hive's fault, actually :) Another one, Protobuf, has some serious performance
problems coming mainly from its stubborn String-copy approach (check for
yourself: http://bit.ly/1GSaSVk). Notorious protoc incompatibilities (the 2.4.1 vs
2.5 transition was a nightmare in the Hadoop project) are something to stay away
from as well.

Another sometimes-mentioned alternative is Avro, the RPC framework, which has
certain advantages like dynamic schemas, etc. However, I don't see Avro gaining
any real momentum outside of the Hadoop ecosystem (which IMO says a lot).
Besides, Avro supports way fewer language bindings; their last release was
about a year ago now, and the mailing lists aren't that active. Just from these
standpoints I'd be careful even considering Avro.

In general, I agree with Brane - if there's something decent that can be used
in a clean, orthogonal way to provide for easier integration with 3rd party
software - let's rather reuse it, instead of inventing (and spending time on)
our own.

Cos

P.S. And hopefully no one will seriously consider CORBA, pretty please ;)


Re: Integration with external platforms.

Posted by Branko Čibej <br...@apache.org>.
On 29.04.2015 12:07, Vladimir Ozerov wrote:
> My opinion is that a product created for particular platform (say, Python),
> should not smell Java. Spring XML is a nice standard in Java community. But
> I do not think that regular Node.JS/Python/Ruby/.Net/CPP developer knows
> what Spring is.

Oh, we know what Spring is ... it's just extremely Java-centric. Or
shall we say, JVM-centric.

> Furthermore, for now it is extremely hard to define native components in
> Spring configuration. E.g., here is a short XML snippet on what native .Net
> cache store configuration with a single int property looks like in Spring
> (taken from GridGain):
> <property name="cacheStoreFactory">
>     <bean
> class="org.gridgain.grid.interop.dotnet.InteropDotNetCacheStoreFactory">
>         <property name="assemblyName" value="GridGainTest"/>
>         <property name="className"
> value="GridGain.Cache.Store.GridCacheTestStore"/>
>         <property name="properties">
>             <map>
>                 <entry key="myProperty">
>                     <value type="java.lang.Integer">42</value>
>                 </entry>
>             </map>
>         </property>
>     </bean>
> </property>
>
> Here we force the user to know Spring syntax and that Spring will treat any
> map entry key/value as a String unless it is explicitly stated that another
> type is needed. Looks ugly and difficult.

Exactly. Outside the Java world, platform/language agnostic interfaces
are defined in some flavour of standard-ish IDL. Or in this case, DDL.

> Instead, the user wants to have something like this, defined using some
> industry-approved format for his platform:
> <storeFactory>
>     <GridGainTest#GridGain.Cache.Store.GridCacheTestStore myProperty=42 />
> </storeFactory>
>
> I cannot say anything about DSL as I never worked with any, but I am 100%
> sure that Spring XML is not an option for most other platforms.

Oh, theoretically, anyone can write code that consumes and generates
Spring XML. I'm just not sure that it makes sense to carry along all the
complexity for something that's, essentially, a simple structure definition.

Have you considered using the Apache Thrift IDL to define the
interfaces? That avoids inventing yet another structural definition
language, it's well-established and far more readable than XML, and
binding generators for many scripting and compiled languages have
already been written. IIRC you can use Thrift interfaces without the
protocol baggage.
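To make the suggestion concrete, a store-factory definition from the snippet above might look roughly like this in Thrift IDL. This is an illustrative sketch only; the struct and field names are assumptions, not an existing Ignite or GridGain schema.

```thrift
/** Hypothetical Thrift sketch of the cache store configuration that the
 *  Spring XML snippet above expresses. */
struct CacheStoreFactoryConfig {
  1: required string assemblyName;          // e.g. "GridGainTest"
  2: required string className;             // e.g. "GridGain.Cache.Store.GridCacheTestStore"
  3: optional map<string, i32> properties;  // e.g. {"myProperty": 42}
}
```

Bindings generated from such a definition would give each platform a native, typed configuration object instead of hand-written Spring XML.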

-- Brane


Re: Integration with external platforms.

Posted by Vladimir Ozerov <vo...@gridgain.com>.
My opinion is that a product created for a particular platform (say, Python)
should not smell of Java. Spring XML is a nice standard in the Java community. But
I do not think that a regular Node.JS/Python/Ruby/.Net/CPP developer knows
what Spring is.

Furthermore, for now it is extremely hard to define native components in
Spring configuration. E.g., here is a short XML snippet showing what a native .Net
cache store configuration with a single int property looks like in Spring
(taken from GridGain):
<property name="cacheStoreFactory">
    <bean
class="org.gridgain.grid.interop.dotnet.InteropDotNetCacheStoreFactory">
        <property name="assemblyName" value="GridGainTest"/>
        <property name="className"
value="GridGain.Cache.Store.GridCacheTestStore"/>
        <property name="properties">
            <map>
                <entry key="myProperty">
                    <value type="java.lang.Integer">42</value>
                </entry>
            </map>
        </property>
    </bean>
</property>

Here we force the user to know Spring syntax and that Spring will treat any
map entry key/value as a String unless it is explicitly stated that another
type is needed. Looks ugly and difficult.

Instead, the user wants to have something like this, defined using some
industry-approved format for his platform:
<storeFactory>
    <GridGainTest#GridGain.Cache.Store.GridCacheTestStore myProperty=42 />
</storeFactory>

I cannot say anything about DSLs as I never worked with any, but I am 100%
sure that Spring XML is not an option for most other platforms.

On the other hand, while this is an important issue, it doesn't block
development of other interoperability stuff, and we can live with only
Spring XML for some time (until we get flooded with questions from users).

Vladimir.

On Wed, Apr 29, 2015 at 2:39 AM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Are we seriously saying that Spring is hard to use? Yes, I agree, it can be
> too verbose in some cases, but it is pretty much an industry standard for
> XML configuration right now. Introducing a custom configuration DSL will
> only add to the learning curve of Ignite, not making it simpler, IMHO.
>
> This has been an ongoing topic between me and Cos. Do other community
> members have any opinion on the matter?

Re: Integration with external platforms.

Posted by Dmitriy Setrakyan <ds...@apache.org>.
On Tue, Apr 28, 2015 at 6:02 PM, Konstantin Boudnik <co...@apache.org> wrote:

> Took me a while to (re)read and think about this. It seems to be getting
> more
> and more important as we see a growing interest from other ASF projects to
> get
> better integration with Ignite.
>
> I think all these are very valid points. I'd say the integration with
> non-JVM
> apps aren't that high-priority, but I might be mistaken in my judgement.
>

I think we should prioritize them higher. Currently, Apache Ignite is very
feature rich, but lacks easy integration with other non-JVM platforms, like
Python, or Ruby. We do support Memcached protocol, which can be natively
used from other platforms, but it does not expose the full Ignite
functionality, especially for compute features.

However, compute can be easily supported for any platform by starting an
external process from a Java job. I will fire up another discussion on it
and create some tickets based on the outcome.
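The external-process idea can be sketched as below. This is an illustrative sketch only: the class name is invented, `echo` is a placeholder command, and a real compute job would launch a Python/Ruby worker binary and exchange real job data with it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * Sketch of a Java compute job that delegates work to an external,
 * non-JVM process and captures its output.
 */
public class ExternalProcessJob {
    static String run(String... command) throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(command)
            .redirectErrorStream(true) // merge stderr into stdout
            .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                 new InputStreamReader(proc.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null)
                out.append(line);
        }
        proc.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder command; a real job would invoke a native worker here.
        System.out.println(run("echo", "result-from-native-worker"));
    }
}
```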


>
> I wanted to specifically comment on #5 *Configuration* as UX is very
> important
> indeed. And as always, I am thinking that perhaps having a clean DSL might
> help with overcoming that hurdle: DSL can be generated by anything, it is
> humanly readable, and doesn't require much of the syntactic overhead.
>

Are we seriously saying that Spring is hard to use? Yes, I agree, it can be
too verbose in some cases, but it is pretty much an industry standard for
XML configuration right now. Introducing a custom configuration DSL will
only add to the learning curve of Ignite, not making it simpler, IMHO.

This has been an ongoing topic between me and Cos. Do other community
members have any opinion on the matter?


>
> Cos
>
> On Thu, Apr 16, 2015 at 02:07PM, Vladimir Ozerov wrote:
> > Hi,
> >
> > I'd like to propose an idea of creating new Ignite component for
> > integration with other platforms such as .Net, Ruby, NodeJS, etc.
> >
> > Earlier in GridGain we had thin TCP clients for Java and .Net. They had
> > limited features and not-so-good performance (e.g. due to inability to
> > reliable map task to affinity node, etc.). For now Java client is in
> > open-source and is used only for internal purposes, and .Net client was
> > fully reworked to use JVM started in the same process instead of TCP and
> is
> > currently GridGain enterprise feature.
> >
> > But as we see growing interest to the product it makes sense to expose
> some
> > native interfaces for easy integration with our product from any
> platform.
> >
> > Let's discuss on how platforms integration architecture should be.
> >
> > *1. JVM placement.*
> > One of the most important points is how native platform will communicate
> > with JVM with started node. There are number of approaches to consider:
> > - Start JVM in the same process. This will allow for fast communication
> > between JVM and the native platform. The drawback of this approach is
> that
> > we can start only one JVM per process. As a result this solution might
> not
> > work in some environments (especially development ones), e.g. app servers
> > when multiple native applications run in the same process and each
> > application want to start a node with different JVM properties, or
> > multi-process environments when there is a coordinator process which
> spawns
> > child processes with limited lifecycle on demand (Apache, IIS, NodeJS,
> etc).
> > - Connect to JVM using some IPC mechanism (shared memory, pipes). This
> > approach might be a bit slower than the first one due to IPC overhead,
> but
> > still pretty fast. To implement it we probably will have to create some
> > intermediate management application which will start nodes in different
> > processes and provide handles for native application to connect with
> them.
> > This approach will be more flexible than the first one.
> > - Connect to the JVM using TCP. This is the slowest option, but offers
> > even greater flexibility, as we will be able to transparently connect to
> > nodes even on other hosts. However, it raises some failover questions.
> >
> > In summary, I think we should choose the "JVM in the same process"
> > approach, as we already have experience with it and it has proved to be
> > functional and performant, but create a careful abstraction (facade) for
> > the node communication logic, so that the shmem/pipe/TCP approaches can
> > be implemented later if needed without disturbing other components.
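[Editor's sketch] One way to picture the facade described above is a transport-neutral gateway interface, so that in-process JNI, shmem/pipe IPC and TCP become interchangeable implementations. All names here (PlatformGateway, InProcessGateway) are hypothetical, not actual Ignite API:

```java
/**
 * Hypothetical facade decoupling platform logic from the transport used to
 * reach the JVM. Implementations could be: in-process JNI (the default),
 * shared-memory/pipe IPC, or TCP. Names are illustrative only.
 */
interface PlatformGateway extends AutoCloseable {
    /** Starts (or attaches to) the node behind this gateway. */
    void start(String configPath);

    /** Synchronously invokes an operation, moving data through the chosen transport. */
    byte[] invoke(int opType, byte[] inData);

    @Override void close();
}

/** In-process variant: a real one would cross the JNI boundary; stubbed for illustration. */
class InProcessGateway implements PlatformGateway {
    private boolean started;

    @Override public void start(String configPath) {
        started = true; // Real impl: boot the embedded JVM/node here.
    }

    @Override public byte[] invoke(int opType, byte[] inData) {
        if (!started)
            throw new IllegalStateException("Gateway is not started.");
        return inData; // Placeholder echo instead of a JNI call.
    }

    @Override public void close() {
        started = false;
    }
}
```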
> >
> > *2. Data transfer and serialization.*
> > Another important point is how to pass data between Java and non-Java
> > platforms. Obviously we will have to provide some common format for both
> > interacting platforms, so that data serialized on one side can be
> > deserialized on the other if needed.
> > For the JVM-in-the-same-process approach it makes sense to organize data
> > transfer over offheap memory. Earlier we experimented with more
> > sophisticated mechanisms like "pin a Java heap array in the native
> > platform -> write directly to that array -> unpin", but this approach has
> > some serious problems (e.g. JVM intrinsic methods hang while the array is
> > pinned), while not providing a significant performance benefit.
> > So I think data transfer over offheap will be enough, as it is a simple
> > and reliable solution with acceptable performance.
> > Also we must remember that platforms may have different mechanisms for
> > data transfer. E.g., sometimes we have to marshal an object to bytes
> > before passing it to Java, and sometimes we may just pass a pointer
> > (e.g. structs in C or .Net with a known layout). We should be able to
> > support all these cases.
> >
> > In summary I propose to use offheap as the default implementation, while
> > still leaving room for changing this if needed. E.g. instead of passing
> > an offheap pointer + data length:
> >
> > void invokeOtherPlatform(long dataPointer, int dataLen);
> >
> > we should design it as:
> >
> > void invokeOtherPlatform(long pointer);
> >
> > where the pointer encodes all the information required for the other
> > platform to read the data. E.g. it can point to a memory region where the
> > first 4 bytes are the data length and the rest is the serialized object.
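[Editor's sketch] As a sanity check on that layout, here is a minimal Java sketch of encoding and decoding a [length][payload] region through a direct (offheap) buffer. The PlatformMemory name is hypothetical; a real implementation would hand the raw offheap address (a long) across the JNI boundary rather than a ByteBuffer:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Minimal sketch of the proposed single-pointer exchange format: a memory
 * region whose first 4 bytes hold the payload length, followed by the
 * payload itself.
 */
class PlatformMemory {
    /** Writes [length][payload] into a direct (offheap) buffer. */
    static ByteBuffer encode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4 + payload.length)
            .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(payload.length); // First 4 bytes: data length.
        buf.put(payload);           // Remaining bytes: serialized object.
        buf.flip();
        return buf;
    }

    /** Reads the payload back given only the region start; no separate length argument. */
    static byte[] decode(ByteBuffer region) {
        // duplicate() so the caller's position is untouched; re-apply byte order.
        ByteBuffer buf = region.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int len = buf.getInt();
        byte[] out = new byte[len];
        buf.get(out);
        return out;
    }
}
```

The decoder needs no length parameter, which is exactly the property the single-argument invokeOtherPlatform(long pointer) design relies on.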
> >
> > *3. Queries support*
> > Queries are one of the most demanded features of the product. But at the
> > moment they only work with Java objects, because Java serialization is
> > used to extract fields from them.
> > We will have to provide users with a way to alter this so that objects
> > from native platforms are supported as well.
> > A good candidate for this is the IgniteCacheObjectProcessor interface,
> > which is responsible for object serialization.
> > We will have to investigate what should be done to let its implementation
> > (either the default or a custom one) work with objects from other
> > platforms.
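[Editor's sketch] The query requirement boils down to reading named fields without deserializing into a concrete Java class. The interface below is an illustrative guess at that contract; it is NOT the actual IgniteCacheObjectProcessor API, and both names are hypothetical:

```java
import java.util.Map;

/**
 * Hypothetical contract a platform-neutral cache object could expose for
 * queries: field access over the serialized representation, with no
 * platform-specific class required on the Java side.
 */
interface PlatformQueryableObject {
    /** Reads a named field from the object's serialized representation. */
    Object field(String name);
}

/** Toy implementation backed by a decoded field map; a real one would read the binary form lazily. */
class MapBackedObject implements PlatformQueryableObject {
    private final Map<String, Object> fields;

    MapBackedObject(Map<String, Object> fields) {
        this.fields = fields;
    }

    @Override public Object field(String name) {
        return fields.get(name);
    }
}
```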
> >
> > *4. Extensibility*
> > We will have a set of C/C++ interfaces exposing basic features (e.g.
> > cache, compute, queries, etc.).
> > But as we do not know in advance what implementors will want to do apart
> > from calling regular Java methods, it makes sense to leave some
> > extensibility points. At first glance they may look as follows:
> >
> > interface Cache {
> >     void get(void* inData, void* outData); // Regular cache operation.
> >     bool put(void* inData); // Another regular cache operation.
> >     ...
> >     void invoke(int operationType, void* inData, void* outData); //
> > Extensibility point.
> > }
> >
> > In this example we define an "invoke" method where a user may pass
> > virtually anything. So, when some new functionality is required, the user
> > implements it in Java, injects it into Ignite somehow (e.g. through
> > config), and implements the counterpart in the native platform. But they
> > WILL NOT have to change any Ignite C interfaces or rebuild them.
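[Editor's sketch] The Java side of that "invoke" extensibility point could be a simple dispatch table keyed by operation type, populated from configuration. All names below are illustrative, not actual Ignite API:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the Java side of the "invoke" extensibility point:
 * user-supplied handlers are registered per operation type (e.g. via
 * configuration) and dispatched without touching the C interfaces.
 */
class PlatformExtensionRegistry {
    /** A user extension implemented in Java. */
    interface Handler {
        byte[] handle(byte[] inData);
    }

    private final Map<Integer, Handler> handlers = new HashMap<>();

    /** Called at configuration time to plug in a new operation. */
    void register(int opType, Handler handler) {
        handlers.put(opType, handler);
    }

    /** Entry point behind the native invoke(operationType, inData, outData). */
    byte[] invoke(int opType, byte[] inData) {
        Handler h = handlers.get(opType);
        if (h == null)
            throw new IllegalArgumentException("Unknown operation type: " + opType);
        return h.handle(inData);
    }
}
```

Adding a new operation then means registering a handler and agreeing on an opType constant on both sides; the C interface stays frozen.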
> >
> > *5. Configuration.*
> > Last, but not least: how to configure Ignite on other platforms.
> > Currently the only way to do that is Spring XML. This approach works well
> > for Java developers, but not so well for others, because a developer who
> > is not familiar with Java/Spring will have to learn quite a few things
> > about them. E.g. try configuring a HashMap with an int key/value in
> > Spring :-) Non-Java developers will have a hard time doing this.
> > So we will probably have to let users use the native mechanisms of their
> > platforms for configuration. This is not really critical from a features
> > perspective, but will significantly improve the user experience.
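[Editor's sketch] For contrast with Spring XML, a programmatic, builder-style configuration is the kind of "native mechanism" that mirrors naturally onto other platforms. The NodeConfig name and its properties are hypothetical:

```java
/**
 * Illustrative builder-style configuration, sketching what a non-Spring-XML
 * configuration mechanism could feel like when mirrored on other platforms.
 */
class NodeConfig {
    private String gridName = "default";
    private int backups;

    /** Sets the grid name; returns this for chaining. */
    NodeConfig gridName(String name) {
        this.gridName = name;
        return this;
    }

    /** Sets the number of backup copies; returns this for chaining. */
    NodeConfig backups(int backups) {
        this.backups = backups;
        return this;
    }

    String gridName() { return gridName; }

    int backups() { return backups; }
}
```

The same fluent shape translates directly to .Net object initializers, Ruby hashes, or NodeJS option objects, without requiring any Spring knowledge.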
> >
> > Please share your thoughts and ideas about this.
> >
> > Vladimir.
>

Re: Integration with external platforms.

Posted by Konstantin Boudnik <co...@apache.org>.
Took me a while to (re)read and think about this. It seems to be getting more
and more important as we see a growing interest from other ASF projects to get
better integration with Ignite.

I think all these are very valid points. I'd say the integration with non-JVM
apps isn't that high a priority, but I might be mistaken in my judgement.

I wanted to specifically comment on #5 *Configuration*, as UX is very important
indeed. And as always, I am thinking that perhaps having a clean DSL might
help with overcoming that hurdle: a DSL can be generated by anything, it is
human-readable, and doesn't require much syntactic overhead.

Cos

On Thu, Apr 16, 2015 at 02:07PM, Vladimir Ozerov wrote:
> [...]