Posted to user@ignite.apache.org by Courtney Robinson <co...@hypi.io> on 2020/12/26 20:39:06 UTC

Ignite in-memory + other SQL store without fully loading all data into Ignite

We've been using Ignite in production for almost 3 years and we love the
platform but there are some increasingly frustrating points we run into.
Before the holidays a few of our engineers started looking around and have
now presented a serious case for migrating away from Ignite. We would end up
using at least three technologies they've identified to bridge the gaps left
by Ignite, but they have presented good cases for why managing these
individually would be a more flexible solution we could grow with.

I am not keen on the idea as it presents a major refactor that would likely
take 6 months to get to production but I understand and agree with the
points they've made. I'm trying to find a middle ground as this seems like
the nuclear option to me.

Off the top of my head, some things they've raised are:

   1. Lack of tooling
      1. Inability to change a column type, and no support in any schema
      migration tools that we've found (even after deleting a column we can't
      reuse its name).
      2. We had to build our own backup solution, and even now that backup has
      landed in 2.9 we can't use it directly, because we have implemented
      relatively granular backup to be able to push to S3-compatible APIs (Ceph
      in our case) and restore partially with per-hour granularity. Whilst we've
      done it, it took some serious engineering effort and time. We considered
      open sourcing it but it was done in a way that's tightly coupled to our
      internal stack and APIs.
   2. Inconsistency between various Ignite APIs
      1. Transactions on KV, none on SQL (or now in beta, but work seems to
      have paused?).
      2. SQL limitations - SELECT queries never read through data from the
      external database.
      3. Even if we implemented a CacheStore we have to load all data into
      Ignite to use SELECT (see the sketch after this list).
      4. No referential integrity enforcement.
   3. It is incredibly easy to corrupt the data in Ignite persistence.
   We've gotten better due to operational experience, but upgrades (in k8s)
   still on occasion leave one or two nodes corrupted when their pod was
   stopped.
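
To make points 2.2 and 2.3 concrete, here is a minimal sketch of the kind of
CacheStore we mean (the Product type, cache name and JDBC plumbing are
illustrative, not our real code). Key-value read-through works per key, but a
SQL SELECT only scans whatever is already in Ignite, so the whole external
table has to be preloaded with loadCache before SELECT can see it:

import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.lang.IgniteBiInClosure;

public class ReadThroughDemo {
    /** Illustrative value type; the fields are made up. */
    public static class Product {
        @QuerySqlField public String name;
        @QuerySqlField public double price;
    }

    /** Illustrative store; the actual JDBC calls are omitted. */
    public static class ProductStore extends CacheStoreAdapter<Long, Product> {
        @Override public Product load(Long key) {
            return null; // would run "select ... where id = ?" against the external DB
        }
        @Override public void loadCache(IgniteBiInClosure<Long, Product> clo, Object... args) {
            // would stream the entire external table: clo.apply(id, product) per row
        }
        @Override public void write(Cache.Entry<? extends Long, ? extends Product> e) {
            // would upsert into the external DB
        }
        @Override public void delete(Object key) {
            // would delete from the external DB
        }
    }

    public static void main(String[] args) {
        CacheConfiguration<Long, Product> cfg = new CacheConfiguration<>("products");
        cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(ProductStore.class));
        cfg.setReadThrough(true);
        cfg.setWriteThrough(true);
        cfg.setIndexedTypes(Long.class, Product.class);

        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Long, Product> cache = ignite.getOrCreateCache(cfg);

            // KV read-through: a miss here calls ProductStore.load(42).
            cache.get(42L);

            // SQL does NOT read through: this only scans rows already in Ignite,
            // so rows that exist only in the external DB are silently missing.
            cache.query(new SqlFieldsQuery(
                "select name from Product where price > ?").setArgs(100)).getAll();

            // The only way to make SELECT complete is to preload everything first.
            cache.loadCache(null);
        }
    }
}

That preload is exactly the step that stops being practical at our data
volumes.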

I'll stop there but the point is, after 3yrs in production the team feels
like they're always running up against a wall and that Ignite has created
that wall.

My goal in writing this is to find out more about why the limitation around
CacheStore exists: how does Ignite persistence achieve partially caching
data in memory and pulling from disk when the data is not in memory, and why
can't the same approach apply to a CacheStore as well?

What would it take to make it so that Ignite's SQL operations could be
pushed down to a CacheStore implementation?
Ignite's a relatively large code base, so hints about which
classes/interfaces to investigate if we're looking to replace Ignite
persistence would be incredibly useful. My idea at the moment is to have
Ignite as the in-memory SQL layer with a SQL MPP providing persistence.

To me right now the path forward is for us to put the work into removing
these Ignite limitations if possible. We have a mixture of on-premise
clients for our product as well as a multi-tenant SaaS version - some of
these on-prem clients depend on Ignite's in-memory capabilities and so we
can't easily take this away.

FYI, it doesn't have to be CacheStore specifically. I realise CacheStore just
extends the JCache loader/writer interfaces; more generally, can something
like CacheStore be implemented to replace the integration that Ignite
persistence provides?

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: +44 208 123 2413 (GMT+0)
https://hypi.io

Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

Posted by Courtney Robinson <co...@hypi.io>.
Hey Denis,

It's been a while. Hope you've been keeping well!
A meeting in the second week of Jan would be great. The agenda looks good too.

We've been watching the Calcite SQL engine wiki
<https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine>
and task list
<https://cwiki.apache.org/confluence/display/IGNITE/Apache+Calcite-powered+SQL+Engine+Roadmap>
but I got the impression it wouldn't land that soon. It may be worth getting
involved there to keep some of the Calcite APIs exposed for us to tailor,
because my ideal design would have Ignite in front of anything we end up
doing.

We'll sort out the call details closer to the time. Thanks Denis, speak to
you and Val in two weeks.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: +44 208 123 2413 (GMT+0)
https://hypi.io


Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

Posted by Denis Magda <dm...@apache.org>.
Hi Courtney,

Glad to hear from you! It's been a while since we met last time. It's truly
disappointing seeing you struggle with Ignite that much. Thanks for being
open and kicking the discussion off.

How about the three of us (you, Val, and I) meet the second week of January
and talk through the issues? Then you can share a summary here with the
broader community, if you wish. Between us, I've been personally championing the
column-type-change feature for a while, and with this IEP of Ignite 3.0
<https://cwiki.apache.org/confluence/display/IGNITE/IEP-54%3A+Schema-first+Approach>
it should be doable.

In the meantime, some items for our agenda and pointers:

   - What does your upgrade procedure look like? Ignite doesn't have a
   rolling-upgrade feature; thus, to upgrade from version A to B in a
   *consistent way*, one should stop the cluster (see the sketch after this
   list for the stop sequence I have in mind).
   - re: the frustration about spending too much time on the administration
   and infrastructure-related activities. Are you considering any managed
   service options? As a developer, I understand this frustration. I was just
   lucky to stay away from a need to administer Postgres, Oracle, MySQL, or
   Ignite just because we either had an IT administrator or made use of a
   managed-service option. Not selling anything here, just know that all those
   administration routines are unavoidable. The thing with Ignite is that its
   distributed nature makes things more complicated.
   - Ignite ML and SQL: in fact, you can already call Ignite ML models from
   a SQL query. I need to search for pointers. @Alexey Zinoviev
   <za...@gmail.com>, you can probably share some examples quicker.
   - Calcite and Ignite: in Q2 we're planning to release the
   Calcite-powered SQL engine in Ignite. That would be a dramatic improvement
   in our SQL capabilities. It should be possible to enable push-downs much
   more easily.
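
To make the first item concrete, here is a rough sketch of the stop sequence
I have in mind, assuming Ignite 2.9+ for the ClusterState API (control.sh
--deactivate / --activate is the command-line equivalent):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;

public class UpgradeStopSequence {
    public static void main(String[] args) {
        // Assumes a (client) node is already started in this JVM.
        Ignite ignite = Ignition.ignite();

        // Deactivation stops the caches and should leave every node with a
        // consistent checkpoint on disk before the pods are stopped.
        ignite.cluster().state(ClusterState.INACTIVE);

        // ... stop all pods, roll the image from version A to B, restart ...

        // Re-activate once all baseline nodes have rejoined the topology.
        ignite.cluster().state(ClusterState.ACTIVE);
    }
}

The idea being that every node restarts from a clean checkpoint rather than
having its pod killed mid-write.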


-
Denis


Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

Posted by Courtney Robinson <co...@hypi.io>.
Hi Val,
Thanks. You're not missing anything and we have been using Ignite
persistence for years.
Among the reasons we want to switch to in-memory only is how easily we seem
to get corrupt nodes, as I mentioned in the first email. We haven't had a
situation where upgrading corrupts more than 1 of our 3 replicas, but we
genuinely fear upgrading sometimes as a result, which is why we put so much
effort into our backup/restore solution.

One of the arguments for disabling persistence is that other DBs seem to
have had longer to solve the issue. We've operated Postgres for longer than
Ignite persistence and have never had it corrupt its data on disk
(admittedly it's not distributed, so that could play a role), and we've also
been running Cassandra for longer without ever having any data corruption.
All three run in Kubernetes. With Ignite we've tried many things, and a few
settings between Ignite's rebalanceDelay (I forget which other ones) and
k8s readiness/liveness probes seem to have landed in a sweet spot that's
reduced how often it happens, but if we have any issues with the k8s control
plane, scheduling delays or network issues, then the chances of it skyrocket.
It then requires manual intervention. I believe the most frequent issue is
that a node starts after the topology has diverged, so the cluster doesn't
think it's part of it; the most effective fix we've found is deleting the
data on that node and having it join as a new node so rebalancing kicks in
and pushes data back to it. I think we have a few tickets somewhere with
details.

This is the primary motivator for wanting to replace Ignite persistence.

 Here's our use case:
We provide a low-code app development platform. As a developer you define
tables in a visual interface or via GraphQL. We map these on to Ignite
tables. It is a multi-tenant system complete with
authentication/authorisation, serverless functions, API Gateway and
more...all the bells and whistles for building modern applications in
record time. We make it easy and convenient to get the scale and
performance you'd expect from a large company, with as little effort as an
intern could manage.

The thing about convenience is that expectations are set by others (the
industry in this case): the convenient things you provide are expected to
include a baseline that matches whatever the industry, or whoever else set
the expectations, already offers.
It is a sizeable task to implement/enforce referential integrity and cascade
deletes application-side, and I don't know of any distributed database which
has this built in. But the lack of transactions in SQL means we can't really
implement it without pretty much re-implementing what the DB should be doing
to keep track and roll back on failure.
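
For context, this is roughly what "implementing it ourselves" looks like: a
sketch of an application-side cascade delete on top of the KV transaction API
(cache names and the pre-computed child IDs are purely illustrative; the hard
part, tracking which children reference which parent, is exactly what the DB
would normally do for us):

import java.util.Set;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class CascadeDelete {
    // Deletes a parent row and its children atomically, KV-side only.
    // Both caches must be created with CacheAtomicityMode.TRANSACTIONAL.
    static void deleteParent(Ignite ignite, long parentId, Set<Long> childIds) {
        IgniteCache<Long, Object> parents = ignite.cache("parent");  // made-up cache names
        IgniteCache<Long, Object> children = ignite.cache("child");

        try (Transaction tx = ignite.transactions().txStart(
                TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ)) {
            for (Long childId : childIds)
                children.remove(childId);

            parents.remove(parentId);

            tx.commit(); // if we fail before this line, everything rolls back
        }
    }
}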

In an earlier version of the platform we implemented this using the key
value APIs but we were not using SQL at all. We moved to SQL 1.5yrs ago or
so because we needed greater and greater query capability and we were
implementing a custom query layer on the KV APIs that was starting to look
remarkably close to what SQL could do.

In our use case, there are only 4 tables known at development time. Every
app in the platform defines its tables dynamically and this has an impact
on what Ignite capabilities are realistic to use. Including our own UI that
lets users create other apps, everything is user defined.

One issue that our customers constantly complain about is the inability to
change a column's type.
We tried generating Java classes, compiling them and including the Ignite
annotations, but found that the JVM ends up doing a lot more GC, leading to
crazy long GC pauses. Generating classes meant we could re-map the same
field to a different field in the generated class and silently migrate the
data for known types like int to float, or with a user-provided function to
do the conversion. It proved impractical: we'd sometimes get JVMs that were
unusable for long stretches, and if this happened on the 3 nodes that hold
all 3 replicas of your data, then your entire app's response time goes from
50 milliseconds to several seconds or however long the pause lasts (tens of
seconds sometimes).

This got worse as the JVM pauses would lead to Ignite map failures (I forget
the exact error, but it was something about failing to execute a map step
during a SQL query) on unrelated queries, causing a cascade of failures
across tenants.
We're investigating a new approach for this by storing mappings from user
fields to different table names so we'd be able to get back the ability to
change the type of a field and migrate old data. We're not sure of the
overhead this will introduce yet, still being investigated. Your thoughts
here are appreciated.
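
Very roughly, the sort of thing we have in mind (a sketch only, with made-up
type and field names) is doing the conversion with the binary API so no model
classes need to be generated or loaded:

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.binary.BinaryObjectBuilder;

public class FieldTypeMigration {
    // Copies "age" (int) into a new field "age_f" (float), row by row, without
    // generating or loading any Java model classes. A real migration would
    // batch and run per partition; this is just the shape of the idea.
    static void migrate(Ignite ignite) {
        IgniteCache<Long, BinaryObject> cache =
            ignite.cache("app_table").withKeepBinary(); // made-up cache name

        for (Cache.Entry<Long, BinaryObject> e : cache) {
            BinaryObject row = e.getValue();
            Integer oldVal = row.field("age");    // old column
            if (oldVal == null)
                continue;                         // already migrated / never set

            BinaryObjectBuilder b = row.toBuilder();
            // A user-provided conversion function would plug in here.
            b.setField("age_f", oldVal.floatValue());
            cache.put(e.getKey(), b.build());
        }
    }
}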

Next is extending the available SQL dialect: we include indexing
capabilities for our customers. Before migrating to the SQL APIs, we
integrated Apache Lucene (we have a filter language that mapped to Lucene
queries). Indices were replicated but not partitioned, and we used affinity
compute to send queries to the right nodes, which would send back a list of
IDs we could fetch using the KV APIs. This worked well and provided superior
filtering capabilities. Migrating to the SQL APIs meant we lost this, as we
found no way to extend SQL with a custom function that'd be resolved to a
custom affinity run and, importantly, whose results could be used to
reduce/filter the data coming back from SQL queries. Right now we're hacking
it with some SQL operators, but the search quality doesn't even compare to
what we could do with the Lucene + KV APIs. This is something we're actively
investigating, as the current approach is showing its limits as we grow.
We've also got on-premise customers set to make this unusable in Q1 as their
low-code app data volumes begin to exceed what the query hacks can
reasonably filter within acceptable time.

Right now we're looking at two options. The first is using the old approach
we had of an affinity run over Lucene and feeding the results into an IN
clause in a subsequent query; the second is forking Ignite to extend the SQL
dialect and integrate the Lucene results into Ignite's map/filter phase. The
MVP here is doing this with all supported Lucene queries over a Lucene index
of a 1TB table.
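
Roughly, option one would look like the sketch below. The Lucene lookup and
all names are placeholders for our existing integration, and the IN clause
obviously caps out at some result size, which is part of our concern:

import java.util.Collections;
import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.lang.IgniteCallable;

public class LuceneThenSql {
    static List<List<?>> search(Ignite ignite, Object affKey, String luceneQuery) {
        // 1. Run the Lucene lookup on the node that owns affKey's partition, so
        //    the (replicated) index is read locally. luceneSearchLocal is ours.
        List<Long> ids = ignite.compute().affinityCall("app_table", affKey,
            (IgniteCallable<List<Long>>) () -> luceneSearchLocal(luceneQuery));

        if (ids.isEmpty())
            return Collections.emptyList();

        // 2. Feed the matching ids into an ordinary SQL query as an IN list.
        String placeholders = String.join(",", Collections.nCopies(ids.size(), "?"));
        SqlFieldsQuery qry = new SqlFieldsQuery(
            "select id, name from AppTable where id in (" + placeholders + ")")
            .setArgs(ids.toArray());

        return ignite.cache("app_table").query(qry).getAll();
    }

    static List<Long> luceneSearchLocal(String q) {
        // Placeholder for the existing Lucene integration; not part of Ignite.
        throw new UnsupportedOperationException("illustrative only");
    }
}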

This last point is more of a preference than anything. We're working to
bring the ease and flexibility of low-code to machine learning. Ignite's ML
capabilities are pretty broad; my issue is that they're completely Java-based.
It would be amazing if Ignite had taken the SQL approach, or had a SQL
integration mapped to the Java APIs (I'm thinking of Apache MADlib).

I guess one of the frustrations is we're spending a lot of time at the DB
level instead of at the application level adding useful features for
customers. As a small startup we don't really have the bandwidth so it's a
constant battle between product features and infrastructure solutions.

We are not expecting anything to happen immediately with our Ignite setup;
it'll stay for a while, but how we use it may change drastically over time.
We can't easily abandon the in-memory capabilities either. One suggestion
being considered (to have in place by the end of 2021) is using Apache
Calcite as the SQL layer with an Ignite driver (mapping to in-memory
Ignite). In this setup we'd use Calcite to push down writes to Ignite and
Greenplum (MPP Postgres). We'd also push down queries to Ignite in our
driver and, where Ignite couldn't serve them, push them to Greenplum. This
would give us the ability to drop in custom SQL functions that map to Lucene
or indeed Elasticsearch. It has the added benefit of getting things from
Calcite like SQL window functions. We'd effectively be using Ignite
in-memory to accelerate Greenplum and ES queries through a consistent SQL
API, whilst getting the benefits of the well-established and battle-tested
Postgres tools and ecosystem...and SQL-based ML. I'd appreciate your
thoughts on the suggested Ignite use here. Almost everything can be pushed
down to Ignite from Calcite, and anything Ignite doesn't support Calcite can
emulate.
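
To sketch what we mean (all URLs, schema and table names are illustrative,
and whether this performs acceptably over the Ignite thin driver is exactly
what we'd need to validate), Calcite's JDBC adapter would let us expose
Ignite and Greenplum side by side and plan across them:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CalciteFederation {
    public static void main(String[] args) throws Exception {
        // Inline Calcite model with two JDBC-backed schemas. Every value here is
        // illustrative; needs calcite-core, the Ignite thin JDBC driver and the
        // Postgres driver on the classpath.
        String model = "inline:{"
            + "\"version\":\"1.0\",\"defaultSchema\":\"IGNITE\",\"schemas\":["
            + "{\"name\":\"IGNITE\",\"type\":\"jdbc\","
            + "\"jdbcDriver\":\"org.apache.ignite.IgniteJdbcThinDriver\","
            + "\"jdbcUrl\":\"jdbc:ignite:thin://127.0.0.1\"},"
            + "{\"name\":\"GP\",\"type\":\"jdbc\","
            + "\"jdbcDriver\":\"org.postgresql.Driver\","
            + "\"jdbcUrl\":\"jdbc:postgresql://gp-master:5432/app\"}]}";

        try (Connection c = DriverManager.getConnection("jdbc:calcite:model=" + model);
             Statement st = c.createStatement();
             // Calcite plans the join, pushing each side down to the database
             // that owns it where possible and emulating the rest itself.
             ResultSet rs = st.executeQuery(
                 "select i.ID, g.TOTAL from IGNITE.HOT_ORDERS i "
                 + "join GP.ORDER_HISTORY g on i.ID = g.ID")) {
            while (rs.next())
                System.out.println(rs.getLong(1) + " " + rs.getLong(2));
        }
    }
}

Ignite's own Calcite-based engine might eventually give us the same extension
points natively, which is why we'd rather keep Ignite in front of anything we
end up doing.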

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: +44 208 123 2413 (GMT+0)
https://hypi.io


Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

Posted by Valentin Kulichenko <va...@gmail.com>.
Hi Courtney,

Thanks for your feedback!

To cut a long story short, Ignite implements a page memory architecture. All
data is split into fixed-size pages, and any page can reside either both in
memory and on disk, or on disk only. Since the in-memory layer and the
persistence layer are natively integrated - i.e. there is a unified
mechanism for page lookup - the Ignite storage engine always knows where a
particular page is and can transparently load it into memory when needed.
From this perspective, Ignite is actually very similar to how other
databases work, although it has a much bigger focus on scalability and
in-memory processing, richer compute capabilities, etc.
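
If it helps to see it in configuration terms, here is a minimal sketch
(sizes and names are arbitrary) of a persistent data region: pages beyond
the region's memory budget live on disk and are pulled back in
transparently, with no explicit load step.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NativePersistenceExample {
    public static void main(String[] args) {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("default")
            .setPersistenceEnabled(true)             // pages live on disk...
            .setMaxSize(4L * 1024 * 1024 * 1024);    // ...with at most ~4 GB cached in RAM

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(region));

        Ignite ignite = Ignition.start(cfg);

        // With persistence enabled the cluster starts inactive; activate it once
        // (ClusterState is 2.9+; older versions use cluster().active(true)).
        ignite.cluster().state(ClusterState.ACTIVE);

        // From here on, any page not currently in the region is read from disk on
        // demand - there is no separate "load into memory" step as with a CacheStore.
    }
}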

With an external database it's much more complicated - generally speaking,
there is no way of knowing which data needs to be loaded from the database
into memory for the purposes of an arbitrary query. This is a fundamental
limitation and, quite frankly, I don't think there is any other technology
that can do that - not in a consistent and performant way, at least. As a
matter of fact, this limitation is exactly what used to be the primary
driver behind the Native Persistence development in the first place.

Is there any particular reason why you can't use Native Persistence instead
of an external database? It sounds like that's what you need for your use
case unless I'm missing something. Can you tell us a little bit more about
your use case and requirements? I'm sure we can come up with a solution
that would satisfy those requirements without a need for re-implementing
the whole thing.

Thanks,
Val

Re: Ignite in-memory + other SQL store without fully loading all data into Ignite

Posted by Courtney Robinson <co...@hypi.io>.
I know this was over the holiday so bumping. Can anyone provide any
pointers on where to start looking or anything else mentioned in the
previous email?
Thanks
