Posted to dev@ignite.apache.org by Ivan Pavlukhin <vo...@gmail.com> on 2021/08/03 09:38:04 UTC

Re: Apache Ignite 3 Alpha 2 webinar follow up questions

Hi Courtney,

Statistics for query planning is a rather complex subject.
Unfortunately I am not aware of the technical details of how it is expected
to be implemented in Ignite 3.

Folks driving SQL please step in.

2021-07-31 20:18 GMT+03:00, Courtney Robinson <co...@hypi.io>:
> Hi Ivan,
> Atri's description of the query plan being cached is what I was thinking of
> with my description.
>
> I lack the knowledge of how the statistics are maintained to comment
> constructively, Atri, but my first question about the problem you raise with
> statistics would be:
>
> How/where are the stats maintained and if a query plan is cached based on
> some stats, is it not possible to invalidate the cached plan periodically
> or based on statistics changes?
>
> Regards,
> Courtney Robinson
> Founder and CEO, Hypi
> Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
>
> <https://hypi.io>
> https://hypi.io
>
>
> On Sat, Jul 31, 2021 at 8:54 AM Atri Sharma <at...@apache.org> wrote:
>
>> Query caching works on three levels - caching results, caching blocks and
>> caching query plans.
>>
>> Prepared queries work by caching a plan for a query and reusing that plan
>> by changing the parameters for the incoming query. So the query remains
>> the
>> same, but input values keep changing.
>>
>> The problem with prepared queries is that query execution can go bad very
>> fast if the underlying data distribution changes and the cached plan is
>> no
>> longer optimal for the given statistics.
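
For illustration, a minimal sketch of the plan-reuse pattern described above,
using the Ignite JDBC thin driver; the connection URL, table and parameter
values are assumptions, not taken from this thread:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class PreparedQuerySketch {
        public static void main(String[] args) throws Exception {
            // Thin JDBC driver; the host is an assumption for a local node.
            try (Connection conn =
                     DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/")) {
                // The SQL text stays the same, so the prepared plan can be reused;
                // only the bound parameter values change between executions.
                try (PreparedStatement stmt = conn.prepareStatement(
                         "SELECT id, content FROM Tweet WHERE author = ?")) {
                    for (String author : new String[] {"alice", "bob"}) {
                        stmt.setString(1, author);
                        try (ResultSet rs = stmt.executeQuery()) {
                            while (rs.next())
                                System.out.println(rs.getLong(1) + " " + rs.getString(2));
                        }
                    }
                }
            }
        }
    }

As noted above, the risk is that the plan chosen when the statement was first
prepared can become a poor fit once the data distribution shifts.
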
>>
>> On Sat, 31 Jul 2021, 12:54 Ivan Pavlukhin, <vo...@gmail.com> wrote:
>>
>> > Hi Courtney,
>> >
>> > Please clarify what you mean by prepared queries and query caching.
>> > Do you mean caching query results? If so, in my mind materialized views
>> > are the best approach here (Ignite 2 does not support them). Do you
>> > have other good approaches in mind, e.g. ones implemented in other
>> > databases?
>> >
>> > 2021-07-26 21:27 GMT+03:00, Valentin Kulichenko <
>> > valentin.kulichenko@gmail.com>:
>> > > Hi Courtney,
>> > >
>> > > Generally speaking, query caching certainly makes sense. As far as I
>> > know,
>> > > Ignite 2.x actually does that, but most likely there is room for
>> > > improvement as well. We will look into this.
>> > >
>> > > As for the SQL API - the answer is yes. The requirement for a dummy
>> cache
>> > > is an artifact of the current architecture. This is 100% wrong and
>> > > will
>> > be
>> > > changed in 3.0.
>> > >
>> > > -Val
>> > >
>> > > On Sun, Jul 25, 2021 at 2:51 PM Courtney Robinson
>> > > <co...@hypi.io>
>> > > wrote:
>> > >
>> > >> Something else came to mind, are there plans to support prepared
>> > queries?
>> > >>
>> > >> I recall someone saying before that Ignite does internally cache
>> queries
>> > >> but it's not at all clear if or how it does do that. I assume a
>> > >> simple
>> > >> hash
>> > >> of the query isn't enough.
>> > >>
>> > >> We generate SQL queries based on user runtime settings and they can get
>> > >> to hundreds of lines long. I imagine this means most of our queries are
>> > >> not being cached, but there are patterns, so we could generate and manage
>> > >> prepared queries ourselves.
>> > >>
>> > >> Also, will there be a dedicated API for doing SQL queries rather
>> > >> than
>> > >> having to pass a SqlFieldsQuery to a cache that has nothing to do
>> > >> with
>> > >> the
>> > >> cache being queried? When I first started with Ignite years ago,
>> > >> this
>> > was
>> > >> beyond confusing for me. I'm trying to run select x from B but I
>> > >> pass
>> > >> this
>> > >> to a cache called DUMMY or whatever arbitrary name...
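
For readers unfamiliar with the pattern being described, this is roughly what
it looks like against Ignite 2.x; the cache and table names are made up:

    import java.util.List;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.SqlFieldsQuery;

    public class DummyCacheQuerySketch {
        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start()) {
                // The query targets table B, but it still has to be submitted
                // through some cache instance -- here an unrelated cache "DUMMY"
                // that is assumed to already exist.
                List<List<?>> rows = ignite.cache("DUMMY")
                    .query(new SqlFieldsQuery("SELECT x FROM B WHERE y = ?").setArgs(42))
                    .getAll();

                rows.forEach(System.out::println);
            }
        }
    }
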
>> > >>
>> > >> On Fri, Jul 23, 2021 at 4:05 PM Courtney Robinson <
>> > >> courtney.robinson@hypi.io>
>> > >> wrote:
>> > >>
>> > >> > Andrey,
>> > >> > Thanks for the response - see my comments inline.
>> > >> >
>> > >> >
>> > >> >> I've gone through the questions but don't have the whole picture of
>> > >> >> your use case.
>> > >> >
>> > >> >> Would you please clarify how exactly you use Ignite? What are the
>> > >> >> integration points?
>> > >> >>
>> > >> >
>> > >> > I'll try to clarify - we have a low/no code platform. A user
>> designs a
>> > >> > model for their application and we map this model to Ignite tables
>> and
>> > >> > other data sources. The model I'll describe is what we're building
>> now
>> > >> and
>> > >> > expected to be in alpha some time in Q4 21. Our current production
>> > >> > architecture is different and isn't as generic; it is heavily tied to
>> > >> > Ignite, and we've redesigned to get some flexibility where Ignite
>> > >> > doesn't provide what we want, such as window functions and other
>> > >> > SQL-99 limits.
>> > >> >
>> > >> > In the next-gen version we're working on, you can create a model for a
>> > >> > Tweet(content, to) and we will create an Ignite table with content and
>> > >> > to columns using the type the user selects. This is the simplest case.
>> > >> > We are adding generic support for sources and sinks and using
>> Calcite
>> > >> > as
>> > >> a
>> > >> > data virtualisation layer. Ignite is one of the available
>> > source/sinks.
>> > >> >
>> > >> > When a user creates a model for Tweet, we also allow them to
>> > >> > specify
>> > >> > how
>> > >> > they want to index the data. We have a copy of the calcite
>> > >> > Elasticsearch
>> > >> > adapter modified for Solr.
>> > >> >
>> > >> > When a source is queried (Ignite or any other that we support), we
>> > >> > generate SQL that Calcite executes. Calcite will push down the
>> > >> > generated
>> > >> > queries to Solr and Solr produces a list of IDs (in case of
>> > >> > Ignite)
>> > and
>> > >> we
>> > >> > do a multi-get from Ignite to produce the actual results.
>> > >> >
>> > >> > Obviously there's a lot more to this but that should give you a
>> > general
>> > >> > idea.
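
A rough sketch of that last step; the cache name, key type and value class are
assumptions for illustration only:

    import java.util.Collection;
    import java.util.Map;
    import java.util.Set;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;

    public class SolrThenIgniteSketch {
        // Solr answers the pushed-down predicate and returns matching row IDs;
        // Ignite is then used as the record store via a single multi-get.
        static Collection<Tweet> fetch(Ignite ignite, Set<String> idsFromSolr) {
            IgniteCache<String, Tweet> cache = ignite.cache("Tweet");
            Map<String, Tweet> rows = cache.getAll(idsFromSolr);
            return rows.values();
        }

        static class Tweet {
            String content;
            String to;
        }
    }
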
>> > >> >
>> > >> > and maybe share some experience with using Ignite SPIs?
>> > >> >>
>> > >> > Our evolution with Ignite started from the key value + compute
>> > >> > APIs.
>> > We
>> > >> > used the SPIs then but have since moved to using only the Ignite
>> > >> > SQL
>> > >> > API
>> > >> > (we gave up transactions for this).
>> > >> >
>> > >> > We originally used the indexing SPI to keep our own lucene index
>> > >> > of
>> > >> > data
>> > >> > in a cache. We did not use the Ignite FTS as it is very limited
>> > >> > compared
>> > >> to
>> > >> > what we allow customers to do. If I remember correctly, we were
>> using
>> > >> > an
>> > >> > affinity compute job to send queries to the right Ignite node and
>> > >> > then doing a multi-get to pull the data from caches.
>> > >> > I think we used one or two other SPIs and we found them very
>> > >> > useful
>> to
>> > >> > be
>> > >> > able to extend and customise Ignite without having to fork/change
>> > >> upstream
>> > >> > classes. We only stopped using them because we eventually
>> > >> > concluded
>> > >> > that
>> > >> > using the SQL only API was better for numerous reasons.
>> > >> >
>> > >> >
>> > >> >> We'll keep the information in mind while developing Ignite,
>> > >> >> because this may help us to make a better product.
>> > >> >>
>> > >> >> By the way, I'll try to answer the questions.
>> > >> >>
>> > >> >> >   1. Schema change - does that include the ability to change
>> > >> >> > the
>> > >> >> > types
>> > >> >> of
>> > >> >> >   fields/columns?
>> > >> >> Yes, we plan to support transparent conversion to a wider type on the
>> > >> >> fly (e.g. 'int' to 'long').
>> > >> >> This is a major point of our Live-schema concept.
>> > >> >> In fact, there is no need to convert data on all the nodes in a
>> > >> >> synchronous way as old SQL databases do (if they support it at all);
>> > >> >> we are going to support multiple schema versions, convert data
>> > >> >> on-demand on a per-row basis to the latest version, and then write the
>> > >> >> row back.
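
Purely as an illustration of the per-row, on-demand idea described above; none
of these types are Ignite APIs, this only sketches the shape of the approach:

    // Hypothetical sketch: upgrade a row lazily when it is read on an old schema
    // version, widening an 'int' column to 'long', then write the row back.
    final class LiveSchemaSketch {
        // v1 stored the column as int, v2 stores it as long.
        record RowV1(int value) { }
        record RowV2(long value) { }

        static RowV2 readRow(Object stored, java.util.function.Consumer<RowV2> writeBack) {
            if (stored instanceof RowV2 v2)
                return v2;                   // already on the latest schema version

            // Old row: widen int -> long on read, then write the converted row
            // back so the next reader sees the latest version.
            RowV2 upgraded = new RowV2(((RowV1) stored).value());
            writeBack.accept(upgraded);
            return upgraded;
        }
    }
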
>> > >> >>
>> > >> >
>> > >> > I can understand. The auto conversion to wider type makes sense.
>> > >> >
>> > >> >>
>> > >> >> More complex things like 'String' -> 'int' are out of scope for now
>> > >> >> because they require the execution of user code on the critical path.
>> > >> >>
>> > >> >
>> > >> > I would argue though that executing user code on the critical path
>> > >> > shouldn't be a blocker for custom conversions. I feel that if a user is
>> > >> > making an advanced enough integration to provide custom conversions they
>> > >> > would be aware that it impacts the system as a whole.
>> > >> >
>> > >> >> The limitation here is that the column MUST NOT be indexed, because an
>> > >> >> index over data of different kinds is impossible.
>> > >> >>
>> > >> >  Understood - I'd make the case that indexing should be pluggable.
>> > >> > I
>> > >> would
>> > >> > love for us to be able to take indexing away from Ignite in our
>> impl.
>> > -
>> > >> > I
>> > >> > think in Calcite, the Postgres adapter does this by having a table
>> > >> > whose
>> > >> > type is "Index". The implementor would be left with the freedom to
>> > >> > choose
>> > >> > how that table answers index lookups. From Ignite's perspective it
>> > >> wouldn't
>> > >> > care so long as the interface's contract is met, I could use an
>> index
>> > >> that
>> > >> > does a lucene, ES, Solr or Redis lookup and the end result would
>> > >> > be
>> > the
>> > >> > same but as the implementor I'm choosing the tradeoff I want to
>> > >> > meet
>> > >> > the
>> > >> > organisation's goals.
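
A sketch of the kind of contract being argued for here; this interface is
hypothetical and is not part of Ignite or Calcite:

    import java.util.Set;

    // Hypothetical pluggable-index contract: the engine only sees this interface,
    // while the implementation may consult Lucene, Solr, ES or Redis.
    interface SecondaryIndex {
        /** Table whose lookups this index can answer. */
        String indexedTable();

        /** Primary keys matching a simple condition, e.g. "content LIKE ?". */
        Set<String> lookup(String condition, Object... args);
    }

    // An implementor chooses its own trade-offs (stub, not a real Solr client).
    class SolrBackedIndex implements SecondaryIndex {
        @Override public String indexedTable() {
            return "Tweet";
        }

        @Override public Set<String> lookup(String condition, Object... args) {
            // Translate the condition to a Solr query, run it, return matching IDs.
            return Set.of();
        }
    }
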
>> > >> >
>> > >> >
>> > >> >>
>> > >> >>  >  2. Will the new guaranteed consistency between APIs also mean
>> SQL
>> > >> will
>> > >> >>  >  gain transaction support?
>> > >> >> Yes, we plan to have Transactional SQL.
>> > >> >> DDL will be non-transactional though, and I wonder whether anyone
>> > >> >> supports this.
>> > >> >>
>> > >> > I'm not sure I know of anything that supports transactional DDL, so I
>> > >> > don't think this is an issue, but I would say that a DDL statement in a
>> > >> > transaction shouldn't fail the transaction. I believe in Ignite 2 there
>> > >> > is a flag to turn this on or off, and we should definitely keep it. In
>> our
>> > >> > case, it's an issue with the nature of the platform we provide, at
>> > >> > development time only about 10% of schema or other DB info is
>> > >> > known
>> -
>> > >> > we
>> > >> > generate the other 90% on the fly based on whatever customers
>> > >> > decide
>> > to
>> > >> > design from our UI.
>> > >> >
>> > >> >>
>> > >> >> Ignite 3 will operate with Rows underneath, but classic Table API
>> and
>> > >> >> Key-value will be available to a user
>> > >> >> at the same time and with all consistency guarantees.
>> > >> >
>> > >> > Excellent!
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >
>> > >> >> >  3. Has there been any decision about how much of Calcite will
>> > >> >> > be
>> > >> >> exposed
>> > >> >> >   to the client? When using thick clients, it'll be hugely
>> > >> >> > beneficial
>> > >> to
>> > >> >> be
>> > >> >> >   able to work with Calcite APIs directly to provide custom
>> > >> >> > rules
>> > >> >> > and
>> > >> >> >  optimizations to better suit organization needs
>> > >> >> As of now, we have no plans to expose any Calcite API to a user.
>> > >> >> AFAIK, we have our custom Calcite convention, custom rules that are
>> > >> >> aware of the distributed environment, and additional AST nodes. The
>> > >> >> rules MUST correctly propagate internal information about data
>> > >> >> distribution, so I'm not sure we want to give low-level access to them.
>> > >> >>
>> > >> >
>> > >> > Maybe we're an edge case but for us access to the Calcite APIs would
>> > >> > shift our current development somewhat. For us, we're treating
>> Ignite
>> > >> > as
>> > >> a
>> > >> > library that provides a good foundation and we extend and
>> > >> > customise
>> > it.
>> > >> > Again, we may be an edge case and maybe most people just want a
>> > >> > database
>> > >> to
>> > >> > put data into and get it back out without controlling some of how
>> > >> > it
>> > >> > does
>> > >> > those things.
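
For context, working with the Calcite APIs directly typically looks roughly
like the standalone sketch below; this is plain Calcite, not an Ignite API, and
the query and rule choice are arbitrary:

    import org.apache.calcite.plan.RelOptUtil;
    import org.apache.calcite.plan.hep.HepPlanner;
    import org.apache.calcite.plan.hep.HepProgramBuilder;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.rules.CoreRules;
    import org.apache.calcite.schema.SchemaPlus;
    import org.apache.calcite.sql.SqlNode;
    import org.apache.calcite.tools.FrameworkConfig;
    import org.apache.calcite.tools.Frameworks;
    import org.apache.calcite.tools.Planner;

    public class CalcitePlannerSketch {
        public static void main(String[] args) throws Exception {
            SchemaPlus root = Frameworks.createRootSchema(true);
            // A real setup would register table schemas here (e.g. an adapter).
            FrameworkConfig cfg = Frameworks.newConfigBuilder().defaultSchema(root).build();

            Planner planner = Frameworks.getPlanner(cfg);
            SqlNode parsed = planner.parse("SELECT 1 + 1");
            RelNode rel = planner.rel(planner.validate(parsed)).rel;

            // Extra rules can be applied with a separate planner pass.
            HepPlanner hep = new HepPlanner(
                new HepProgramBuilder().addRuleInstance(CoreRules.PROJECT_MERGE).build());
            hep.setRoot(rel);
            System.out.println(RelOptUtil.toString(hep.findBestExp()));
        }
    }
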
>> > >> >
>> > >> >
>> > >> >> > We Index into Solr and use the Solr indices
>> > >> >> Ignite 1-2 has poor support for TEXT queries, which is totally
>> > >> >> unconfigurable.
>> > >> >> Also, the Lucene indices underneath are NOT persistent, which would
>> > >> >> require too much effort to fix.
>> > >> >> GeoSpatial indices have the same issues, so we decided to drop them
>> > >> >> along with the Indexing SPI altogether.
>> > >> >>
>> > >> >> However, you can find the activity on dev-list on the Index Query
>> > >> >> topic.
>> > >> >> Guys are going to add IndexQuery (a scan query over the sorted
>> index
>> > >> which
>> > >> >> can use simple conditions) in Ignite 2.
>> > >> >> We also plan to have the same functionality, maybe it is possible
>> to
>> > >> >> add
>> > >> >> full-text search support here.
>> > >> >> Will it work for you, what do you think?
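
For reference, the IndexQuery API as it later shipped in Ignite 2.x looks
roughly like the sketch below; the cache, class and field names are
assumptions:

    import javax.cache.Cache;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.cache.query.IndexQuery;
    import org.apache.ignite.cache.query.QueryCursor;
    import static org.apache.ignite.cache.query.IndexQueryCriteriaBuilder.gt;

    public class IndexQuerySketch {
        static void scan(Ignite ignite) {
            IgniteCache<Long, Person> cache = ignite.cache("Person");

            // Scan the sorted index with a simple condition, no SQL involved.
            IndexQuery<Long, Person> qry = new IndexQuery<>(Person.class);
            qry.setCriteria(gt("age", 30));

            try (QueryCursor<Cache.Entry<Long, Person>> cur = cache.query(qry)) {
                cur.forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue().age));
            }
        }

        static class Person {
            int age;
        }
    }
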
>> > >> >>
>> > >> > Yes, we originally looked at text queries and almost immediately
>> said
>> > >> > no.
>> > >> > Nothing about it was useful for us other than the lucene
>> > >> > dependency
>> in
>> > >> > Java. In the end that also became an issue because we wanted a
>> > >> > newer
>> > >> lucene
>> > >> > version.
>> > >> > IndexQuery will be useful - we'll certainly use it but it's not
>> > enough.
>> > >> > I
>> > >> > think we customise and depend on Solr too much for IndexQuery to
>> > >> > compare
>> > >> > but it will help in some cases for simpler queries.
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >> >    4. Will the unified storage model enable different versions
>> > >> >> > of
>> > >> Ignite
>> > >> >> to
>> > >> >> >   be in the cluster when persistence is enabled so that rolling
>> > >> restarts
>> > >> >> can
>> > >> >> >   be done?
>> > >> >> I'm not sure a rolling upgrade (RU) will be available, because too many
>> > >> >> compatibility issues would have to be resolved to make RU possible
>> > >> >> under load without downtime.
>> > >> >>
>> > >> >> Maybe it makes sense to provide some grid mode (maintenance mode) for
>> > >> >> RU purposes that would block all user load but allow upgrading the
>> > >> >> grid, e.g. for the pure in-memory case.
>> > >> >>
>> > >> >> Persistence compatibility should be preserved as it works for
>> Ignite
>> > >> >> 2.
>> > >> >>
>> > >> > My ideal situation would be that we start a newer Ignite version,
>> > >> > it
>> > >> comes
>> > >> > online, joins the cluster and is treated as some kind of
>> > >> > maintenance
>> > >> > mode
>> > >> > as you suggested. In maintenance mode, the other nodes re-balance or run
>> > >> > some other process to send all the data this new node will handle over
>> > >> > to it.
>> > >> > The existing nodes continue serving this data until the new node
>> > >> > is
>> no
>> > >> > longer in maintenance mode and then it becomes the primary for the
>> > data
>> > >> > that was rebalanced to it.
>> > >> >
>> > >> > The second case is if an existing node is restarted with a newer
>> > Ignite
>> > >> > version. No re-balance is needed, it joins in maintenance mode,
>> > >> > runs
>> > >> > any
>> > >> > upgrade/conversion or other task it needs to and then starts
>> accepting
>> > >> > reads and writes. Communication with lower-version nodes can be limited;
>> > >> > they are aware of it and send it data and queries for which it is the
>> > >> > primary, assuming they will also be upgraded.
>> > >> >
>> > >> > I guess I'm not aware of the compatibility issues this presents
>> > >> > and
>> so
>> > >> > my
>> > >> > view is narrow and perhaps naive here.
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >> >    5. Will it be possible to provide a custom cache store still
>> and
>> > >> will
>> > >> >> >   these changes enable custom cache stores to be queryable from
>> > SQL?
>> > >> >> I'm not sure I fully understand this.
>> > >> >> 1. Usually, SQL is about indices. Ignite can't perform a query
>> > >> >> over
>> > >> >> the
>> > >> >> unindexed data.
>> > >> >>
>> > >> > Yes understood
>> > >> >
>> > >> >>
>> > >> >> 2. A full scan over a cache that contains only part of the data, plus a
>> > >> >> scan of the CacheStore, then merging the results, is a pain.
>> > >> >> Most likely, running a query over the CacheStore directly will be a
>> > >> >> simpler way, and even more performant.
>> > >> >> A shared CacheStore (the same for all nodes) will definitely kill the
>> > >> >> performance in that case.
>> > >> >> So, the preliminary loadCache() call looks like a good compromise.
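
For concreteness, this is the kind of store and preliminary load being referred
to; the external system behind it is a stand-in for illustration:

    import javax.cache.Cache;
    import org.apache.ignite.cache.store.CacheStoreAdapter;
    import org.apache.ignite.lang.IgniteBiInClosure;

    // Sketch of a custom store: Ignite reads/writes through it per key, and
    // loadCache() lets a node pre-populate entries so SQL sees indexed data.
    public class PersonStoreSketch extends CacheStoreAdapter<Long, String> {
        @Override public String load(Long key) {
            return lookupInExternalSystem(key);              // read-through on a miss
        }

        @Override public void write(Cache.Entry<? extends Long, ? extends String> e) {
            saveToExternalSystem(e.getKey(), e.getValue());  // write-through on update
        }

        @Override public void delete(Object key) {
            removeFromExternalSystem((Long) key);
        }

        @Override public void loadCache(IgniteBiInClosure<Long, String> clo, Object... args) {
            // Preliminary bulk load: push every row into the cache so subsequent
            // SQL/index queries operate on in-memory, indexed data.
            for (long id = 0; id < 100; id++)
                clo.apply(id, lookupInExternalSystem(id));
        }

        private String lookupInExternalSystem(long id) { return "row-" + id; }
        private void saveToExternalSystem(long id, String val) { /* no-op here */ }
        private void removeFromExternalSystem(long id) { /* no-op here */ }
    }

The bulk load is then typically triggered once with IgniteCache#loadCache(null)
after the node starts.
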
>> > >> >>
>> > >> > I think the problem is largely that the CacheStore interface is
>> > >> > not
>> > >> > sufficient for being able to do this. If it had a richer interface
>> > >> > which
>> > >> > allowed the cache store to answer index queries basically hooking
>> into
>> > >> > whatever Ignite's doing for its B+tree then this would be viable.
>> > >> > A
>> > >> > CacheStore that only implements KV API doesn't take part in SQL
>> > >> > queries.
>> > >> >
>> > >> >>
>> > >> >> 3. Splitting a query into 2 parts, one to run on Ignite and one to run
>> > >> >> on the CacheStore, looks possible with Calcite, but I think it is
>> > >> >> impractical because, in general, neither the CacheStore nor the
>> > >> >> database structure is aware of the data partitioning.
>> > >> >>
>> > >> > Hmmm, maybe I missed the point, but as the implementor of the CacheStore
>> > >> > you should have knowledge of the structure and partition info, or have
>> > >> > some way of retrieving it. Again, I think the current CacheStore
>> > >> > interface is the problem, and if it was extended to provide this
>> > >> > information then it's up to the implementation to do this, whilst Ignite
>> > >> > knows that any implementation of these interfaces will meet the
>> > >> > necessary contract.
>> > >> >
>> > >> >
>> > >> >>
>> > >> >> 4. Transactions can't be supported in the case of direct CacheStore
>> > >> >> access, because even if the underlying database supports 2-phase
>> > >> >> commit, which is a rare case, the recovery protocol looks hard.
>> > >> >> It just looks like this feature isn't worth it.
>> > >> >>
>> > >> > I'd completely agree with this. It will be incredibly hard to get
>> this
>> > >> > done reliably
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >> >   6. This question wasn't mine but I was going to ask it as
>> > >> >> > well:
>> > >> >> > What
>> > >> >> >   will happen to the Indexing API since H2 is being removed?
>> > >> >> As I wrote above, Indexing SPI will be dropped, but IndexQuery
>> > >> >> will
>> > be
>> > >> >> added.
>> > >> >>
>> > >> >> >  1. As I mentioned above, we Index into Solr, in earlier
>> > >> >> > versions
>> > of
>> > >> >> >      our product we used the indexing SPI to index into Lucene
>> > >> >> > on
>> > >> >> > the
>> > >> >> Ignite
>> > >> >> >      nodes but this presented so many challenges we ultimately
>> > >> abandoned
>> > >> >> it and
>> > >> >> >      replaced it with the current Solr solution.
>> > >> >> AFAIK, some guys developed and sell a plugin for Ignite-2 with
>> > >> persistent
>> > >> >> Lucene and Geo indices.
>> > >> >> I don't know about the capabilities and limitations of their solution,
>> > >> >> because the code is closed.
>> > >> >> You can easily google it.
>> > >> >>
>> > >> >> I saw a few enthusiastic folks who want to improve TEXT queries,
>> > >> >> but unfortunately things haven't moved far enough. For now, they are in
>> > >> >> the middle of fixing the merging of TEXT query results.
>> > >> >> So far so good.
>> > >> >>
>> > >> >> I think it is a good chance to master the skill of developing a
>> > >> >> distributed system for whoever takes the lead on the full-text search
>> > >> >> feature and adds native FullText index support into Ignite-3.
>> > >> >>
>> > >> > I've seen the other thread from Atri I believe about this.
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >> >   7. What impact does RAFT now have on conflict resolution?
>> > >> >> RAFT is a state machine replication protocol. It guarantees all
>> > >> >> the
>> > >> nodes
>> > >> >> will see the updates in the same order.
>> > >> >> So, it seems no conflicts are possible. Recovery from split-brain is
>> > >> >> impossible in the common case.
>> > >> >>
>> > >> >> However, I think we have a conflict resolver analog in Ignite-3, as it
>> > >> >> is very useful in some cases, e.g. datacenter replication, incremental
>> > >> >> data load from a 3rd-party source, and recovery from a 3rd-party source.
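
A sketch of the kind of resolver hook being referred to; this is a purely
hypothetical shape, not Ignite's internal interface:

    // Hypothetical conflict resolver: given the locally stored entry and an
    // incoming one (e.g. from datacenter replication or a 3rd-party load),
    // pick the winner with domain rules instead of fixed last-writer-wins.
    record Versioned<V>(V value, long version, long updateTime) { }

    interface ConflictResolverSketch<V> {
        V resolve(Versioned<V> existing, Versioned<V> incoming);
    }

    class NewestWinsResolver<V> implements ConflictResolverSketch<V> {
        @Override public V resolve(Versioned<V> existing, Versioned<V> incoming) {
            // Last-writer-wins; a domain-specific resolver could merge fields.
            return incoming.updateTime() >= existing.updateTime()
                ? incoming.value() : existing.value();
        }
    }
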
>> > >> >>
>> > >> >>
>> > >> >> > 8. CacheGroups.
>> > >> >> AFAIK, CacheGroup as a concept will be eliminated; actually, we'll keep
>> > >> >> the mechanic, but it will be configured in a different way, which makes
>> > >> >> configuring Ignite a bit simpler.
>> > >> >> Sorry, for now I have no answer to your performance concerns; this part
>> > >> >> of Ignite-3 slipped from my radar.
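
For reference, the Ignite 2.x mechanic being discussed is configured per cache;
the cache and group names below are examples only:

    import org.apache.ignite.configuration.CacheConfiguration;

    public class CacheGroupSketch {
        static CacheConfiguration<Long, String> tweetCacheCfg() {
            // Caches sharing a group name share partition/file infrastructure,
            // which reduces per-cache overhead with thousands of tables/caches.
            return new CacheConfiguration<Long, String>("Tweet")
                .setGroupName("hypi");
        }
    }
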
>> > >> >>
>> > >> > No worries. I'll wait and see if anyone else suggests something. It's
>> > >> > getting a lot worse; a node took 1hr to start yesterday after a
>> > >> > deployment, and it's in prod with very little visibility into what it is
>> > >> > doing. It was just stopped, no logging or anything, and then resumed.
>> > >> >
>> > >> > 2021-07-22 13:40:15.997  INFO [ArcOS,,,] 9 --- [orker-#40%hypi%]
>> > >> > o.a.i.i.p.cache.GridCacheProcessor      [285] :  Finished recovery
>> for
>> > >> > cache [cache=hypi_01F8ZC3DGT66RNYCDZH3XNVY2E_Hue, grp=hypi,
>> > >> > startVer=AffinityTopologyVersion [topVer=79, minorTopVer=0]]
>> > >> >
>> > >> > One hour later it printed the next cache recovery message and
>> started
>> > >> > 30
>> > >> > seconds after going through other tables.
>> > >> >
>> > >> >
>> > >> >
>> > >> >>
>> > >> >> Let's wait and see if someone can clarify what we could expect in
>> > >> >> Ignite-3.
>> > >> >> Guys, can someone chime in and shed more light on questions 3, 4, 7
>> > >> >> and 8?
>> > >> >>
>> > >> >>
>> > >> >> On Thu, Jul 22, 2021 at 4:15 AM Courtney Robinson <
>> > >> >> courtney.robinson@hypi.io>
>> > >> >> wrote:
>> > >> >>
>> > >> >> > Hey everyone,
>> > >> >> > I attended the Alpha 2 update yesterday and was quite pleased
>> > >> >> > to
>> > see
>> > >> the
>> > >> >> > progress on things so far. So first, congratulations to
>> > >> >> > everyone
>> on
>> > >> the
>> > >> >> > work being put in and thank you to Val and Kseniya for running
>> > >> >> yesterday's
>> > >> >> > event.
>> > >> >> >
>> > >> >> > I asked a few questions after the webinar which Val had some
>> > answers
>> > >> to
>> > >> >> but
>> > >> >> > suggested posting here as some of them are not things that have
>> > been
>> > >> >> > thought about yet or no plans exist around it at this point.
>> > >> >> >
>> > >> >> > I'll put all of them here and if necessary we can break into
>> > >> >> > different
>> > >> >> > threads after.
>> > >> >> >
>> > >> >> >    1. Schema change - does that include the ability to change
>> > >> >> > the
>> > >> types
>> > >> >> of
>> > >> >> >    fields/columns?
>> > >> >> >       1. Val's answer was yes with some limitations but those
>> > >> >> > are
>> > >> >> > not
>> > >> >> well
>> > >> >> >       defined yet. He did mention that something like some kind
>> of
>> > >> >> > transformer
>> > >> >> >       could be provided for doing the conversion and I would
>> second
>> > >> >> this,
>> > >> >> > even
>> > >> >> >       for common types like int to long being able to do a
>> > >> >> > custom
>> > >> >> > conversion will
>> > >> >> >       be immensely valuable.
>> > >> >> >    2. Will the new guaranteed consistency between APIs also
>> > >> >> > mean
>> > SQL
>> > >> >> will
>> > >> >> >    gain transaction support?
>> > >> >> >       1. I believe the answer here was yes but perhaps someone
>> else
>> > >> may
>> > >> >> >       want to weigh in to confirm
>> > >> >> >    3. Has there been any decision about how much of Calcite
>> > >> >> > will
>> be
>> > >> >> exposed
>> > >> >> >    to the client? When using thick clients, it'll be hugely
>> > >> >> > beneficial
>> > >> >> to
>> > >> >> > be
>> > >> >> >    able to work with Calcite APIs directly to provide custom
>> rules
>> > >> >> > and
>> > >> >> >    optimisations to better suit organisation needs
>> > >> >> >    1. We currently use Calcite ourselves and have a lot of
>> > >> >> > custom
>> > >> rules
>> > >> >> and
>> > >> >> >       optimisations and have slowly pushed more of our queries
>> > >> >> > to
>> > >> >> > Calcite that we
>> > >> >> >       then push down to Ignite.
>> > >> >> >       2. We Index into Solr and use the Solr indices and others to
>> > >> >> >       fulfill overall queries, with Ignite just being one of the
>> > >> >> > possible storage
>> > >> >> >       targets Calcite pushes down to. If we could get to the
>> > calcite
>> > >> >> > API from an
>> > >> >> >       Ignite thick client, it would enable us to remove a layer
>> of
>> > >> >> > abstraction
>> > >> >> >       and complexity and make Ignite our primary that we then
>> link
>> > >> >> > with Solr and
>> > >> >> >       others to fulfill queries.
>> > >> >> >    4. Will the unified storage model enable different versions
>> > >> >> > of
>> > >> >> Ignite to
>> > >> >> >    be in the cluster when persistence is enabled so that
>> > >> >> > rolling
>> > >> >> restarts
>> > >> >> > can
>> > >> >> >    be done?
>> > >> >> >    1. We have to do a strange dance to perform Ignite upgrades
>> > >> >> > without
>> > >> >> >       downtime because pods/nodes will fail to start on version
>> > >> mismatch
>> > >> >> > and if
>> > >> >> >       we get that dance wrong, we will corrupt a node's data.
>> > >> >> > It
>> > >> >> > will
>> > >> >> make
>> > >> >> >       admin/upgrades far less brittle and error prone if this
>> > >> >> > was
>> > >> >> possible.
>> > >> >> >    5. Will it be possible to provide a custom cache store still
>> and
>> > >> will
>> > >> >> >    these changes enable custom cache stores to be queryable
>> > >> >> > from
>> > >> >> > SQL?
>> > >> >> >    1. Our Ignite usage is wide and complex because we use KV,
>> > >> >> > SQL
>> > >> >> > and
>> > >> >> other
>> > >> >> >       APIs. The inconsistency of what can and can't be used
>> > >> >> > from
>> > one
>> > >> >> API to
>> > >> >> >       another is a real challenge and has forced us over time
>> > >> >> > to
>> > >> >> > stick
>> > >> >> > to one API
>> > >> >> >       and write alternative solutions outside of Ignite. It
>> > >> >> > will
>> > >> >> > drastically
>> > >> >> >       simplify things if any CacheStore (or some new
>> > >> >> > equivalent)
>> > >> >> > could
>> > >> >> > be plugged
>> > >> >> >       in and be made accessible to SQL (and in fact all other
>> APIs)
>> > >> >> without
>> > >> >> >       having to load all the data from the underlying
>> > >> >> > CacheStore
>> > >> >> > first
>> > >> >> > into memory
>> > >> >> >    6. This question wasn't mine but I was going to ask it as
>> well:
>> > >> What
>> > >> >> >    will happen to the Indexing API since H2 is being removed?
>> > >> >> >       1. As I mentioned above, we Index into Solr, in earlier
>> > >> >> > versions
>> > >> >> of
>> > >> >> >       our product we used the indexing SPI to index into Lucene
>> on
>> > >> >> > the
>> > >> >> > Ignite
>> > >> >> >       nodes but this presented so many challenges we ultimately
>> > >> >> > abandoned it and
>> > >> >> >       replaced it with the current Solr solution.
>> > >> >> >       2. Lucene indexing was ideal because it meant we didn't
>> have
>> > >> >> > to
>> > >> >> >       re-invent Solr or Elasticsearch's sharding capabilities,
>> that
>> > >> was
>> > >> >> > almost
>> > >> >> >       automatic with Ignite only giving you the data that was
>> meant
>> > >> for
>> > >> >> the
>> > >> >> >       current node.
>> > >> >> >       3. The Lucene API enabled more flexibility and removed a
>> > >> >> > network
>> > >> >> >       round trip from our queries.
>> > >> >> >       4. Given Calcite's ability to support custom SQL
>> > >> >> > functions,
>> > >> >> > I'd
>> > >> >> love
>> > >> >> >       to have the ability to define custom functions that
>> > >> >> > Lucene
>> > was
>> > >> >> > answering
>> > >> >> >    7. What impact does RAFT now have on conflict resolution? Off the
>> > >> >> >    top of my head there are two cases:
>> > >> >> >       1. On startup after a split brain Ignite currently takes
>> > >> >> > an
>> > >> >> "exercise
>> > >> >> >       for the reader" approach and dumps a log along the lines
>> > >> >> > of
>> > >> >> >
>> > >> >> > >    1. BaselineTopology of joining node is not compatible with
>> > >> >> > >       BaselineTopology in the cluster.
>> > >> >> > >    1. Branching history of cluster BlT doesn't contain
>> branching
>> > >> point
>> > >> >> > >       hash of joining node BlT. Consider cleaning persistent
>> > >> >> > > storage
>> > >> >> of
>> > >> >> > the node
>> > >> >> > >       and adding it to the cluster again.
>> > >> >> > >
>> > >> >> >    1. This leaves you with no choice except to take one half
>> > >> >> > and
>> > >> >> manually
>> > >> >> >       copy, write data back over to the other half then destroy
>> the
>> > >> bad
>> > >> >> > one.
>> > >> >> >       2. The second case is conflicts on keys, I
>> > >> >> >       believe CacheVersionConflictResolver and manager are used
>> > >> >> >       by GridCacheMapEntry which just says if use old value do
>> this
>> > >> >> > otherwise use
>> > >> >> >       newVal. Ideally this will be exposed in the new API so
>> > >> >> > that
>> > >> >> > one
>> > >> >> can
>> > >> >> >       override this behaviour. The last writer wins approach
>> isn't
>> > >> >> always
>> > >> >> > ideal
>> > >> >> >       and the semantics of the domain can mean that what is considered
>> > >> >> >       "correct" in a conflict is not so for a different domain.
>> > >> >> >    8. This is last on the list but is actually the most
>> > >> >> > important
>> > >> >> > for
>> > >> us
>> > >> >> >    right now as it is an impending and growing risk. We allow
>> > >> customers
>> > >> >> to
>> > >> >> >    create their own tables on demand. We're already using the
>> same
>> > >> cache
>> > >> >> > group
>> > >> >> >    etc for data structures to be re-used but now that we're
>> getting
>> > >> >> > to
>> > >> >> >    thousands of tables/caches our startup times are sometimes
>> > >> >> unpredictably
>> > >> >> >    long - at present it seems to depend on the state of the
>> > >> cache/table
>> > >> >> > before
>> > >> >> >    the restart but we're into the order of 5 - 7 mins and
>> steadily
>> > >> >> > increasing
>> > >> >> >    with the growth of tables. Are there any provisions in
>> > >> >> > Ignite
>> 3
>> > >> >> > for
>> > >> >> >    ensuring startup time isn't proportional to the number of
>> > >> >> tables/caches
>> > >> >> >    available?
>> > >> >> >
>> > >> >> >
>> > >> >> > Those are the key things I can think of at the moment. Val and
>> > >> >> > others
>> > >> >> I'd
>> > >> >> > love to open a conversation around these.
>> > >> >> >
>> > >> >> > Regards,
>> > >> >> > Courtney Robinson
>> > >> >> > Founder and CEO, Hypi
>> > >> >> > Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
>> > >> >> >
>> > >> >> > <https://hypi.io>
>> > >> >> > https://hypi.io
>> > >> >> >
>> > >> >>
>> > >> >>
>> > >> >> --
>> > >> >> Best regards,
>> > >> >> Andrey V. Mashenkov
>> > >> >>
>> > >> >
>> > >>
>> > >
>> >
>> >
>> > --
>> >
>> > Best regards,
>> > Ivan Pavlukhin
>> >
>>
>


-- 

Best regards,
Ivan Pavlukhin

Re: Apache Ignite 3 Alpha 2 webinar follow up questions

Posted by Kseniya Romanova <ro...@gmail.com>.
On September 2 Courtney will tell more about Hypi use case, join us at
Virtual Meetup:
https://www.meetup.com/Apache-Ignite-Virtual-Meetup/events/280030600/

Below I quote Courtney's email to the user list:

The talk will look at the general architecture, playing nice with other
> technologies and the future reactive-streams architecture Hypi's team
> started work on with a view for being in production in 2022.

We'll look at why we're transitioning to reactive streams and how Ignite is
> being used to accelerate other technologies in the stack. Just like Ignite
> 3, the next-gen of Hypi itself is using Apache Calcite as a data
> virtualisation layer. The talk will explore a little of how this is being
> done and finally will discuss some challenges and pitfalls to avoid.


Tue, 3 Aug 2021 at 12:38, Ivan Pavlukhin <vo...@gmail.com>:

> Hi Courtney,
>
> Statistics for query planning is a rather complex subject.
> Unfortunately I am not aware of the technical details of how it is expected
> to be implemented in Ignite 3.
>
> Folks driving SQL please step in.
>
> 2021-07-31 20:18 GMT+03:00, Courtney Robinson <co...@hypi.io>:
> > Hi Ivan,
> > Atri's description of the query plan being cached is what I was thinking
> of
> > with my description.
> >
> > I lack the knowledge on how the statistics are maintained to really
> comment
> > constructively Atri but my first question about the problem you raise
> with
> > statistics would be:
> >
> > How/where are the stats maintained and if a query plan is cached based on
> > some stats, is it not possible to invalidate the cached plan periodically
> > or based on statistics changes?
> >
> > Regards,
> > Courtney Robinson
> > Founder and CEO, Hypi
> > Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
> >
> > <https://hypi.io>
> > https://hypi.io
> >
> >
> > On Sat, Jul 31, 2021 at 8:54 AM Atri Sharma <at...@apache.org> wrote:
> >
> >> Query caching works on three levels - caching results, caching blocks
> and
> >> caching query plans.
> >>
> >> Prepared queries work by caching a plan for a query and reusing that
> plan
> >> by changing the parameters for the incoming query. So the query remains
> >> the
> >> same, but input values keep changing.
> >>
> >> The problem with prepared queries is that query execution can go bad
> very
> >> fast if the underlying data distribution changes and the cached plan is
> >> no
> >> longer optimal for the given statistics.
> >>
> >> On Sat, 31 Jul 2021, 12:54 Ivan Pavlukhin, <vo...@gmail.com> wrote:
> >>
> >> > Hi Courtney,
> >> >
> >> > Please clarify what do you mean by prepared queries and query caching?
> >> > Do you mean caching query results? If so, in my mind material views
> >> > are the best approach here (Ignite 2 does not support them). Do you
> >> > have other good approaches in your mind? E.g. implemented in other
> >> > databases.
> >> >
> >> > 2021-07-26 21:27 GMT+03:00, Valentin Kulichenko <
> >> > valentin.kulichenko@gmail.com>:
> >> > > Hi Courtney,
> >> > >
> >> > > Generally speaking, query caching certainly makes sense. As far as I
> >> > know,
> >> > > Ignite 2.x actually does that, but most likely there might be room
> >> > > for
> >> > > improvement as well. We will look into this.
> >> > >
> >> > > As for the SQL API - the answer is yes. The requirement for a dummy
> >> cache
> >> > > is an artifact of the current architecture. This is 100% wrong and
> >> > > will
> >> > be
> >> > > changed in 3.0.
> >> > >
> >> > > -Val
> >> > >
> >> > > On Sun, Jul 25, 2021 at 2:51 PM Courtney Robinson
> >> > > <co...@hypi.io>
> >> > > wrote:
> >> > >
> >> > >> Something else came to mind, are there plans to support prepared
> >> > queries?
> >> > >>
> >> > >> I recall someone saying before that Ignite does internally cache
> >> queries
> >> > >> but it's not at all clear if or how it does do that. I assume a
> >> > >> simple
> >> > >> hash
> >> > >> of the query isn't enough.
> >> > >>
> >> > >> We generate SQL queries based on user runtime settings and they can
> >> get
> >> > >> to
> >> > >> hundreds of lines long, I imagine this means most of our queries
> are
> >> not
> >> > >> being cached but there are patterns so we could generate and manage
> >> > >> prepared queries ourselves.
> >> > >>
> >> > >> Also, will there be a dedicated API for doing SQL queries rather
> >> > >> than
> >> > >> having to pass a SqlFieldsQuery to a cache that has nothing to do
> >> > >> with
> >> > >> the
> >> > >> cache being queried? When I first started with Ignite years ago,
> >> > >> this
> >> > was
> >> > >> beyond confusing for me. I'm trying to run select x from B but I
> >> > >> pass
> >> > >> this
> >> > >> to a cache called DUMMY or whatever arbitrary name...
> >> > >>
> >> > >> On Fri, Jul 23, 2021 at 4:05 PM Courtney Robinson <
> >> > >> courtney.robinson@hypi.io>
> >> > >> wrote:
> >> > >>
> >> > >> > Andrey,
> >> > >> > Thanks for the response - see my comments inline.
> >> > >> >
> >> > >> >
> >> > >> >> I've gone through the questions and have no the whole picture of
> >> your
> >> > >> use
> >> > >> >> case.
> >> > >> >
> >> > >> > Would you please clarify how you exactly use the Ignite? what are
> >> the
> >> > >> >> integration points?
> >> > >> >>
> >> > >> >
> >> > >> > I'll try to clarify - we have a low/no code platform. A user
> >> designs a
> >> > >> > model for their application and we map this model to Ignite
> tables
> >> and
> >> > >> > other data sources. The model I'll describe is what we're
> building
> >> now
> >> > >> and
> >> > >> > expected to be in alpha some time in Q4 21. Our current
> production
> >> > >> > architecture is different and isn't as generic, it is heavily
> tied
> >> to
> >> > >> > Ignite and we've redesigned to get some flexibility where Ignite
> >> > >> > doesn't
> >> > >> > provide what we want. Things like window functions and other
> >> > >> > SQL-99
> >> > >> limits.
> >> > >> >
> >> > >> > In the next gen version we're working on you can create a model
> >> > >> > for
> >> a
> >> > >> > Tweet(content, to) and we will create an Ignite table with
> content
> >> and
> >> > >> > to
> >> > >> > columns using the type the user selects. This is the simplest
> >> > >> > case.
> >> > >> > We are adding generic support for sources and sinks and using
> >> Calcite
> >> > >> > as
> >> > >> a
> >> > >> > data virtualisation layer. Ignite is one of the available
> >> > source/sinks.
> >> > >> >
> >> > >> > When a user creates a model for Tweet, we also allow them to
> >> > >> > specify
> >> > >> > how
> >> > >> > they want to index the data. We have a copy of the calcite
> >> > >> > Elasticsearch
> >> > >> > adapter modified for Solr.
> >> > >> >
> >> > >> > When a source is queried (Ignite or any other that we support),
> we
> >> > >> > generate SQL that Calcite executes. Calcite will push down the
> >> > >> > generated
> >> > >> > queries to Solr and Solr produces a list of IDs (in case of
> >> > >> > Ignite)
> >> > and
> >> > >> we
> >> > >> > do a multi-get from Ignite to produce the actual results.
> >> > >> >
> >> > >> > Obviously there's a lot more to this but that should give you a
> >> > general
> >> > >> > idea.
> >> > >> >
> >> > >> > and maybe share some experience with using Ignite SPIs?
> >> > >> >>
> >> > >> > Our evolution with Ignite started from the key value + compute
> >> > >> > APIs.
> >> > We
> >> > >> > used the SPIs then but have since moved to using only the Ignite
> >> > >> > SQL
> >> > >> > API
> >> > >> > (we gave up transactions for this).
> >> > >> >
> >> > >> > We originally used the indexing SPI to keep our own lucene index
> >> > >> > of
> >> > >> > data
> >> > >> > in a cache. We did not use the Ignite FTS as it is very limited
> >> > >> > compared
> >> > >> to
> >> > >> > what we allow customers to do. If I remember correctly, we were
> >> using
> >> > >> > an
> >> > >> > affinity compute job to send queries to the right Ignite node and
> >> > >> > then doing a multi-get to pull the data from caches.
> >> > >> > I think we used one or two other SPIs and we found them very
> >> > >> > useful
> >> to
> >> > >> > be
> >> > >> > able to extend and customise Ignite without having to fork/change
> >> > >> upstream
> >> > >> > classes. We only stopped using them because we eventually
> >> > >> > concluded
> >> > >> > that
> >> > >> > using the SQL only API was better for numerous reasons.
> >> > >> >
> >> > >> >
> >> > >> >> We'll keep the information in mind while developing the Ignite,
> >> > >> >> because this may help us to make a better product.
> >> > >> >>
> >> > >> >> By the way, I'll try to answer the questions.
> >> > >> >>
> >> > >> >> >   1. Schema change - does that include the ability to change
> >> > >> >> > the
> >> > >> >> > types
> >> > >> >> of
> >> > >> >> >   fields/columns?
> >> > >> >> Yes, we plan to support transparent conversion to a wider type
> >> on-fly
> >> > >> >> (e.g.
> >> > >> >> 'int' to 'long').
> >> > >> >> This is a major point of our Live-schema concept.
> >> > >> >> In fact, there is no need to convert data on all the nodes in a
> >> > >> >> synchronous
> >> > >> >> way as old SQL databases do (if one supports though),
> >> > >> >> we are going to support multiple schema versions and convert
> data
> >> > >> >> on-demand
> >> > >> >> on a per-row basis to the latest version,
> >> > >> >> then write-back the row.
> >> > >> >>
> >> > >> >
> >> > >> > I can understand. The auto conversion to wider type makes sense.
> >> > >> >
> >> > >> >>
> >> > >> >> More complex things like 'String' -> 'int' are out of scope for
> >> > >> >> now
> >> > >> >> because
> >> > >> >> it requires the execution of a user code on the critical path.
> >> > >> >>
> >> > >> >
> >> > >> > I would argue though that executing user code on the critical
> path
> >> > >> > shouldn't be a blocker for custom conversions. I feel if a user
> is
> >> > >> > making
> >> > >> > an advance enough integration to provide custom conversions they
> >> would
> >> > >> > be
> >> > >> > aware that it impacts the system as a whole.
> >> > >> >
> >> > >> > The limitation here is column MUST NOT be indexed, because an
> >> > >> > index
> >> > >> > over
> >> > >> >> the data of different kinds is impossible.
> >> > >> >>
> >> > >> >  Understood - I'd make the case that indexing should be
> pluggable.
> >> > >> > I
> >> > >> would
> >> > >> > love for us to be able to take indexing away from Ignite in our
> >> impl.
> >> > -
> >> > >> > I
> >> > >> > think in Calcite, the Postgres adapter does this by having a
> table
> >> > >> > whose
> >> > >> > type is "Index". The implementor would be left with the freedom
> to
> >> > >> > choose
> >> > >> > how that table answers index lookups. From Ignite's perspective
> it
> >> > >> wouldn't
> >> > >> > care so long as the interface's contract is met, I could use an
> >> index
> >> > >> that
> >> > >> > does a lucene, ES, Solr or Redis lookup and the end result would
> >> > >> > be
> >> > the
> >> > >> > same but as the implementor I'm choosing the tradeoff I want to
> >> > >> > meet
> >> > >> > the
> >> > >> > organisation's goals.
> >> > >> >
> >> > >> >
> >> > >> >>
> >> > >> >>  >  2. Will the new guaranteed consistency between APIs also
> mean
> >> SQL
> >> > >> will
> >> > >> >>  >  gain transaction support?
> >> > >> >> Yes, we plan to have Transactional SQL.
> >> > >> >> DDL will be non-transactional though, and I wonder if the one
> >> > supports
> >> > >> >> this.
> >> > >> >>
> >> > >> > I'm not sure I know of any thing that supports transactional DDL
> >> > >> > so
> >> > >> > don't
> >> > >> > think this is an issue but I would say that a DDL statement in a
> >> > >> > transaction shouldn't fail the transaction. I believe in Ignite 2
> >> > there
> >> > >> is
> >> > >> > a flag  to turn this on or off, we should definitely keep this.
> In
> >> our
> >> > >> > case, it's an issue with the nature of the platform we provide,
> at
> >> > >> > development time only about 10% of schema or other DB info is
> >> > >> > known
> >> -
> >> > >> > we
> >> > >> > generate the other 90% on the fly based on whatever customers
> >> > >> > decide
> >> > to
> >> > >> > design from our UI.
> >> > >> >
> >> > >> >>
> >> > >> >> Ignite 3 will operate with Rows underneath, but classic Table
> API
> >> and
> >> > >> >> Key-value will be available to a user
> >> > >> >> at the same time and with all consistency guarantees.
> >> > >> >
> >> > >> > Excellent!
> >> > >> >
> >> > >> >>
> >> > >> >>
> >> > >> >
> >> > >> >> >  3. Has there been any decision about how much of Calcite will
> >> > >> >> > be
> >> > >> >> exposed
> >> > >> >> >   to the client? When using thick clients, it'll be hugely
> >> > >> >> > beneficial
> >> > >> to
> >> > >> >> be
> >> > >> >> >   able to work with Calcite APIs directly to provide custom
> >> > >> >> > rules
> >> > >> >> > and
> >> > >> >> >  optimizations to better suit organization needs
> >> > >> >> As of now, we have no plans to expose any Calcite API to a user.
> >> > >> >> AFAIK, we have our custom Calcite convention, custom rules that
> >> > >> >> are
> >> > >> aware
> >> > >> >> of distributed environment,
> >> > >> >> and additional AST nodes. The rules MUST correctly propagate
> >> internal
> >> > >> >> information about data distribution,
> >> > >> >> so I'm not sure want to give low-level access to them.
> >> > >> >>
> >> > >> >
> >> > >> > Maybe we're an edge case but for us access to the Calcite APIs
> >> > >> > would
> >> > be
> >> > >> > shift our current development somewhat. For us, we're treating
> >> Ignite
> >> > >> > as
> >> > >> a
> >> > >> > library that provides a good foundation and we extend and
> >> > >> > customise
> >> > it.
> >> > >> > Again, we may be an edge case and maybe most people just want a
> >> > >> > database
> >> > >> to
> >> > >> > put data into and get it back out without controlling some of how
> >> > >> > it
> >> > >> > does
> >> > >> > those things.
> >> > >> >
> >> > >> >
> >> > >> >> > We Index into Solr and use the Solr indices
> >> > >> >> Ignite 1-2 has poor support for TEXT queries, which is totally
> >> > >> >> unconfigurable.
> >> > >> >> Also, Lucene indices underneath are NOT persistent that requires
> >> too
> >> > >> much
> >> > >> >> effort to fix it.
> >> > >> >> GeoSpatial index has the same issues, we decided to drop them
> >> > >> >> along
> >> > >> >> with
> >> > >> >> Indexing SPI at all.
> >> > >> >>
> >> > >> >> However, you can find the activity on dev-list on the Index
> Query
> >> > >> >> topic.
> >> > >> >> Guys are going to add IndexQuery (a scan query over the sorted
> >> index
> >> > >> which
> >> > >> >> can use simple conditions) in Ignite 2.
> >> > >> >> We also plan to have the same functionality, maybe it is
> possible
> >> to
> >> > >> >> add
> >> > >> >> full-text search support here.
> >> > >> >> Will it work for you, what do you think?
> >> > >> >>
> >> > >> > Yes, we originally looked at text queries and almost immediately
> >> said
> >> > >> > no.
> >> > >> > Nothing about it was useful for us other than the lucene
> >> > >> > dependency
> >> in
> >> > >> > Java. In the end that also became an issue because we wanted a
> >> > >> > newer
> >> > >> lucene
> >> > >> > version.
> >> > >> > IndexQuery will be useful - we'll certainly use it but it's not
> >> > enough.
> >> > >> > I
> >> > >> > think we customise and depend on Solr too much for IndexQuery to
> >> > >> > compare
> >> > >> > but it will help in some cases for simpler queries.
> >> > >> >
> >> > >> >>
> >> > >> >>
> >> > >> >> >    4. Will the unified storage model enable different versions
> >> > >> >> > of
> >> > >> Ignite
> >> > >> >> to
> >> > >> >> >   be in the cluster when persistence is enabled so that
> rolling
> >> > >> restarts
> >> > >> >> can
> >> > >> >> >   be done?
> >> > >> >> I'm not sure a rolling upgrade (RU) will be available because
> too
> >> > much
> >> > >> >> compatibility issues should be resolved
> >> > >> >> to make RU possible under the load without downtime.
> >> > >> >>
> >> > >> >> Maybe it makes sense to provide some grid mode (maintenance
> mode)
> >> for
> >> > >> >> RU
> >> > >> >> purposes that will block all the user load
> >> > >> >> but allow upgrade the grid. E.g. for the pure in-memory case.
> >> > >> >>
> >> > >> >> Persistence compatibility should be preserved as it works for
> >> Ignite
> >> > >> >> 2.
> >> > >> >>
> >> > >> > My ideal situation would be that we start a newer Ignite version,
> >> > >> > it
> >> > >> comes
> >> > >> > online, joins the cluster and is treated as some kind of
> >> > >> > maintenance
> >> > >> > mode
> >> > >> > as you suggested. In maintenance mode, the other nodes re-balance
> >> > >> > or
> >> > >> > some
> >> > >> > other process to send all the data this new node will handle over
> >> > >> > to
> >> > >> > it.
> >> > >> > The existing nodes continue serving this data until the new node
> >> > >> > is
> >> no
> >> > >> > longer in maintenance mode and then it becomes the primary for
> the
> >> > data
> >> > >> > that was rebalanced to it.
> >> > >> >
> >> > >> > The second case is if an existing node is restarted with a newer
> >> > Ignite
> >> > >> > version. No re-balance is needed, it joins in maintenance mode,
> >> > >> > runs
> >> > >> > any
> >> > >> > upgrade/conversion or other task it needs to and then starts
> >> accepting
> >> > >> > reads and writes. Communication with lower version nodes can be
> >> > >> > limited,
> >> > >> > they are aware of it and sends it data and queries for which it
> is
> >> the
> >> > >> > primary assuming they will also be upgraded.
> >> > >> >
> >> > >> > I guess I'm not aware of the compatibility issues this presents
> >> > >> > and
> >> so
> >> > >> > my
> >> > >> > view is narrow and perhaps naive here.
> >> > >> >
> >> > >> >>
> >> > >> >>
> >> > >> >> >    5. Will it be possible to provide a custom cache store
> still
> >> and
> >> > >> will
> >> > >> >> >   these changes enable custom cache stores to be queryable
> from
> >> > SQL?
> >> > >> >> I'm not sure I fully understand this.
> >> > >> >> 1. Usually, SQL is about indices. Ignite can't perform a query
> >> > >> >> over
> >> > >> >> the
> >> > >> >> unindexed data.
> >> > >> >>
> >> > >> > Yes understood
> >> > >> >
> >> > >> >>
> >> > >> >> 2. Fullscan over the cache that contains only part of data +
> scan
> >> the
> >> > >> >> CacheStore, then merging the results is a pain.
> >> > >> >> Most likely, running a query over CacheStore directly will be a
> >> > >> >> simpler
> >> > >> >> way, and even more performant.
> >> > >> >> Shared CacheStore (same for all nodes) will definitely kill the
> >> > >> >> performance
> >> > >> >> in that case.
> >> > >> >> So, the preliminary loadCache() call looks like a good
> >> > >> >> compromise.
> >> > >> >>
> >> > >> > I think the problem is largely that the CacheStore interface is
> >> > >> > not
> >> > >> > sufficient for being able to do this. If it had a richer
> interface
> >> > >> > which
> >> > >> > allowed the cache store to answer index queries basically hooking
> >> into
> >> > >> > whatever Ignite's doing for its B+tree then this would be viable.
> >> > >> > A
> >> > >> > CacheStore that only implements KV API doesn't take part in SQL
> >> > >> > queries.
> >> > >> >
> >> > >> >>
> >> > >> >> 3. Splitting query into 2 parts to run on Ignite and to run on
> >> > >> CacheStore
> >> > >> >> looks possible with Calcite,
> >> > >> >> but I think it impractical because in general, neither
> CacheStore
> >> nor
> >> > >> >> database structure are aware of the data partitioning.
> >> > >> >>
> >> > >> > hmmm, maybe I missed the point but as the implementor of the
> >> > CacheStore
> >> > >> > you should have knowledge of the structure and partition info. or
> >> have
> >> > >> some
> >> > >> > way of retrieving it. Again, I think the current CacheStore
> >> interface
> >> > >> > is
> >> > >> > the problem and if it was extended to provide this information
> >> > >> > then
> >> > its
> >> > >> up
> >> > >> > to the implementation to do this whilst Ignite knows that any
> >> > >> > implementation of these interfaces will meet the contract
> >> > >> > necessary.
> >> > >> >
> >> > >> >
> >> > >> >>
> >> > >> >> 4. Transactions can't be supported in case of direct CacheStore
> >> > >> >> access,
> >> > >> >> because even if the underlying database supports 2-phase commit,
> >> > which
> >> > >> is
> >> > >> >> a
> >> > >> >> rare case, the recovery protocol looks hard.
> >> > >> >> Just looks like this feature doesn't worth it.
> >> > >> >>
> >> > >> > I'd completely agree with this. It will be incredibly hard to get
> >> this
> >> > >> > done reliably
> >> > >> >
> >> > >> >>
> >> > >> >>
> >> > >> >> >   6. This question wasn't mine but I was going to ask it as
> >> > >> >> > well:
> >> > >> >> > What
> >> > >> >> >   will happen to the Indexing API since H2 is being removed?
> >> > >> >> As I wrote above, Indexing SPI will be dropped, but IndexQuery
> >> > >> >> will
> >> > be
> >> > >> >> added.
> >> > >> >>
> >> > >> >> >  1. As I mentioned above, we Index into Solr, in earlier
> >> > >> >> > versions
> >> > of
> >> > >> >> >      our product we used the indexing SPI to index into Lucene
> >> > >> >> > on
> >> > >> >> > the
> >> > >> >> Ignite
> >> > >> >> >      nodes but this presented so many challenges we ultimately
> >> > >> abandoned
> >> > >> >> it and
> >> > >> >> >      replaced it with the current Solr solution.
> >> > >> >> AFAIK, some guys developed and sell a plugin for Ignite-2 with
> >> > >> persistent
> >> > >> >> Lucene and Geo indices.
> >> > >> >> I don't know about the capabilities and limitations of their
> >> > solution,
> >> > >> >> because of closed code.
> >> > >> >> You can easily google it.
> >> > >> >>
> >> > >> >> I saw few encouraged guys who want to improve TEXT queries,
> >> > >> >> but unfortunately, things weren't moved far enough. For now,
> they
> >> are
> >> > >> >> in
> >> > >> >> the middle of fixing the merging TEXT query results.
> >> > >> >> So far so good.
> >> > >> >>
> >> > >> >> I think it is a good chance to master the skill developing of a
> >> > >> >> distributed
> >> > >> >> system for the one
> >> > >> >> who will take a lead over the full-text search feature and add
> >> native
> >> > >> >> FullText index support into Ignite-3.
> >> > >> >>
> >> > >> > I've seen the other thread from Atri I believe about this.
> >> > >> >
> >> > >> >>
> >> > >> >>
> >> > >> >> >   7. What impact does RAFT now have on conflict resolution?
> >> > >> >> RAFT is a state machine replication protocol. It guarantees all
> >> > >> >> the
> >> > >> nodes
> >> > >> >> will see the updates in the same order.
> >> > >> >> So, seems no conflicts are possible. Recovery from split-brain
> is
> >> > >> >> impossible in common-case.
> >> > >> >>
> >> > >> >> However, I think we have a conflict resolver analog in Ignite-3
> >> > >> >> as
> >> it
> >> > >> >> is
> >> > >> >> very useful in some cases
> >> > >> >> e.g datacenter replication, incremental data load from 3-rd
> party
> >> > >> source,
> >> > >> >> recovery from 3-rd party source.
> >> > >> >>
> >> > >> >>
> >> > >> >> > 8. CacheGroups.
> >> > >> >> AFAIK, CacheGroup will be eliminated, actually, we'll keep this
> >> > >> mechanic,
> >> > >> >> but it will be configured in a different way,
> >> > >> >> which makes Ignite configuring a bit simpler.
> >> > >> >> Sorry, for now, I have no answer on your performance concerns,
> >> > >> >> this
> >> > >> >> part
> >> > >> >> of
> >> > >> >> Ignite-3 slipped from my radar.
> >> > >> >>
> >> > >> > No worries. I'll wait and see if anyone else suggests something.
> >> > >> > Its
> >> > >> > getting a lot worse, a node took 1hr to start yesterday after a
> >> > >> deployment
> >> > >> > and its in prod with very little visibility into what it is
> doing,
> >> it
> >> > >> > was
> >> > >> > just stopped, no logging or anything and then resumed.
> >> > >> >
> >> > >> > 2021-07-22 13:40:15.997  INFO [ArcOS,,,] 9 --- [orker-#40%hypi%]
> >> > >> > o.a.i.i.p.cache.GridCacheProcessor      [285] :  Finished
> recovery
> >> for
> >> > >> > cache [cache=hypi_01F8ZC3DGT66RNYCDZH3XNVY2E_Hue, grp=hypi,
> >> > >> > startVer=AffinityTopologyVersion [topVer=79, minorTopVer=0]]
> >> > >> >
> >> > >> > One hour later it printed the next cache recovery message and then finished
> >> > >> > starting about 30 seconds later, after going through the other tables.
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> >>
> >> > >> >> Let's wait and see whether someone can clarify what we could expect in
> >> > >> >> Ignite 3. Folks, can someone chime in and shed more light on questions
> >> > >> >> 3, 4, 7 and 8?
> >> > >> >>
> >> > >> >>
> >> > >> >> On Thu, Jul 22, 2021 at 4:15 AM Courtney Robinson <courtney.robinson@hypi.io> wrote:
> >> > >> >>
> >> > >> >> > Hey everyone,
> >> > >> >> > I attended the Alpha 2 update yesterday and was quite pleased to see the
> >> > >> >> > progress on things so far. So first, congratulations to everyone on the
> >> > >> >> > work being put in, and thank you to Val and Kseniya for running
> >> > >> >> > yesterday's event.
> >> > >> >> >
> >> > >> >> > I asked a few questions after the webinar which Val had some answers to,
> >> > >> >> > but he suggested posting here since some of them are things that haven't
> >> > >> >> > been thought about yet or have no plans around them at this point.
> >> > >> >> >
> >> > >> >> > I'll put all of them here and, if necessary, we can break them out into
> >> > >> >> > separate threads afterwards.
> >> > >> >> >
> >> > >> >> >    1. Schema change - does that include the ability to change the types
> >> > >> >> >    of fields/columns?
> >> > >> >> >       1. Val's answer was yes, with some limitations that are not well
> >> > >> >> >       defined yet. He did mention that some kind of transformer could be
> >> > >> >> >       provided for doing the conversion, and I would second this: even for
> >> > >> >> >       common types like int to long, being able to do a custom conversion
> >> > >> >> >       would be immensely valuable.
> >> > >> >> >    2. Will the new guaranteed consistency between APIs also mean SQL will
> >> > >> >> >    gain transaction support?
> >> > >> >> >       1. I believe the answer here was yes, but perhaps someone else would
> >> > >> >> >       like to weigh in to confirm.
> >> > >> >> >    3. Has there been any decision about how much of Calcite will be
> >> > >> >> >    exposed to the client? When using thick clients, it would be hugely
> >> > >> >> >    beneficial to be able to work with Calcite APIs directly to provide
> >> > >> >> >    custom rules and optimisations that better suit organisation needs
> >> > >> >> >    (see the Calcite planner sketch after this list).
> >> > >> >> >       1. We currently use Calcite ourselves, have a lot of custom rules
> >> > >> >> >       and optimisations, and have slowly pushed more of our queries to
> >> > >> >> >       Calcite, which we then push down to Ignite.
> >> > >> >> >       2. We index into Solr and use the Solr indices and others to fulfill
> >> > >> >> >       overall queries, with Ignite just being one of the possible storage
> >> > >> >> >       targets Calcite pushes down to. If we could get to the Calcite API
> >> > >> >> >       from an Ignite thick client, it would enable us to remove a layer of
> >> > >> >> >       abstraction and complexity and make Ignite our primary store, which
> >> > >> >> >       we then link with Solr and others to fulfill queries.
> >> > >> >> >    4. Will the unified storage model enable different versions of Ignite
> >> > >> >> >    to be in the cluster when persistence is enabled, so that rolling
> >> > >> >> >    restarts can be done?
> >> > >> >> >       1. We have to do a strange dance to perform Ignite upgrades without
> >> > >> >> >       downtime, because pods/nodes will fail to start on a version
> >> > >> >> >       mismatch, and if we get that dance wrong we will corrupt a node's
> >> > >> >> >       data. Admin and upgrades would be far less brittle and error prone
> >> > >> >> >       if this were possible.
> >> > >> >> >    5. Will it still be possible to provide a custom cache store, and will
> >> > >> >> >    these changes enable custom cache stores to be queryable from SQL?
> >> > >> >> >    (A CacheStore sketch follows this list for reference.)
> >> > >> >> >       1. Our Ignite usage is wide and complex because we use KV, SQL and
> >> > >> >> >       other APIs. The inconsistency of what can and can't be used from one
> >> > >> >> >       API to another is a real challenge and has forced us, over time, to
> >> > >> >> >       stick to one API and write alternative solutions outside of Ignite.
> >> > >> >> >       It would drastically simplify things if any CacheStore (or some new
> >> > >> >> >       equivalent) could be plugged in and made accessible to SQL (and in
> >> > >> >> >       fact all other APIs) without having to load all the data from the
> >> > >> >> >       underlying CacheStore into memory first.
> >> > >> >> >    6. This question wasn't mine, but I was going to ask it as well: what
> >> > >> >> >    will happen to the Indexing API since H2 is being removed?
> >> > >> >> >       1. As I mentioned above, we index into Solr. In earlier versions of
> >> > >> >> >       our product we used the Indexing SPI to index into Lucene on the
> >> > >> >> >       Ignite nodes, but this presented so many challenges that we
> >> > >> >> >       ultimately abandoned it and replaced it with the current Solr
> >> > >> >> >       solution.
> >> > >> >> >       2. Lucene indexing was ideal because it meant we didn't have to
> >> > >> >> >       re-invent Solr's or Elasticsearch's sharding capabilities; that was
> >> > >> >> >       almost automatic, with Ignite only giving you the data that was
> >> > >> >> >       meant for the current node.
> >> > >> >> >       3. The Lucene API gave us more flexibility and removed a network
> >> > >> >> >       round trip from our queries.
> >> > >> >> >       4. Given Calcite's ability to support custom SQL functions, I'd love
> >> > >> >> >       to be able to define custom functions that Lucene answers (see the
> >> > >> >> >       custom SQL function sketch after this list).
> >> > >> >> >    7. What impact does RAFT now have on conflict resolution? Off the top
> >> > >> >> >    of my head there are two cases:
> >> > >> >> >       1. On startup after a split brain, Ignite currently takes an
> >> > >> >> >       "exercise for the reader" approach and dumps a log along the lines
> >> > >> >> >       of:
> >> > >> >> >
> >> > >> >> > >       BaselineTopology of joining node is not compatible with
> >> > >> >> > >       BaselineTopology in the cluster. Branching history of cluster BlT
> >> > >> >> > >       doesn't contain branching point hash of joining node BlT. Consider
> >> > >> >> > >       cleaning persistent storage of the node and adding it to the
> >> > >> >> > >       cluster again.
> >> > >> >> >
> >> > >> >> >       This leaves you with no choice except to take one half, manually
> >> > >> >> >       copy/write its data back over to the other half, and then destroy
> >> > >> >> >       the bad one.
> >> > >> >> >       2. The second case is conflicts on keys. I believe
> >> > >> >> >       CacheVersionConflictResolver and its manager are used by
> >> > >> >> >       GridCacheMapEntry, which just says "if using the old value do this,
> >> > >> >> >       otherwise use newVal". Ideally this will be exposed in the new API
> >> > >> >> >       so that one can override this behaviour. The last-writer-wins
> >> > >> >> >       approach isn't always ideal, and the semantics of the domain can
> >> > >> >> >       mean that what is considered "correct" in a conflict for one domain
> >> > >> >> >       is not so for a different domain.
> >> > >> >> >    8. This is last on the list but is actually the most important for us
> >> > >> >> >    right now, as it is an impending and growing risk. We allow customers
> >> > >> >> >    to create their own tables on demand. We already use the same cache
> >> > >> >> >    group etc. so that data structures are re-used, but now that we're
> >> > >> >> >    getting to thousands of tables/caches our startup times are sometimes
> >> > >> >> >    unpredictably long. At present it seems to depend on the state of the
> >> > >> >> >    cache/table before the restart, but we're into the order of 5-7 minutes
> >> > >> >> >    and steadily increasing with the growth in tables. Are there any
> >> > >> >> >    provisions in Ignite 3 for ensuring that startup time isn't
> >> > >> >> >    proportional to the number of tables/caches available?
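
To make question 3 above concrete, here is a minimal sketch of the kind of Calcite
access being asked for: applying a rule to an already-built RelNode through a
HepPlanner. This is plain Calcite usage (CoreRules.FILTER_INTO_JOIN is a stock rule),
not an Ignite API; whether Ignite 3 will expose a RelNode or planner hooks to clients is
exactly the open question.

    import org.apache.calcite.plan.hep.HepPlanner;
    import org.apache.calcite.plan.hep.HepProgram;
    import org.apache.calcite.plan.hep.HepProgramBuilder;
    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.rel.rules.CoreRules;

    public final class CalciteRuleSketch {
        /**
         * Applies a stock rewrite rule (push filters into joins) to a relational
         * expression. A real setup would register custom RelOptRule implementations
         * the same way; obtaining the RelNode requires a parsed and validated query.
         */
        public static RelNode optimize(RelNode rel) {
            HepProgram program = new HepProgramBuilder()
                .addRuleInstance(CoreRules.FILTER_INTO_JOIN)
                .build();

            HepPlanner planner = new HepPlanner(program);
            planner.setRoot(rel);

            return planner.findBestExp();
        }

        private CalciteRuleSketch() {
        }
    }
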
> >> > >> >> >
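
For question 5, a sketch of what a custom cache store means in today's Ignite 2.x terms:
a read/write-through store plugged in via CacheConfiguration. The in-memory map stands in
for a real external system and is purely illustrative; as noted in the question, data
served this way is not visible to SQL until it has been loaded into the cache.

    import java.io.Serializable;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.cache.Cache;
    import javax.cache.configuration.FactoryBuilder;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.store.CacheStoreAdapter;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class CacheStoreExample {
        /** Toy store backed by a static map; a real one would talk to JDBC, Solr, etc. */
        public static class MapBackedStore extends CacheStoreAdapter<Long, String>
            implements Serializable {
            private static final Map<Long, String> BACKEND = new ConcurrentHashMap<>();

            @Override public String load(Long key) {
                return BACKEND.get(key);
            }

            @Override public void write(Cache.Entry<? extends Long, ? extends String> entry) {
                BACKEND.put(entry.getKey(), entry.getValue());
            }

            @Override public void delete(Object key) {
                BACKEND.remove(key);
            }
        }

        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start()) {
                CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("withStore");
                cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(MapBackedStore.class));
                cfg.setReadThrough(true);
                cfg.setWriteThrough(true);

                IgniteCache<Long, String> cache = ignite.getOrCreateCache(cfg);

                cache.put(1L, "persisted via write-through"); // also written to the store
                System.out.println(cache.get(1L));            // a miss would read through
            }
        }
    }
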
> >> > >> >> >
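
On the custom SQL functions mentioned in 6.4: Ignite 2.x's equivalent mechanism is
@QuerySqlFunction registered through CacheConfiguration.setSqlFunctionClasses. The
function below is invented and only does a substring match rather than a real Lucene
lookup, but it shows where a Lucene/Solr-backed predicate could be hooked in; what the
Calcite-based engine in Ignite 3 will offer here is still open.

    import java.util.List;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.SqlFieldsQuery;
    import org.apache.ignite.cache.query.annotations.QuerySqlField;
    import org.apache.ignite.cache.query.annotations.QuerySqlFunction;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class SqlFunctionExample {
        /** Holder of custom SQL functions; registered per cache configuration. */
        public static class SearchFunctions {
            /** Toy stand-in for a full-text predicate; a real one could call Lucene/Solr. */
            @QuerySqlFunction
            public static boolean matches(String text, String term) {
                return text != null && text.toLowerCase().contains(term.toLowerCase());
            }
        }

        /** Illustrative value type. */
        public static class Person {
            @QuerySqlField
            private final String resume;

            public Person(String resume) { this.resume = resume; }
        }

        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start()) {
                CacheConfiguration<Long, Person> cfg = new CacheConfiguration<>("persons");
                cfg.setIndexedTypes(Long.class, Person.class);
                cfg.setSqlFunctionClasses(SearchFunctions.class);

                IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cfg);

                cache.put(1L, new Person("Java and distributed systems"));
                cache.put(2L, new Person("Accounting"));

                // The custom function is usable directly in SQL.
                List<List<?>> rows = cache.query(
                    new SqlFieldsQuery("SELECT resume FROM Person WHERE matches(resume, ?)")
                        .setArgs("java"))
                    .getAll();

                rows.forEach(System.out::println);
            }
        }
    }
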
> >> > >> >> > Those are the key things I can think of at the moment. Val and others,
> >> > >> >> > I'd love to open a conversation around these.
> >> > >> >> >
> >> > >> >> > Regards,
> >> > >> >> > Courtney Robinson
> >> > >> >> > Founder and CEO, Hypi
> >> > >> >> > Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>
> >> > >> >> >
> >> > >> >> > <https://hypi.io>
> >> > >> >> > https://hypi.io
> >> > >> >> >
> >> > >> >>
> >> > >> >>
> >> > >> >> --
> >> > >> >> Best regards,
> >> > >> >> Andrey V. Mashenkov
> >> > >> >>
> >> > >> >
> >> > >>
> >> > >
> >> >
> >> >
> >> > --
> >> >
> >> > Best regards,
> >> > Ivan Pavlukhin
> >> >
> >>
> >
>
>
> --
>
> Best regards,
> Ivan Pavlukhin
>