You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Alexey Zinoviev <za...@gmail.com> on 2018/08/01 13:45:00 UTC

[Distributed SQL] Do we have a plan to implement QuadTree index?

Hi, Igniters.

Currently I'm working on different math stuff over the Apache Ignite and in
a few tasks I need to implement in memory something like this

https://en.wikipedia.org/wiki/Quadtree

I didn't find such index in Apache Ignite, but maybe it's under development
by somebody?

Is it a difficult to add a new index type to our distributed SQL (from
point of view of different infrastructure issues and so on P.S I don't
worry the math stuff here because I've implemented it many times in
non-distributed version)?

It will be great to hear any kind of your thoughts and maybe somebody could
help with implementation

zaleslaw

Re: [Distributed SQL] Do we have a plan to implement QuadTree index?

Posted by Alexey Zinoviev <za...@gmail.com>.
Vladimir, yes I remember this discussion with Yuri. Of course I couldn't
estimate the changes to add new index type as you and thank you for your
consultation.

Maybe it's best approach to continue with my draft where it's implemented
via ML specific data structures over the cache.

2018-08-16 14:16 GMT+06:00 Vladimir Ozerov <vo...@gridgain.com>:

> Hi Alex,
>
> From what I see there is some interest to K-D indexes in SQL world.
> Postgres has it. MySQL users asks about it from time to time, and usually
> use some simpler spatial solutions ask MySQL lacks this index type. We
> definitely may consider integrating it with SQL, but this would be big
> enough feature, requiring changes in almost all SQL components.
>
> For this reason I would put SQL case aside for now and focus on
> implementation for ML. I had a talk with Yuriy Babak some time ago and
> explained how new index type can be added to the product. May be Yuriy has
> some thoughts on how to do that properly with respect to ML roadmap and
> plans.
>
> On Thu, Aug 16, 2018 at 6:21 AM Alexey Zinoviev <za...@gmail.com>
> wrote:
>
> > Sorry, for the delay, dear Pavel and Denis.
> >
> > Yes, I need a kind of indexing to improve KNN algorithms during training
> > the model.
> >
> > In my draft solution I've implemented building of
> > https://en.wikipedia.org/wiki/K-d_tree
> > <https://en.wikipedia.org/wiki/K-d_tree> on the training data set.
> > It collects the information about data distributed in our specific ML
> > Datasets (distributed by nodes over Ignite cache)
> >
> > Pavel, you ask me any questions and I've prepared answers.
> >
> > 1) Should be this index in-memory only or you want to persist it?
> > *Maybe it should be persisted (to reuse that for next predictions)*
> >
> > 2) In general index can be implemented in two ways per-partition and
> > per-node.
> > *Thank you for your explanation. Of course the per-node is better.*
> >
> > 3) I think K-d tree is preferable
> > *You are absolutely right, it should be K-d tree*
> >
> > 4) Could you please share use cases you're trying to solve with QuadTree?
> > With
> > close to real data and examples of queries?
> >
> > I need to solve *k-nearest neighbors search task *on a set of vectors
> with
> > unique keys presented in Ignite Cache (training set),
> > during the training phase I'm going to build a temporary index as a
> KD-Tree
> > (based on distance between vectors).
> >
> > The distance metric is a parameter too.
> >
> > After that, in prediction phase the *k-nearest neighbors search task *is
> > solved by brute-force iteration over all vectors to find the *k-nearest
> > neighbors.*
> > I'd like to improve the search part with queries to index to extract
> > closest vectors.
> >
> > Of course, it's kind of experiment, and maybe it's bad idea to patch SQL
> > internals to solve this private task, but as you mentioned it can be a
> good
> > point of collaboration between ML and SQL components.
> >
> > Can I get one of the implemented indices as a primary example and
> implement
> > it by myself?
> > Could you recommend something as an start point for me?
> >
> > Thanks for your help.
> >
> >
> >
> >
> > 2018-08-04 11:18 GMT+06:00 Denis Magda <dm...@apache.org>:
> >
> > > Alexey, are you working on some new ML/DL APIs/algorithms? Please
> > elaborate
> > > what you'd like to add to Ignite.
> > >
> > > --
> > > Denis
> > >
> > > On Wed, Aug 1, 2018 at 3:10 PM Pavel Kovalenko <jo...@gmail.com>
> > wrote:
> > >
> > > > Hello Alexey,
> > > >
> > > > It's not so difficult to implement new type of indexing of data, but
> if
> > > you
> > > > want to reach performance in distributed environment you need to have
> > > > strong knowledge of a data you're indexing and what kind of queries
> you
> > > > want to execute.
> > > > Should be this index in-memory only or you want to persist it?
> > > > In case of persistence your index should fit our page memory model
> > > > requirements.
> > > > In both cases your index should be ready to work in concurrent
> > > environment.
> > > >
> > > > In general index can be implemented in two ways per-partition and
> > > per-node.
> > > > Per-partition may be efficient if you have a lot of points (x,y)
> > > > representing a big one, e.g. image. In this case it's required that
> all
> > > of
> > > > these points will be in one partition that query e.g. makes images
> > > > intersection will execute in one node. But if you have multiple
> images,
> > > > your index will contain also another points from other object and
> will
> > > > overload it.
> > > > Per-node may be efficient if you have a lot of points (x,y) that are
> > > > independent of each other, that you will use it as spatial e.g.. But
> in
> > > > this case, I think K-d tree is preferable as it can be used in more
> > wide
> > > > way.
> > > >
> > > > Could you please share use cases you're trying to solve with
> QuadTree?
> > > With
> > > > close to real data and examples of queries? Because now the question
> is
> > > too
> > > > abstract and it's hard to understand how it should be implemented to
> > > reach
> > > > good results.
> > > >
> > > >
> > > > 2018-08-01 16:45 GMT+03:00 Alexey Zinoviev <za...@gmail.com>:
> > > >
> > > > > Hi, Igniters.
> > > > >
> > > > > Currently I'm working on different math stuff over the Apache
> Ignite
> > > and
> > > > in
> > > > > a few tasks I need to implement in memory something like this
> > > > >
> > > > > https://en.wikipedia.org/wiki/Quadtree
> > > > >
> > > > > I didn't find such index in Apache Ignite, but maybe it's under
> > > > development
> > > > > by somebody?
> > > > >
> > > > > Is it a difficult to add a new index type to our distributed SQL
> > (from
> > > > > point of view of different infrastructure issues and so on P.S I
> > don't
> > > > > worry the math stuff here because I've implemented it many times in
> > > > > non-distributed version)?
> > > > >
> > > > > It will be great to hear any kind of your thoughts and maybe
> somebody
> > > > could
> > > > > help with implementation
> > > > >
> > > > > zaleslaw
> > > > >
> > > >
> > >
> >
>

Re: [Distributed SQL] Do we have a plan to implement QuadTree index?

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Hi Alex,

From what I see there is some interest to K-D indexes in SQL world.
Postgres has it. MySQL users asks about it from time to time, and usually
use some simpler spatial solutions ask MySQL lacks this index type. We
definitely may consider integrating it with SQL, but this would be big
enough feature, requiring changes in almost all SQL components.

For this reason I would put SQL case aside for now and focus on
implementation for ML. I had a talk with Yuriy Babak some time ago and
explained how new index type can be added to the product. May be Yuriy has
some thoughts on how to do that properly with respect to ML roadmap and
plans.

On Thu, Aug 16, 2018 at 6:21 AM Alexey Zinoviev <za...@gmail.com>
wrote:

> Sorry, for the delay, dear Pavel and Denis.
>
> Yes, I need a kind of indexing to improve KNN algorithms during training
> the model.
>
> In my draft solution I've implemented building of
> https://en.wikipedia.org/wiki/K-d_tree
> <https://en.wikipedia.org/wiki/K-d_tree> on the training data set.
> It collects the information about data distributed in our specific ML
> Datasets (distributed by nodes over Ignite cache)
>
> Pavel, you ask me any questions and I've prepared answers.
>
> 1) Should be this index in-memory only or you want to persist it?
> *Maybe it should be persisted (to reuse that for next predictions)*
>
> 2) In general index can be implemented in two ways per-partition and
> per-node.
> *Thank you for your explanation. Of course the per-node is better.*
>
> 3) I think K-d tree is preferable
> *You are absolutely right, it should be K-d tree*
>
> 4) Could you please share use cases you're trying to solve with QuadTree?
> With
> close to real data and examples of queries?
>
> I need to solve *k-nearest neighbors search task *on a set of vectors with
> unique keys presented in Ignite Cache (training set),
> during the training phase I'm going to build a temporary index as a KD-Tree
> (based on distance between vectors).
>
> The distance metric is a parameter too.
>
> After that, in prediction phase the *k-nearest neighbors search task *is
> solved by brute-force iteration over all vectors to find the *k-nearest
> neighbors.*
> I'd like to improve the search part with queries to index to extract
> closest vectors.
>
> Of course, it's kind of experiment, and maybe it's bad idea to patch SQL
> internals to solve this private task, but as you mentioned it can be a good
> point of collaboration between ML and SQL components.
>
> Can I get one of the implemented indices as a primary example and implement
> it by myself?
> Could you recommend something as an start point for me?
>
> Thanks for your help.
>
>
>
>
> 2018-08-04 11:18 GMT+06:00 Denis Magda <dm...@apache.org>:
>
> > Alexey, are you working on some new ML/DL APIs/algorithms? Please
> elaborate
> > what you'd like to add to Ignite.
> >
> > --
> > Denis
> >
> > On Wed, Aug 1, 2018 at 3:10 PM Pavel Kovalenko <jo...@gmail.com>
> wrote:
> >
> > > Hello Alexey,
> > >
> > > It's not so difficult to implement new type of indexing of data, but if
> > you
> > > want to reach performance in distributed environment you need to have
> > > strong knowledge of a data you're indexing and what kind of queries you
> > > want to execute.
> > > Should be this index in-memory only or you want to persist it?
> > > In case of persistence your index should fit our page memory model
> > > requirements.
> > > In both cases your index should be ready to work in concurrent
> > environment.
> > >
> > > In general index can be implemented in two ways per-partition and
> > per-node.
> > > Per-partition may be efficient if you have a lot of points (x,y)
> > > representing a big one, e.g. image. In this case it's required that all
> > of
> > > these points will be in one partition that query e.g. makes images
> > > intersection will execute in one node. But if you have multiple images,
> > > your index will contain also another points from other object and will
> > > overload it.
> > > Per-node may be efficient if you have a lot of points (x,y) that are
> > > independent of each other, that you will use it as spatial e.g.. But in
> > > this case, I think K-d tree is preferable as it can be used in more
> wide
> > > way.
> > >
> > > Could you please share use cases you're trying to solve with QuadTree?
> > With
> > > close to real data and examples of queries? Because now the question is
> > too
> > > abstract and it's hard to understand how it should be implemented to
> > reach
> > > good results.
> > >
> > >
> > > 2018-08-01 16:45 GMT+03:00 Alexey Zinoviev <za...@gmail.com>:
> > >
> > > > Hi, Igniters.
> > > >
> > > > Currently I'm working on different math stuff over the Apache Ignite
> > and
> > > in
> > > > a few tasks I need to implement in memory something like this
> > > >
> > > > https://en.wikipedia.org/wiki/Quadtree
> > > >
> > > > I didn't find such index in Apache Ignite, but maybe it's under
> > > development
> > > > by somebody?
> > > >
> > > > Is it a difficult to add a new index type to our distributed SQL
> (from
> > > > point of view of different infrastructure issues and so on P.S I
> don't
> > > > worry the math stuff here because I've implemented it many times in
> > > > non-distributed version)?
> > > >
> > > > It will be great to hear any kind of your thoughts and maybe somebody
> > > could
> > > > help with implementation
> > > >
> > > > zaleslaw
> > > >
> > >
> >
>

Re: [Distributed SQL] Do we have a plan to implement QuadTree index?

Posted by Alexey Zinoviev <za...@gmail.com>.
Ok, many thanks, will look at geospatial index

2018-08-16 14:10 GMT+06:00 Alexey Goncharuk <al...@gmail.com>:

> Alexey,
>
> I am not sure if it will be a proper fir for you, but I think it worth a
> try.
>
> Apache Ignite has an option to index geospatial data using third-party
> libraries (note that it is not included in the default Ignite build, the
> module is activated via the lgpl profile). The index is located in
> Ignite-geospatial module and uses an r-tree implementation underneath. One
> downside of this module is that the geospatial index is not supported for
> the Ignite native persistence yet.
>
> Hope this helps!
> --AG
>
> чт, 16 авг. 2018 г. в 6:21, Alexey Zinoviev <za...@gmail.com>:
>
> > Sorry, for the delay, dear Pavel and Denis.
> >
> > Yes, I need a kind of indexing to improve KNN algorithms during training
> > the model.
> >
> > In my draft solution I've implemented building of
> > https://en.wikipedia.org/wiki/K-d_tree
> > <https://en.wikipedia.org/wiki/K-d_tree> on the training data set.
> > It collects the information about data distributed in our specific ML
> > Datasets (distributed by nodes over Ignite cache)
> >
> > Pavel, you ask me any questions and I've prepared answers.
> >
> > 1) Should be this index in-memory only or you want to persist it?
> > *Maybe it should be persisted (to reuse that for next predictions)*
> >
> > 2) In general index can be implemented in two ways per-partition and
> > per-node.
> > *Thank you for your explanation. Of course the per-node is better.*
> >
> > 3) I think K-d tree is preferable
> > *You are absolutely right, it should be K-d tree*
> >
> > 4) Could you please share use cases you're trying to solve with QuadTree?
> > With
> > close to real data and examples of queries?
> >
> > I need to solve *k-nearest neighbors search task *on a set of vectors
> with
> > unique keys presented in Ignite Cache (training set),
> > during the training phase I'm going to build a temporary index as a
> KD-Tree
> > (based on distance between vectors).
> >
> > The distance metric is a parameter too.
> >
> > After that, in prediction phase the *k-nearest neighbors search task *is
> > solved by brute-force iteration over all vectors to find the *k-nearest
> > neighbors.*
> > I'd like to improve the search part with queries to index to extract
> > closest vectors.
> >
> > Of course, it's kind of experiment, and maybe it's bad idea to patch SQL
> > internals to solve this private task, but as you mentioned it can be a
> good
> > point of collaboration between ML and SQL components.
> >
> > Can I get one of the implemented indices as a primary example and
> implement
> > it by myself?
> > Could you recommend something as an start point for me?
> >
> > Thanks for your help.
> >
> >
> >
> >
> > 2018-08-04 11:18 GMT+06:00 Denis Magda <dm...@apache.org>:
> >
> > > Alexey, are you working on some new ML/DL APIs/algorithms? Please
> > elaborate
> > > what you'd like to add to Ignite.
> > >
> > > --
> > > Denis
> > >
> > > On Wed, Aug 1, 2018 at 3:10 PM Pavel Kovalenko <jo...@gmail.com>
> > wrote:
> > >
> > > > Hello Alexey,
> > > >
> > > > It's not so difficult to implement new type of indexing of data, but
> if
> > > you
> > > > want to reach performance in distributed environment you need to have
> > > > strong knowledge of a data you're indexing and what kind of queries
> you
> > > > want to execute.
> > > > Should be this index in-memory only or you want to persist it?
> > > > In case of persistence your index should fit our page memory model
> > > > requirements.
> > > > In both cases your index should be ready to work in concurrent
> > > environment.
> > > >
> > > > In general index can be implemented in two ways per-partition and
> > > per-node.
> > > > Per-partition may be efficient if you have a lot of points (x,y)
> > > > representing a big one, e.g. image. In this case it's required that
> all
> > > of
> > > > these points will be in one partition that query e.g. makes images
> > > > intersection will execute in one node. But if you have multiple
> images,
> > > > your index will contain also another points from other object and
> will
> > > > overload it.
> > > > Per-node may be efficient if you have a lot of points (x,y) that are
> > > > independent of each other, that you will use it as spatial e.g.. But
> in
> > > > this case, I think K-d tree is preferable as it can be used in more
> > wide
> > > > way.
> > > >
> > > > Could you please share use cases you're trying to solve with
> QuadTree?
> > > With
> > > > close to real data and examples of queries? Because now the question
> is
> > > too
> > > > abstract and it's hard to understand how it should be implemented to
> > > reach
> > > > good results.
> > > >
> > > >
> > > > 2018-08-01 16:45 GMT+03:00 Alexey Zinoviev <za...@gmail.com>:
> > > >
> > > > > Hi, Igniters.
> > > > >
> > > > > Currently I'm working on different math stuff over the Apache
> Ignite
> > > and
> > > > in
> > > > > a few tasks I need to implement in memory something like this
> > > > >
> > > > > https://en.wikipedia.org/wiki/Quadtree
> > > > >
> > > > > I didn't find such index in Apache Ignite, but maybe it's under
> > > > development
> > > > > by somebody?
> > > > >
> > > > > Is it a difficult to add a new index type to our distributed SQL
> > (from
> > > > > point of view of different infrastructure issues and so on P.S I
> > don't
> > > > > worry the math stuff here because I've implemented it many times in
> > > > > non-distributed version)?
> > > > >
> > > > > It will be great to hear any kind of your thoughts and maybe
> somebody
> > > > could
> > > > > help with implementation
> > > > >
> > > > > zaleslaw
> > > > >
> > > >
> > >
> >
>

Re: [Distributed SQL] Do we have a plan to implement QuadTree index?

Posted by Alexey Goncharuk <al...@gmail.com>.
Alexey,

I am not sure if it will be a proper fir for you, but I think it worth a
try.

Apache Ignite has an option to index geospatial data using third-party
libraries (note that it is not included in the default Ignite build, the
module is activated via the lgpl profile). The index is located in
Ignite-geospatial module and uses an r-tree implementation underneath. One
downside of this module is that the geospatial index is not supported for
the Ignite native persistence yet.

Hope this helps!
--AG

чт, 16 авг. 2018 г. в 6:21, Alexey Zinoviev <za...@gmail.com>:

> Sorry, for the delay, dear Pavel and Denis.
>
> Yes, I need a kind of indexing to improve KNN algorithms during training
> the model.
>
> In my draft solution I've implemented building of
> https://en.wikipedia.org/wiki/K-d_tree
> <https://en.wikipedia.org/wiki/K-d_tree> on the training data set.
> It collects the information about data distributed in our specific ML
> Datasets (distributed by nodes over Ignite cache)
>
> Pavel, you ask me any questions and I've prepared answers.
>
> 1) Should be this index in-memory only or you want to persist it?
> *Maybe it should be persisted (to reuse that for next predictions)*
>
> 2) In general index can be implemented in two ways per-partition and
> per-node.
> *Thank you for your explanation. Of course the per-node is better.*
>
> 3) I think K-d tree is preferable
> *You are absolutely right, it should be K-d tree*
>
> 4) Could you please share use cases you're trying to solve with QuadTree?
> With
> close to real data and examples of queries?
>
> I need to solve *k-nearest neighbors search task *on a set of vectors with
> unique keys presented in Ignite Cache (training set),
> during the training phase I'm going to build a temporary index as a KD-Tree
> (based on distance between vectors).
>
> The distance metric is a parameter too.
>
> After that, in prediction phase the *k-nearest neighbors search task *is
> solved by brute-force iteration over all vectors to find the *k-nearest
> neighbors.*
> I'd like to improve the search part with queries to index to extract
> closest vectors.
>
> Of course, it's kind of experiment, and maybe it's bad idea to patch SQL
> internals to solve this private task, but as you mentioned it can be a good
> point of collaboration between ML and SQL components.
>
> Can I get one of the implemented indices as a primary example and implement
> it by myself?
> Could you recommend something as an start point for me?
>
> Thanks for your help.
>
>
>
>
> 2018-08-04 11:18 GMT+06:00 Denis Magda <dm...@apache.org>:
>
> > Alexey, are you working on some new ML/DL APIs/algorithms? Please
> elaborate
> > what you'd like to add to Ignite.
> >
> > --
> > Denis
> >
> > On Wed, Aug 1, 2018 at 3:10 PM Pavel Kovalenko <jo...@gmail.com>
> wrote:
> >
> > > Hello Alexey,
> > >
> > > It's not so difficult to implement new type of indexing of data, but if
> > you
> > > want to reach performance in distributed environment you need to have
> > > strong knowledge of a data you're indexing and what kind of queries you
> > > want to execute.
> > > Should be this index in-memory only or you want to persist it?
> > > In case of persistence your index should fit our page memory model
> > > requirements.
> > > In both cases your index should be ready to work in concurrent
> > environment.
> > >
> > > In general index can be implemented in two ways per-partition and
> > per-node.
> > > Per-partition may be efficient if you have a lot of points (x,y)
> > > representing a big one, e.g. image. In this case it's required that all
> > of
> > > these points will be in one partition that query e.g. makes images
> > > intersection will execute in one node. But if you have multiple images,
> > > your index will contain also another points from other object and will
> > > overload it.
> > > Per-node may be efficient if you have a lot of points (x,y) that are
> > > independent of each other, that you will use it as spatial e.g.. But in
> > > this case, I think K-d tree is preferable as it can be used in more
> wide
> > > way.
> > >
> > > Could you please share use cases you're trying to solve with QuadTree?
> > With
> > > close to real data and examples of queries? Because now the question is
> > too
> > > abstract and it's hard to understand how it should be implemented to
> > reach
> > > good results.
> > >
> > >
> > > 2018-08-01 16:45 GMT+03:00 Alexey Zinoviev <za...@gmail.com>:
> > >
> > > > Hi, Igniters.
> > > >
> > > > Currently I'm working on different math stuff over the Apache Ignite
> > and
> > > in
> > > > a few tasks I need to implement in memory something like this
> > > >
> > > > https://en.wikipedia.org/wiki/Quadtree
> > > >
> > > > I didn't find such index in Apache Ignite, but maybe it's under
> > > development
> > > > by somebody?
> > > >
> > > > Is it a difficult to add a new index type to our distributed SQL
> (from
> > > > point of view of different infrastructure issues and so on P.S I
> don't
> > > > worry the math stuff here because I've implemented it many times in
> > > > non-distributed version)?
> > > >
> > > > It will be great to hear any kind of your thoughts and maybe somebody
> > > could
> > > > help with implementation
> > > >
> > > > zaleslaw
> > > >
> > >
> >
>

Re: [Distributed SQL] Do we have a plan to implement QuadTree index?

Posted by Alexey Zinoviev <za...@gmail.com>.
Sorry, for the delay, dear Pavel and Denis.

Yes, I need a kind of indexing to improve KNN algorithms during training
the model.

In my draft solution I've implemented building of
https://en.wikipedia.org/wiki/K-d_tree
<https://en.wikipedia.org/wiki/K-d_tree> on the training data set.
It collects the information about data distributed in our specific ML
Datasets (distributed by nodes over Ignite cache)

Pavel, you ask me any questions and I've prepared answers.

1) Should be this index in-memory only or you want to persist it?
*Maybe it should be persisted (to reuse that for next predictions)*

2) In general index can be implemented in two ways per-partition and
per-node.
*Thank you for your explanation. Of course the per-node is better.*

3) I think K-d tree is preferable
*You are absolutely right, it should be K-d tree*

4) Could you please share use cases you're trying to solve with QuadTree?
With
close to real data and examples of queries?

I need to solve *k-nearest neighbors search task *on a set of vectors with
unique keys presented in Ignite Cache (training set),
during the training phase I'm going to build a temporary index as a KD-Tree
(based on distance between vectors).

The distance metric is a parameter too.

After that, in prediction phase the *k-nearest neighbors search task *is
solved by brute-force iteration over all vectors to find the *k-nearest
neighbors.*
I'd like to improve the search part with queries to index to extract
closest vectors.

Of course, it's kind of experiment, and maybe it's bad idea to patch SQL
internals to solve this private task, but as you mentioned it can be a good
point of collaboration between ML and SQL components.

Can I get one of the implemented indices as a primary example and implement
it by myself?
Could you recommend something as an start point for me?

Thanks for your help.




2018-08-04 11:18 GMT+06:00 Denis Magda <dm...@apache.org>:

> Alexey, are you working on some new ML/DL APIs/algorithms? Please elaborate
> what you'd like to add to Ignite.
>
> --
> Denis
>
> On Wed, Aug 1, 2018 at 3:10 PM Pavel Kovalenko <jo...@gmail.com> wrote:
>
> > Hello Alexey,
> >
> > It's not so difficult to implement new type of indexing of data, but if
> you
> > want to reach performance in distributed environment you need to have
> > strong knowledge of a data you're indexing and what kind of queries you
> > want to execute.
> > Should be this index in-memory only or you want to persist it?
> > In case of persistence your index should fit our page memory model
> > requirements.
> > In both cases your index should be ready to work in concurrent
> environment.
> >
> > In general index can be implemented in two ways per-partition and
> per-node.
> > Per-partition may be efficient if you have a lot of points (x,y)
> > representing a big one, e.g. image. In this case it's required that all
> of
> > these points will be in one partition that query e.g. makes images
> > intersection will execute in one node. But if you have multiple images,
> > your index will contain also another points from other object and will
> > overload it.
> > Per-node may be efficient if you have a lot of points (x,y) that are
> > independent of each other, that you will use it as spatial e.g.. But in
> > this case, I think K-d tree is preferable as it can be used in more wide
> > way.
> >
> > Could you please share use cases you're trying to solve with QuadTree?
> With
> > close to real data and examples of queries? Because now the question is
> too
> > abstract and it's hard to understand how it should be implemented to
> reach
> > good results.
> >
> >
> > 2018-08-01 16:45 GMT+03:00 Alexey Zinoviev <za...@gmail.com>:
> >
> > > Hi, Igniters.
> > >
> > > Currently I'm working on different math stuff over the Apache Ignite
> and
> > in
> > > a few tasks I need to implement in memory something like this
> > >
> > > https://en.wikipedia.org/wiki/Quadtree
> > >
> > > I didn't find such index in Apache Ignite, but maybe it's under
> > development
> > > by somebody?
> > >
> > > Is it a difficult to add a new index type to our distributed SQL (from
> > > point of view of different infrastructure issues and so on P.S I don't
> > > worry the math stuff here because I've implemented it many times in
> > > non-distributed version)?
> > >
> > > It will be great to hear any kind of your thoughts and maybe somebody
> > could
> > > help with implementation
> > >
> > > zaleslaw
> > >
> >
>

Re: [Distributed SQL] Do we have a plan to implement QuadTree index?

Posted by Denis Magda <dm...@apache.org>.
Alexey, are you working on some new ML/DL APIs/algorithms? Please elaborate
what you'd like to add to Ignite.

--
Denis

On Wed, Aug 1, 2018 at 3:10 PM Pavel Kovalenko <jo...@gmail.com> wrote:

> Hello Alexey,
>
> It's not so difficult to implement new type of indexing of data, but if you
> want to reach performance in distributed environment you need to have
> strong knowledge of a data you're indexing and what kind of queries you
> want to execute.
> Should be this index in-memory only or you want to persist it?
> In case of persistence your index should fit our page memory model
> requirements.
> In both cases your index should be ready to work in concurrent environment.
>
> In general index can be implemented in two ways per-partition and per-node.
> Per-partition may be efficient if you have a lot of points (x,y)
> representing a big one, e.g. image. In this case it's required that all of
> these points will be in one partition that query e.g. makes images
> intersection will execute in one node. But if you have multiple images,
> your index will contain also another points from other object and will
> overload it.
> Per-node may be efficient if you have a lot of points (x,y) that are
> independent of each other, that you will use it as spatial e.g.. But in
> this case, I think K-d tree is preferable as it can be used in more wide
> way.
>
> Could you please share use cases you're trying to solve with QuadTree? With
> close to real data and examples of queries? Because now the question is too
> abstract and it's hard to understand how it should be implemented to reach
> good results.
>
>
> 2018-08-01 16:45 GMT+03:00 Alexey Zinoviev <za...@gmail.com>:
>
> > Hi, Igniters.
> >
> > Currently I'm working on different math stuff over the Apache Ignite and
> in
> > a few tasks I need to implement in memory something like this
> >
> > https://en.wikipedia.org/wiki/Quadtree
> >
> > I didn't find such index in Apache Ignite, but maybe it's under
> development
> > by somebody?
> >
> > Is it a difficult to add a new index type to our distributed SQL (from
> > point of view of different infrastructure issues and so on P.S I don't
> > worry the math stuff here because I've implemented it many times in
> > non-distributed version)?
> >
> > It will be great to hear any kind of your thoughts and maybe somebody
> could
> > help with implementation
> >
> > zaleslaw
> >
>

Re: [Distributed SQL] Do we have a plan to implement QuadTree index?

Posted by Pavel Kovalenko <jo...@gmail.com>.
Hello Alexey,

It's not so difficult to implement new type of indexing of data, but if you
want to reach performance in distributed environment you need to have
strong knowledge of a data you're indexing and what kind of queries you
want to execute.
Should be this index in-memory only or you want to persist it?
In case of persistence your index should fit our page memory model
requirements.
In both cases your index should be ready to work in concurrent environment.

In general index can be implemented in two ways per-partition and per-node.
Per-partition may be efficient if you have a lot of points (x,y)
representing a big one, e.g. image. In this case it's required that all of
these points will be in one partition that query e.g. makes images
intersection will execute in one node. But if you have multiple images,
your index will contain also another points from other object and will
overload it.
Per-node may be efficient if you have a lot of points (x,y) that are
independent of each other, that you will use it as spatial e.g.. But in
this case, I think K-d tree is preferable as it can be used in more wide
way.

Could you please share use cases you're trying to solve with QuadTree? With
close to real data and examples of queries? Because now the question is too
abstract and it's hard to understand how it should be implemented to reach
good results.


2018-08-01 16:45 GMT+03:00 Alexey Zinoviev <za...@gmail.com>:

> Hi, Igniters.
>
> Currently I'm working on different math stuff over the Apache Ignite and in
> a few tasks I need to implement in memory something like this
>
> https://en.wikipedia.org/wiki/Quadtree
>
> I didn't find such index in Apache Ignite, but maybe it's under development
> by somebody?
>
> Is it a difficult to add a new index type to our distributed SQL (from
> point of view of different infrastructure issues and so on P.S I don't
> worry the math stuff here because I've implemented it many times in
> non-distributed version)?
>
> It will be great to hear any kind of your thoughts and maybe somebody could
> help with implementation
>
> zaleslaw
>