Posted to user@ignite.apache.org by Christos Erotocritou <ch...@gridgain.com> on 2017/04/05 09:02:01 UTC

Non-collocated distributed SQL Joins across caches over separate cluster groups

Igniters,

Is it correct to assume the following:

   - We have an Ignite cluster comprised of 2 cluster groups A & B that
     have different caches deployed.
   - We use an Ignite client to obtain API access to the whole cluster
     and execute a join query that joins data across the 2 caches.

My understanding is that this is *not possible*, correct?

Reading this article [1 <https://dzone.com/articles/how-apache-ignite-helped-a-large-bank-process-geog-1>] it seems that such cross-cluster-group behaviour is supported with the transactions API and also advised.

Any thoughts why the SQL API would not allow this and requires caches to be located on all nodes when the JOIN query is executed?

Cheers,
Christos

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

Posted by christos <ch...@gridgain.com>.

Thanks Sergi,

I understand very well that non-collocated distributed joins are a last
resort.

I still don't understand why cluster groups would make this any worse, since
in distributed non-collocated joins the data is NOT expected to be on the
same node. It sounds to me like the cross-node calls would be almost the
same...

Christos



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Non-collocated-distributed-SQL-Joins-across-caches-over-separate-cluster-groups-tp11734p11748.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Thu, Apr 6, 2017 at 2:52 AM, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:

> Yes, this can happen if caches were created on different versions of
> topology, because FairAffinityFunction is stateful and requires previous
> affinity assignment state.
>

In this case we have to add validation for dynamically started caches and
throw an exception if FairAffinityFunction was used. Do we do it?

> Generally, this can be fixed by introducing cache groups that use the same
> affinity and use this shared state across all caches.
>

Can you please explain what you mean by this?

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

Posted by Alexey Goncharuk <al...@gmail.com>.

Yes, this can happen if caches were created on different versions of
topology, because FairAffinityFunction is stateful and requires previous
affinity assignment state.

Generally, this can be fixed by introducing cache groups that use the same
affinity and use this shared state across all caches.
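
A minimal Python sketch of the effect described above, assuming a
deliberately simplified "fair" assignment that is not Ignite's actual
FairAffinityFunction algorithm. The function names, node names, and
partition count are all hypothetical. The idea: a stateful function adjusts
the previous assignment when the topology changes, so a cache created before
a node joins and a cache created after it can end up with different owners
for the same partition.

```python
def fresh_assignment(num_parts, nodes):
    """Assign partitions round-robin from scratch (no previous state)."""
    return {p: nodes[p % len(nodes)] for p in range(num_parts)}


def rebalance(prev, nodes):
    """Stateful step (toy model): keep the previous owners and move only as
    many partitions as needed to give newly joined nodes a fair share.
    Assumes every previous owner is still present in `nodes`."""
    assignment = dict(prev)
    target = len(assignment) // len(nodes)
    load = {n: 0 for n in nodes}
    for owner in assignment.values():
        load[owner] += 1
    for new_node in [n for n in nodes if load[n] == 0]:
        for part in sorted(assignment):
            if load[new_node] >= target:
                break
            owner = assignment[part]
            if load[owner] > target:
                assignment[part] = new_node
                load[owner] -= 1
                load[new_node] += 1
    return assignment


# Cache A is created on topology version 1 (two nodes), then n3 joins.
cache_a = rebalance(fresh_assignment(6, ["n1", "n2"]), ["n1", "n2", "n3"])
# Cache B is created on topology version 2, with no earlier state to reuse.
cache_b = fresh_assignment(6, ["n1", "n2", "n3"])

# Partitions whose owner differs between the two caches.
mismatched = [p for p in range(6) if cache_a[p] != cache_b[p]]
print(mismatched)
```

Both assignments are balanced in this toy run, yet the same partition number
is owned by different nodes in the two caches, which is the situation where
a join that assumes identical affinity could miss rows. A stateless function
(here, calling fresh_assignment twice on the same topology) would give both
caches the same owners.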

2017-04-06 12:37 GMT+03:00 Sergi Vladykin <se...@gmail.com>:

> Andrey,
>
> I did not know that FairAffinity can lead to this inconsistent behavior. AG
> can you please comment on this?
>
> Christos,
>
> Because it will complicate execution pipeline (and by that may be slowdown
> even collocated execution) and in case of different cluster groups we never
> will be collocated.
>
> Sergi
>
> 2017-04-05 15:22 GMT+03:00 christos <ch...@gridgain.com>:
>
> > I suggest we continue the conversation on the user list. My bad for
> > pinging the email to both channels.
> >
> >
> >
> > --
> > View this message in context: http://apache-ignite-developers.2346864.n4.nabble.com/Non-collocated-distributed-SQL-Joins-across-caches-over-separate-cluster-groups-tp16136p16163.html
> > Sent from the Apache Ignite Developers mailing list archive at Nabble.com.
> >
>

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

Posted by Sergi Vladykin <se...@gmail.com>.

Andrey,

I did not know that FairAffinityFunction can lead to this inconsistent
behavior. AG, can you please comment on this?

Christos,

Because it would complicate the execution pipeline (and might thereby slow
down even collocated execution), and with different cluster groups the data
will never be collocated.
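
A small Python sketch of the "never collocated" point, as a toy model rather
than Ignite code. The node names, the owners() helper, and the modulo-based
placement are hypothetical stand-ins for cluster groups and an affinity
function.

```python
def owners(keys, group):
    """Map each key to a node chosen from the given cluster group."""
    return {k: group[k % len(group)] for k in keys}


def collocated_keys(left, right):
    """Keys whose entries in both caches live on the same node."""
    return [k for k in left if left[k] == right[k]]


group_a = ["a1", "a2"]
group_b = ["b1", "b2"]
all_nodes = group_a + group_b
keys = list(range(8))

# Both caches deployed on every node with the same placement rule.
shared = collocated_keys(owners(keys, all_nodes), owners(keys, all_nodes))

# Cache 1 only on group A, cache 2 only on group B.
separate = collocated_keys(owners(keys, group_a), owners(keys, group_b))

print(len(shared), len(separate))
```

Because the two groups share no nodes, no key can have both of its entries
on one node, so every join lookup in the second layout has to cross the
network.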

Sergi

2017-04-05 15:22 GMT+03:00 christos <ch...@gridgain.com>:

> I suggest we continue the conversation on the user list. My bad for pinging
> the email to both channels.
>
>
>
> --
> View this message in context: http://apache-ignite-developers.2346864.n4.nabble.com/Non-collocated-distributed-SQL-Joins-across-caches-over-separate-cluster-groups-tp16136p16163.html
> Sent from the Apache Ignite Developers mailing list archive at Nabble.com.
>

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

Posted by christos <ch...@gridgain.com>.

I suggest we continue the conversation on the user list. My bad for pinging
the email to both channels.



--
View this message in context: http://apache-ignite-developers.2346864.n4.nabble.com/Non-collocated-distributed-SQL-Joins-across-caches-over-separate-cluster-groups-tp16136p16163.html
Sent from the Apache Ignite Developers mailing list archive at Nabble.com.

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

Posted by Andrey Mashenkov <an...@gmail.com>.

Sergi,

Does this mean that a "broken" FairAffinityFunction can lead to wrong SQL
query results?
As we know, FairAffinityFunction does not guarantee that the same partitions
of different caches are assigned to the same nodes; in some cases they can
belong to different nodes.

On Wed, Apr 5, 2017 at 1:47 PM, Sergi Vladykin <se...@gmail.com>
wrote:

> Hi,
>
> Moreover distributed joins can be executed only between caches with the
> same affinity (same partitions on the same nodes).
>
> Keep in mind that distributed join is already a "last resort" thing and you
> have to prefer collocated joins as much as possible, if you want to achieve
> good performance. Distributed join between different cluster groups will
> make things even worse.
>
> Sergi
>
> 2017-04-05 12:02 GMT+03:00 Christos Erotocritou <ch...@gridgain.com>:
>
> > Igniters,
> >
> > Is it correct to assume the following:
> >
> >    - We have an Ignite cluster comprised of 2 cluster groups A & B that
> >    have different caches deployed.
> >    - We use an Ignite client to obtain API access to the whole cluster
> >    and execute a join query that joins data across the 2 caches
> >
> > My understanding is that this is *not possible*, correct?
> >
> > Reading this article [1
> > <https://dzone.com/articles/how-apache-ignite-helped-a-large-bank-process-geog-1>]
> > it seems that such cross-cluster-group behaviour is supported with the
> > transactions API and also advised.
> >
> > Any thoughts why the SQL API would not allow this and requires caches to
> > be located on all nodes when the JOIN query is executed?
> >
> > Cheers,
> > Christos
> >
>



-- 
Best regards,
Andrey V. Mashenkov

Re: Non-collocated distributed SQL Joins across caches over separate cluster groups

Posted by Sergi Vladykin <se...@gmail.com>.

Hi,

Moreover, distributed joins can be executed only between caches with the
same affinity (same partitions on the same nodes).

Keep in mind that a distributed join is already a "last resort", and you
should prefer collocated joins as much as possible if you want to achieve
good performance. A distributed join between different cluster groups will
make things even worse.
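
To illustrate the difference between a collocated join and one that needs
cross-node lookups, here is a minimal Python sketch. It is a toy model, not
Ignite code: the persons/orders data, node_for(), and local_join() are
hypothetical, and key placement is a plain modulo. Each node joins only the
rows it holds locally, which is complete only when related rows are placed
by the same affinity key.

```python
NUM_NODES = 2

persons = {1: "Ann", 2: "Bob", 3: "Cy"}   # person id -> name
orders = {10: 1, 11: 1, 12: 2, 13: 3}     # order id -> person id


def node_for(key):
    """Toy stand-in for an affinity function: key -> node index."""
    return key % NUM_NODES


def local_join(person_nodes, order_nodes):
    """Join orders to persons node by node, never crossing node boundaries."""
    result = []
    for node in range(NUM_NODES):
        local_persons = {pid: name for pid, name in persons.items()
                         if person_nodes[pid] == node}
        for oid, pid in orders.items():
            if order_nodes[oid] == node and pid in local_persons:
                result.append((oid, local_persons[pid]))
    return sorted(result)


person_nodes = {pid: node_for(pid) for pid in persons}

# Collocated: each order is placed by its person id (shared affinity key).
collocated = local_join(person_nodes, {oid: node_for(pid) for oid, pid in orders.items()})

# Non-collocated: each order is placed by its own id.
non_collocated = local_join(person_nodes, {oid: node_for(oid) for oid in orders})

# Reference result, computed without any partitioning.
full = sorted((oid, persons[pid]) for oid, pid in orders.items())

print(collocated == full, non_collocated == full)
```

In the non-collocated layout the local-only join silently drops the rows
whose two sides sit on different nodes. Getting them back requires fetching
data from other nodes during the join, which is the extra cost that makes a
distributed join the slower option.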

Sergi

2017-04-05 12:02 GMT+03:00 Christos Erotocritou <ch...@gridgain.com>:

> Igniters,
>
> Is it correct to assume the following:
>
>    - We have an Ignite cluster comprised of 2 cluster groups A & B that
>    have different caches deployed.
>    - We use an Ignite client to obtain API access to the whole cluster
>    and execute a join query that joins data across the 2 caches
>
> My understanding is that this is *not possible*, correct?
>
> Reading this article [1
> <https://dzone.com/articles/how-apache-ignite-helped-a-large-bank-process-geog-1>]
> it seems that such cross-cluster-group behaviour is supported with the
> transactions API and also advised.
>
> Any thoughts why the SQL API would not allow this and requires caches to
> be located on all nodes when the JOIN query is executed?
>
> Cheers,
> Christos
>
