Posted to dev@ignite.apache.org by Sven Beauprez <sv...@theglue.com> on 2018/09/17 08:04:27 UTC

Using materialised views for queries with joins

All,

We are in a situation where we have to query data over several caches. As we all know, joins are a nightmare in a distributed system, and although I know there are other means such as denormalisation, that is no longer sufficient in some of our cases and we need the joins.

We mainly work in an OLTP context, where queries are known in advance (i.e. at dev time), and inspired by the following blog post from several years ago, I was wondering if the concept of “materialized views” could make it into Apache Ignite.
(https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/ )

It would work as follows:

  *   A query must register itself in Ignite at startup time (e.g. via configuration) or during run time (e.g. an API call; a hypothetical sketch of such a call follows this list)
  *   The registered query is parsed and a new “view” cache is created which ‘caches’ the resultset of the query (this could take a while, but an intermediate status can report “warming up” until the view is “hot” and ready)
  *   All caches involved in the joins are then monitored for CUD operations and the relevant data is stored in the new “view” cache, so the view gets updated in real time
  *   All operations must be ACID compliant
  *   The view is queried via a very trivial select statement
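
To make the registration step concrete, here is a minimal sketch of what such an API could look like. It is purely hypothetical: neither MaterializedViewConfiguration nor registerMaterializedView() exists in Ignite today, and the table, column and view names are made up for illustration.

    // Purely hypothetical sketch of the proposed registration API; none of these
    // types or methods exist in Ignite today. 'ignite' is a started Ignite instance.
    MaterializedViewConfiguration viewCfg = new MaterializedViewConfiguration()
        .setName("basketAvailabilityView")
        .setQuery("SELECT b.id, b.productItem, b.amount, i.available " +
                  "FROM ShoppingBasket b JOIN Inventory i ON b.productItem = i.productItem " +
                  "WHERE b.active = TRUE");

    ignite.registerMaterializedView(viewCfg);   // view status: "warming up" -> "hot"

    // Once "hot", clients read it with a trivial select against the view cache.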

Would that be feasible as a new feature?


Regards,

Sven




SVEN BEAUPREZ

L e a d   A r c h i t e c t

De Kleetlaan 5, B-1831 Diegem
www.theglue.com<http://www.theglue.com/>

Re: Using materialised views for queries with joins

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Sven,

At this point you can do that with a continuous query (CQ) and a normal
cache - just listen for events from the other caches and update your own
cache as needed. Or you can query your tables periodically and save the
query result to the cache. Essentially, these are very simplified
foundations of materialized views in real databases.
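
A minimal sketch of the CQ-based approach, using the shopping basket
example from this thread. Cache names are only illustrative and the basket
value is simplified to a plain String; a real implementation would use the
actual value classes and join in the inventory data:

    import javax.cache.event.CacheEntryEvent;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.ContinuousQuery;
    import org.apache.ignite.cache.query.QueryCursor;

    public class BasketViewUpdater {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start(); // or Ignition.ignite() on an already running node

            // Base cache and the hand-rolled "view" cache.
            IgniteCache<Long, String> baskets = ignite.getOrCreateCache("shoppingbasket");
            IgniteCache<Long, String> view = ignite.getOrCreateCache("basketView");

            ContinuousQuery<Long, String> qry = new ContinuousQuery<>();

            // Local listener: re-derive the affected "view" entries on every create/update/remove.
            qry.setLocalListener(events -> {
                for (CacheEntryEvent<? extends Long, ? extends String> e : events) {
                    String basket = e.getValue();

                    if (basket != null && basket.contains("active")) // stands in for the WHERE clause
                        view.put(e.getKey(), basket);                // a real listener would join in inventory data
                    else
                        view.remove(e.getKey());
                }
            });

            // Keep the cursor open for as long as the "view" must stay fresh.
            QueryCursor<?> cur = baskets.query(qry);
        }
    }

The periodic variant is even simpler: run the join as an SqlFieldsQuery on
a timer and overwrite the contents of the view cache with the result.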

We have no immediate plans to implement this in Ignite, because a
production-ready implementation of materialized views is a very complex
matter (especially if we are talking about immediate updates) and would
require changes in virtually all Ignite subsystems.


Re: Using materialised views for queries with joins

Posted by Sven Beauprez <sv...@theglue.com>.
Small correction to my second example about IoT: I meant a moving average, of course, because the stream is endless. An example with a sum would probably have been better.

Regards,

Sven



Re: Using materialised views for queries with joins

Posted by Sven Beauprez <sv...@theglue.com>.
Hi Sergi,

Thanks for your thoughts on this. 

I think everything needed to react to cache operations is already available in Ignite; what needs to be done is to check whether a given operation should also change the resultset of a registered query. I think this is also how "Continuous Queries" work in Ignite.
What I see as a hurdle, though, is ACID compliance (i.e. the inventory is updated within a transaction and the materialized view 'cache' is updated at the same time, or perhaps made eventually consistent with an intermediate step, e.g. by writing to a WAL or a temporary cache).
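
A minimal sketch of the "same transaction" variant with today's API, assuming both the base cache and the view cache are configured with TRANSACTIONAL atomicity mode; the cache names and keys are illustrative:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.CacheAtomicityMode;
    import org.apache.ignite.configuration.CacheConfiguration;
    import org.apache.ignite.transactions.Transaction;

    import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
    import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

    public class AtomicViewUpdate {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            // Both caches must be TRANSACTIONAL so the cross-cache transaction applies to them.
            IgniteCache<String, Integer> inventory = ignite.getOrCreateCache(
                new CacheConfiguration<String, Integer>("inventory")
                    .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL));
            IgniteCache<String, Integer> view = ignite.getOrCreateCache(
                new CacheConfiguration<String, Integer>("basketAvailabilityView")
                    .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL));

            inventory.put("product-42", 10);

            // Base-data update and "view" update commit or roll back together.
            try (Transaction tx = ignite.transactions().txStart(PESSIMISTIC, REPEATABLE_READ)) {
                int left = inventory.get("product-42") - 1;

                inventory.put("product-42", left);
                view.put("basket-1|product-42", left);

                tx.commit();
            }
        }
    }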

Let me elaborate a little on my example, also give an example with streaming IoT data, and refer to how Kafka uses this idea (they call it Interactive Queries):


--- EXAMPLE 1: simple shopping basket ---
(I know I could do some manual denormalisation in the example below by putting the inventory in the shopping basket cache, but we have a much more challenging use case that would be too hard to describe in a mail, so please follow the reasoning and imagine that manual denormalisation is not possible; thinking CQRS also helps.)

Coming back to the shopping basket example, where you would like to show the end user the number of items still available, the resultset of the materialized view is not that big. The join would be between 'shoppingbasket' and 'inventory' where the shoppingbasket is 'active', i.e. the materialized view cache (on Ignite node A) would have the fields

shoppingbasketID | productItem | amount | inventoryAvailability

The shoppingbasket cache (on Ignite node X) contains all active and inactive (finalized, discarded, ... whatever the reason) shoppingbaskets, and the inventory cache (on Ignite node Y) contains all product items with the amount still available.

The materialized view would then only contain active shoppingbaskets and, for each product item in a shoppingbasket, the number of available items.
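
For reference, the join such a view would materialize can already be expressed as plain Ignite SQL today; the point of the proposal is to avoid running this distributed join on every read. The table and column names below are illustrative and assume both caches have an SQL schema configured:

    import java.util.List;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.SqlFieldsQuery;

    public class BasketAvailabilityQuery {
        public static void main(String[] args) {
            Ignite ignite = Ignition.ignite(); // assumes a node with both tables is already running

            SqlFieldsQuery qry = new SqlFieldsQuery(
                "SELECT b.id AS shoppingbasketID, b.productItem, b.amount, " +
                "       i.available AS inventoryAvailability " +
                "FROM ShoppingBasket b JOIN Inventory i ON b.productItem = i.productItem " +
                "WHERE b.active = TRUE")
                .setDistributedJoins(true); // needed while the two caches are not collocated

            // Today this join runs on every call; the idea is to keep its resultset
            // incrementally maintained in a "view" cache instead.
            List<List<?>> rows = ignite.cache("shoppingbasket").query(qry).getAll();

            rows.forEach(System.out::println);
        }
    }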

Whenever a shoppingbasket is updated, the 'query' (i.e. materialized view) is triggered (--> an event on the shoppingbasket cache), the where clause is checked (i.e. the shoppingbasket must be active) and the materialized view is updated (i.e. the item is added to the cache or updated, and the inventory cache is checked for availability of that item).

The same happens when the inventory is updated: the 'query' (i.e. materialized view) is triggered (--> an event on the inventory cache), and each entry for that product item in the materialized view is updated (i.e. the availability for that product item is updated in each active shoppingbasket in the view).

This is a kind of 'automagical' denormalization based on queries known at development time and hence at deploy time in the grid (i.e. a query for a denormalized view needs to be registered/deployed). Seeing what is happening in the market with other tools (Kafka, Samza), this would be a very powerful feature for Ignite and a huge speed optimisation for querying.


--- EXAMPLE 2: streaming from IoT devices ---
This would be an even better example, where a materialized view is actually a join of two or more streams, each filling up a cache in Ignite.

Today in Ignite it is already possible to use continuous queries to get new incoming data transparently.

The materialized view would only contain relevant *state* for certain things coming from that stream. This is especially interesting for IoT, because you could keep, for example, the average or median of the temperature of thousands of devices in the materialized view cache, while data on the stream can be discarded very quickly.
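
A minimal sketch of keeping such per-device state with today's API, folding each reading into an exponentially weighted moving average via cache.invoke(); the device IDs, cache name and smoothing factor are illustrative:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;

    public class DeviceStateView {
        // Fold a new temperature reading into the per-device moving average; the raw sample
        // itself can then be discarded.
        static void onReading(IgniteCache<String, Double> view, String deviceId, double temperature) {
            view.invoke(deviceId, (entry, args) -> {
                double alpha = 0.1; // smoothing factor (illustrative)
                Double prev = entry.getValue();

                entry.setValue(prev == null ? temperature : alpha * temperature + (1 - alpha) * prev);

                return null;
            });
        }

        public static void main(String[] args) {
            Ignite ignite = Ignition.start();
            IgniteCache<String, Double> view = ignite.getOrCreateCache("deviceTemperatureView");

            onReading(view, "device-17", 21.5);
            onReading(view, "device-17", 22.0);

            System.out.println(view.get("device-17")); // current moving average for the device
        }
    }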


--- SAMZA/KAFKA ---
This is how Apache Samza works (https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/ ) and those ideas are also the basis for how the Interactive Queries in Kafka work (https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/ ), but the big advantage with Ignite is that everything is already in place and you don't need an extra embedded database like RocksDB. All the cons they describe could be solved in Ignite out of the box.



Regards,

Sven



 
SVEN BEAUPREZ 
 
L e a d   A r c h i t e c t

 

De Kleetlaan 5, B-1831 Diegem

www.theglue.com <http://www.theglue.com/>



Re: Using materialised views for queries with joins

Posted by Sergi Vladykin <se...@gmail.com>.
Sven,

Support for materialized views sounds like a huge project. I would not
expect it to appear in Ignite soon.

As far as I can see, you have problems with data collocation. If you
cannot store the original data in replicated caches, then these views will
be huge as well and thus must be partitioned with some collocation. So it
does not actually change the picture much.

I would suggest going for manual denormalization: put the same value
under different affinity keys.
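
A minimal sketch of that suggestion using AffinityKey, storing the same
value twice so that each copy is collocated with the data it will be
joined against (cache names and keys are illustrative):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.affinity.AffinityKey;

    public class DenormalizedInventory {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            IgniteCache<AffinityKey<String>, Integer> byProduct =
                ignite.getOrCreateCache("inventoryByProduct");
            IgniteCache<AffinityKey<String>, Integer> byBasket =
                ignite.getOrCreateCache("inventoryByBasket");

            int available = 10;

            // Copy 1: collocated with the rest of the product/inventory data (affinity key = product id).
            byProduct.put(new AffinityKey<>("product-42", "product-42"), available);

            // Copy 2: collocated with the shopping basket that references the product (affinity key = basket id).
            byBasket.put(new AffinityKey<>("product-42", "basket-1"), available);

            // Joins against either copy on its affinity key stay local to a node.
        }
    }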

Sergi


Re: Using materialised views for queries with joins

Posted by Sven Beauprez <sv...@theglue.com>.
Hi Dmitry,

Yes, we can use those solutions in some cases, but not always.

Replication is indeed the simplest, but sadly replicated caches are too much overhead: we often have a minimum of 12 nodes, and all data must then be kept in sync 12 times over. We do use it for small caches that don't need a lot of updates.

We use colocation all over the place. Colocation based on affinity keys is not possible, though, for distinct data sets that have only some very specific, well-known-beforehand relationships with _some other_ dataset.
(For example, and this is not our exact use case, which is more complex: items in a shopping basket joined with items from the product inventory. Both live in different caches managed on other nodes, and it is not possible to denormalize such that the shopping basket knows the number of available items.)


Regards,

Sven



 
SVEN BEAUPREZ 
 
L e a d   A r c h i t e c t

 

De Kleetlaan 5, B-1831 Diegem

www.theglue.com <http://www.theglue.com/>



Re: Using materialised views for queries with joins

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Hi Sven,

I will let others comment on the materialized view suggestion, but I have
another question.

*As we all know, joins are a nightmare in a distributed system*


Have you considered collocation or replication? If a table is replicated,
then it will be present on all the nodes and all joins will be fast. If two
partitioned tables are colocated based on some affinity key, then joins on
that affinity key will be fast as well.
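
A minimal sketch of the two options in configuration terms (cache names
and the affinity field are illustrative):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.CacheMode;
    import org.apache.ignite.cache.affinity.AffinityKeyMapped;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class JoinFriendlyCaches {
        // Key for the partitioned basket-line cache: lines for a product land on the same
        // node as the matching inventory row, so a join on productId is node-local.
        static class BasketLineKey {
            long lineId;

            @AffinityKeyMapped
            String productId;

            BasketLineKey(long lineId, String productId) {
                this.lineId = lineId;
                this.productId = productId;
            }
        }

        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            // Option 1: replicate a small, rarely updated table so every join against it is local.
            ignite.getOrCreateCache(new CacheConfiguration<String, Integer>("inventory")
                .setCacheMode(CacheMode.REPLICATED));

            // Option 2: partition both tables and collocate them on a shared affinity key.
            ignite.getOrCreateCache(new CacheConfiguration<BasketLineKey, String>("basketLines")
                .setCacheMode(CacheMode.PARTITIONED));
            ignite.getOrCreateCache(new CacheConfiguration<String, Integer>("inventoryByProduct")
                .setCacheMode(CacheMode.PARTITIONED));
        }
    }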

Both colocation and replication are supported by Ignite. Will either of
these approaches work for you?

D.
