Posted to user@ignite.apache.org by Matt <dr...@gmail.com> on 2017/05/26 07:51:16 UTC

Correct Way to Store Data

Hello,

Right now I have a couple of caches associated with the kind of objects I
store. For instance I have one cache for products, one for sales, one for
stats, etc. I use the id of the product as the affinity key in all cases.

Some questions I have regarding this approach...

*1.* I get the impression I'm not doing it "the Ignite way", since I'm only
storing one kind of object (i.e., objects of only one class) in each cache.
The approach I'm using is equivalent to having a PostgreSQL schema for
products, another one for sales and a third for stats. Is that right?

*2.* I believe it would make more sense to have only one cache (for
instance, "analytics") and save all objects there (products, sales and
stats). That would be equivalent to having one single schema and inside it
one table for each class I store. Right?

*3.* Is there any problem in terms of performance, or is it a bad practice,
to have one cache with all products and one cache per product holding all
objects related to that particular product? I think some queries would run
much faster that way: since all objects in a given cache relate to the same
product, there is no need to filter sales or stats by product id.

*4.* What's the best approach or which one is more commonly used?

As a side note, in all 3 cases I'll use the id of the product as the
affinity key, except for the "products" cache in #3, which would be stored
on a single node. Also, right now I'm storing about 10k products, but that
number increases as clients arrive, so I expect the cardinality to increase
rapidly.

Cheers,
Matt

Re: Correct Way to Store Data

Posted by Dmitry Pavlov <dp...@gmail.com>.
Hi,

Of course, an Ignite cache is more powerful than a single relational table.
A cache is object-oriented storage, and one cache record can hold data that
would span many relational tables.

Choosing how many caches to use is a design question. The options are:
- one cache per table,
- one cache per business object (the root of several tables),
- a mix, with several business object types placed in one cache,
- all objects in one cache.

There is one more aspect to consider in system design: data collocation.
Ignite is a distributed system, so it is important to reduce the number of
requests to different nodes for the most frequent reads and updates. Related
data can be collocated on one node. Sometimes it is also possible to use
replicated caches for rarely updated reference data (dictionaries).
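
To make the collocation idea concrete, here is a minimal sketch in plain Java. It is a toy model, not Ignite's affinity function: the names CollocationSketch, SaleKey, PARTITIONS and NODES are assumptions made up for this illustration. It only shows why routing a sale by its product id, rather than by its own id, places it on the same node as the product.

```java
import java.util.Objects;

/** Toy model of affinity-based collocation. This is NOT Ignite's affinity function. */
final class CollocationSketch {
    // Hypothetical cluster shape, chosen only for this illustration.
    static final int PARTITIONS = 1024;
    static final int NODES = 4;

    /** Hypothetical key for a sale: unique by saleId, routed by productId. */
    static final class SaleKey {
        final long saleId;
        final long productId; // plays the role of the affinity key

        SaleKey(long saleId, long productId) {
            this.saleId = saleId;
            this.productId = productId;
        }
    }

    /** Maps an affinity key to a partition by hashing it. */
    static int partitionFor(Object affinityKey) {
        return Math.floorMod(Objects.hashCode(affinityKey), PARTITIONS);
    }

    /** Maps a partition to a node index; a real grid uses a smarter assignment. */
    static int nodeFor(int partition) {
        return partition % NODES;
    }

    /** Node that owns the product with the given id. */
    static int nodeForProduct(long productId) {
        return nodeFor(partitionFor(productId));
    }

    /** Node that owns the sale: routed by productId, not by saleId. */
    static int nodeForSale(SaleKey key) {
        return nodeFor(partitionFor(key.productId));
    }

    public static void main(String[] args) {
        long productId = 42L;
        SaleKey sale = new SaleKey(1001L, productId);
        System.out.println("product node: " + nodeForProduct(productId));
        System.out.println("sale node:    " + nodeForSale(sale));
    }
}
```

Because both lookups hash the same product id, the two node indexes always match, so a query touching a product and its sales could be served by one node in this model.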

One important remark: Ignite uses H2 for query parsing, but not for storing
data in memory. Cache data is stored in a faster and more scalable manner
than in a relational table.

Best Regards,
Dmitriy Pavlov

On Mon, May 29, 2017 at 12:06, rick_tem <rv...@temenos.com> wrote:

> As well, for companies that have a large number of tables (as does ours),
> using a cache per table would probably not be ideal.
>
> Best,
> Rick

Re: Correct Way to Store Data

Posted by rick_tem <rv...@temenos.com>.
As well, for companies that have a large number of tables (as does ours),
using a cache per table would probably not be ideal.

Best,
Rick




Re: Correct Way to Store Data

Posted by Matt <dr...@gmail.com>.
Err, I meant "[...] a different memory policy for different classes", not
for "different products".

On Fri, May 26, 2017 at 6:00 PM, Matt <dr...@gmail.com> wrote:

> I don't think that's correct.
>
> As far as I know, in Ignite it's fine to put more than one type in the
> same cache, because a cache is like a schema (in the relational db world)
> and not a table. So for each type in a cache, a separate table is created
> in H2. There's no need for additional logic to fetch different types from
> the same cache, because internally each type lives in its own independent
> table.
>
> If you save an object of class Foo and another one of class Bar inside
> cache MyCache, they would reside in "MyCache"."Foo" and "MyCache"."Bar"
> respectively.
>
> That's why a model like #2 may make more sense than #1. However, I agree
> with you that #2 would make it impossible to specify a different memory
> policy for different products, but that is not required in this case anyway.
>
> Matt

Re: Correct Way to Store Data

Posted by Matt <dr...@gmail.com>.
I don't think that's correct.

As far as I know, in Ignite it's fine to put more than one type in the same
cache, because a cache is like a schema (in the relational db world) and
not a table. So for each type in a cache, a separate table is created in
H2. There's no need for additional logic to fetch different types from the
same cache, because internally each type lives in its own independent
table.

If you save an object of class Foo and another one of class Bar inside
cache MyCache, they would reside in "MyCache"."Foo" and "MyCache"."Bar"
respectively.

That's why a model like #2 may make more sense than #1. However, I agree
with you that #2 would make it impossible to specify a different memory
policy for different products, but that is not required in this case anyway.

Matt

On Fri, May 26, 2017 at 4:34 PM, Dmitry Pavlov <dp...@gmail.com>
wrote:

> Hi Matt,
>
> An Ignite cache more or less corresponds to a table in the relational
> world.
>
> As for the number of caches: both ways are possible. In the relational
> world, by the way, you can also place different business objects in one
> table, but you will have to introduce an additional type field.
>
> Similarly, you can place different value types in the same cache, but it
> is up to you to provide the additional logic that determines what type of
> object was selected.
>
> A known benefit of having one cache per business object type: you can
> fine-tune cache quotas (memory policies) and other cache parameters
> separately for each business object type.
>
> Hope this helps.
>
> Sincerely,
> Dmitriy Pavlov

Re: Correct Way to Store Data

Posted by Dmitry Pavlov <dp...@gmail.com>.
Hi Matt,

An Ignite cache more or less corresponds to a table in the relational world.

As for the number of caches: both ways are possible. In the relational
world, by the way, you can also place different business objects in one
table, but you will have to introduce an additional type field.

Similarly, you can place different value types in the same cache, but it is
up to you to provide the additional logic that determines what type of
object was selected.
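
As a rough picture of that "additional logic" for key-value access, the sketch below uses a plain java.util.Map as a stand-in for a cache holding mixed value types. Everything here (MixedCacheSketch, Product, Sale, valuesOfType) is a hypothetical name for illustration; it is not the Ignite API and says nothing about how Ignite's SQL layer separates types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model: one map holding several value types, filtered by class on read. */
final class MixedCacheSketch {
    /** Hypothetical value type. */
    static final class Product {
        final String name;

        Product(String name) {
            this.name = name;
        }
    }

    /** Hypothetical value type. */
    static final class Sale {
        final long productId;
        final double amount;

        Sale(long productId, double amount) {
            this.productId = productId;
            this.amount = amount;
        }
    }

    /** Returns only the values of the requested type, in insertion order. */
    static <T> List<T> valuesOfType(Map<String, Object> cache, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object value : cache.values()) {
            if (type.isInstance(value)) {
                result.add(type.cast(value)); // the caller-side type check
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new LinkedHashMap<>();
        cache.put("product:1", new Product("widget"));
        cache.put("sale:1", new Sale(1L, 9.5));
        cache.put("sale:2", new Sale(1L, 3.0));

        System.out.println("products: " + valuesOfType(cache, Product.class).size());
        System.out.println("sales:    " + valuesOfType(cache, Sale.class).size());
    }
}
```

With one cache per type, the filtering step disappears because every value in a given cache is already known to be of that type.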



A known benefit of having one cache per business object type: you can
fine-tune cache quotas (memory policies) and other cache parameters
separately for each business object type.

Hope this helps.

Sincerely,
Dmitriy Pavlov


On Fri, May 26, 2017 at 22:03, Matt <dr...@gmail.com> wrote:

> Interesting, so #3 is not the way to go.
>
> What about #2? That would be the "relational database way of doing it",
> which is what Ignite uses behind the scenes (H2). What's the disadvantage
> compared to #1?
>
> Thanks for sharing your insight.

Re: Correct Way to Store Data

Posted by Matt <dr...@gmail.com>.
Interesting, so #3 is not the way to go.

What about #2? That would be the "relational database way of doing it",
which is what Ignite uses behind the scenes (H2). What's the disadvantage
compared to #1?

Thanks for sharing your insight.

On Fri, May 26, 2017 at 11:28 AM, Ilya Lantukh <il...@gridgain.com>
wrote:

> Hi Matt,
>
> From what I've seen, the most commonly used approach is the one you took:
> have caches associated with object classes. This approach is efficient and
> completely corresponds to "the Ignite way".
>
> Having a separate cache for each product is definitely not a good idea,
> especially if you have thousands of products and that number is going to
> increase rapidly. Every cache requires additional memory to store its
> internal data structures. In addition, you will have to perform a dynamic
> cache start whenever a new product is added, which is a relatively
> expensive operation and causes the grid to pause all other operations for
> some time.
>
> Hope this helps.

Re: Correct Way to Store Data

Posted by Ilya Lantukh <il...@gridgain.com>.
Hi Matt,

From what I've seen, the most commonly used approach is the one you took:
have caches associated with object classes. This approach is efficient and
completely corresponds to "the Ignite way".

Having a separate cache for each product is definitely not a good idea,
especially if you have thousands of products and that number is going to
increase rapidly. Every cache requires additional memory to store its
internal data structures. In addition, you will have to perform a dynamic
cache start whenever a new product is added, which is a relatively
expensive operation and causes the grid to pause all other operations for
some time.
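
For a sense of scale, the small sketch below counts the caches each of the three layouts from the original question would need, using the 10k products and three object classes mentioned there. The class and method names are made up for this illustration, and the per-cache overhead is left as a parameter because no figure is given in this thread.

```java
/** Back-of-the-envelope cache counts for the three layouts discussed in the thread. */
final class CacheCountSketch {
    /** Layout #1: one cache per object class (products, sales, stats, ...). */
    static long cachesPerClass(int classCount) {
        return classCount;
    }

    /** Layout #2: a single cache holding every class. */
    static long singleCache() {
        return 1;
    }

    /** Layout #3: one "products" cache plus one cache per product. */
    static long cachePerProduct(long productCount) {
        return 1 + productCount;
    }

    /**
     * Total fixed overhead for a given number of caches.
     * perCacheOverheadBytes is a HYPOTHETICAL input; measure it on your own cluster.
     */
    static long fixedOverheadBytes(long cacheCount, long perCacheOverheadBytes) {
        return Math.multiplyExact(cacheCount, perCacheOverheadBytes);
    }

    public static void main(String[] args) {
        long products = 10_000; // "about 10k products" from the original question
        System.out.println("layout #1 caches: " + cachesPerClass(3));
        System.out.println("layout #2 caches: " + singleCache());
        System.out.println("layout #3 caches: " + cachePerProduct(products));
    }
}
```

Layout #3 grows by one cache, and one dynamic cache start, for every new product, while layouts #1 and #2 stay constant as the product count rises.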

Hope this helps.


On Fri, May 26, 2017 at 10:51 AM, Matt <dr...@gmail.com> wrote:

> Hello,
>
> Right now I have a couple of caches associated with the kind of objects I
> store. For instance I have one cache for products, one for sales, one for
> stats, etc. I use the id of the product as the affinity key in all cases.
>
> Some questions I have regarding this approach...
>
> *1.* I get the impression I'm not doing it "the Ignite way", since I'm
> only storing one kind of object (i.e., objects of only one class) in each
> cache. The approach I'm using is equivalent to having a PostgreSQL schema
> for products, another one for sales and a third for stats. Is that right?
>
> *2.* I believe it would make more sense to have only one cache (for
> instance, "analytics") and save all objects there (products, sales and
> stats). That would be equivalent to having one single schema and inside it
> one table for each class I store. Right?
>
> *3.* Is there any problem in terms of performance, or is it a bad
> practice, to have one cache with all products and one cache per product
> holding all objects related to that particular product? I think some
> queries would run much faster that way: since all objects in a given cache
> relate to the same product, there is no need to filter sales or stats by
> product id.
>
> *4.* What's the best approach or which one is more commonly used?
>
> As a side note, in all 3 cases I'll use the id of the product as the
> affinity key, except for the "products" cache in #3, which would be stored
> on a single node. Also, right now I'm storing about 10k products, but that
> number increases as clients arrive, so I expect the cardinality to
> increase rapidly.
>
> Cheers,
> Matt
>



-- 
Best regards,
Ilya