Posted to user@ignite.apache.org by c c <ye...@gmail.com> on 2019/11/21 07:49:37 UTC

Does ignite suite for large data search without index?

Hi,
     We have a table with about 30 million records and 15 fields. We need to
implement a function that lets users filter records by any combination of 12
of those fields (one, two, three, or more of them) with very low latency. It
is difficult to create indexes for this. We understand Ignite to be an
in-memory data grid cache, so we tested it with 4 million records (one node)
without creating an index. It took about 5 seconds to find a record matching
a filter condition on one field. We also tested simply traversing a Java List
(10 million elements) with 3 filter conditions, which took about 0.1 second.
We would like to know whether Ignite suits this use case. Thanks very much.
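
For reference, the Java List baseline mentioned above could look roughly like
the following. This is only a minimal sketch: the class name LinearScanSketch,
the nested Member type, and the choice of age, level and gender as the three
filtered fields are assumptions made for illustration (the field names are
borrowed from later messages in this thread), not the poster's actual test
code.

```java
import java.util.ArrayList;
import java.util.List;

final class LinearScanSketch {
    // Hypothetical row type; only a few of the 15 fields are modelled here.
    static final class Member {
        final String name;
        final int age;
        final int gender;
        final int level;

        Member(String name, int age, int gender, int level) {
            this.name = name;
            this.age = age;
            this.gender = gender;
            this.level = level;
        }
    }

    // Full scan: every element is tested against all three conditions.
    static List<Member> scan(List<Member> members, int minAge, int level, int gender) {
        List<Member> hits = new ArrayList<>();
        for (Member m : members) {
            if (m.age >= minAge && m.level == level && m.gender == gender) {
                hits.add(m);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Member> members = new ArrayList<>();
        members.add(new Member("ann", 25, 1, 1));
        members.add(new Member("bob", 17, 1, 1)); // fails the age condition
        members.add(new Member("cy", 30, 2, 1));  // fails the gender condition
        System.out.println(scan(members, 18, 1, 1).size()); // prints 1
    }
}
```

A scan like this visits each element exactly once on the heap, with no
serialization or query parsing involved, which is worth keeping in mind when
comparing its timing against a query that goes through a cache.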

Re: Does ignite suite for large data search without index?

Posted by c c <ye...@gmail.com>.

Hi,
We have filter conditions like these:
    age >= 18 and level = 1 and gender = 1
    (age >= 18 or level = 2) and hobby = 'music'
With one cache per column, joining the results is complicated.
Is there any way to make searching without an index fast?
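
To make the two example conditions concrete, here is a small sketch that
expresses them as composable java.util.function.Predicate objects and applies
them in a plain in-memory scan. The class FilterSketch, the nested Member
type and the sample rows are hypothetical; nothing here uses or represents
the Ignite API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class FilterSketch {
    // Hypothetical row type with the fields used by the two conditions.
    static final class Member {
        final String name;
        final int age;
        final int gender;
        final int level;
        final String hobby;

        Member(String name, int age, int gender, int level, String hobby) {
            this.name = name;
            this.age = age;
            this.gender = gender;
            this.level = level;
            this.hobby = hobby;
        }
    }

    static final Predicate<Member> ADULT = m -> m.age >= 18;

    // age >= 18 and level = 1 and gender = 1
    static Predicate<Member> firstCondition() {
        Predicate<Member> levelOne = m -> m.level == 1;
        Predicate<Member> genderOne = m -> m.gender == 1;
        return ADULT.and(levelOne).and(genderOne);
    }

    // (age >= 18 or level = 2) and hobby = 'music'
    static Predicate<Member> secondCondition() {
        Predicate<Member> levelTwo = m -> m.level == 2;
        Predicate<Member> likesMusic = m -> "music".equals(m.hobby);
        return ADULT.or(levelTwo).and(likesMusic);
    }

    // Linear scan that returns the names of all rows accepted by the condition.
    static List<String> matchingNames(List<Member> members, Predicate<Member> condition) {
        List<String> names = new ArrayList<>();
        for (Member m : members) {
            if (condition.test(m)) {
                names.add(m.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Member> members = Arrays.asList(
            new Member("ann", 25, 1, 1, "music"),
            new Member("bob", 16, 1, 2, "music"),
            new Member("cy", 16, 1, 1, "music"),
            new Member("dee", 30, 2, 1, "chess"));
        System.out.println(matchingNames(members, firstCondition()));  // prints [ann]
        System.out.println(matchingNames(members, secondCondition())); // prints [ann, bob]
    }
}
```

Because the conditions are ordinary objects, any combination of fields can be
assembled at request time without defining a separate query per combination.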

On Thu, Nov 21, 2019 at 8:08 PM Mikael <mi...@telia.com> wrote:

> Hi!
>
> One idea would be to have one cache for each column, where the key is the
> name and the value is, for example, the hobby. You get an index on the key
> for "free" and create one index on the value.
>
> If the cache does not contain a name, that person does not have a hobby;
> only names that have a hobby are in the cache. It would complicate the
> query a bit, and you would need to run a separate query for each column,
> but updating the index is fast: if you only update a few columns, you only
> need to update one index per affected cache. If you need to update all
> columns, it will of course still need to update the indexes for all
> caches. I am not sure if that would work for you; it depends on what kind
> of queries you need.
>
> In theory you could have 15 nodes, with one cache on each node, and run
> the queries in parallel.
>
> I am not at all sure it will work well, it's just an idea.
>
> Mikael

Re: Does ignite suite for large data search without index?

Posted by Mikael <mi...@telia.com>.

Hi!

One idea would be to have one cache for each column, where the key is the
name and the value is, for example, the hobby. You get an index on the key
for "free" and create one index on the value.

If the cache does not contain a name, that person does not have a hobby;
only names that have a hobby are in the cache. It would complicate the
query a bit, and you would need to run a separate query for each column,
but updating the index is fast: if you only update a few columns, you only
need to update one index per affected cache. If you need to update all
columns, it will of course still need to update the indexes for all caches.
I am not sure if that would work for you; it depends on what kind of
queries you need.

In theory you could have 15 nodes, with one cache on each node, and run the
queries in parallel.

I am not at all sure it will work well, it's just an idea.

Mikael
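
The one-cache-per-column idea can be sketched with plain Java maps standing
in for caches. This is an illustration under stated assumptions only: the
class ColumnCacheSketch and its methods put, lookup and lookupAll are
hypothetical names, HashMap replaces a real cache, and only AND-combined
equality conditions are handled.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ColumnCacheSketch {
    // One "cache" per column: member name -> value of that column.
    private final Map<String, Map<String, String>> columns = new HashMap<>();
    // One value index per column: column value -> names that hold it.
    private final Map<String, Map<String, Set<String>>> valueIndex = new HashMap<>();

    // Writing one column touches only that column's cache and index.
    void put(String column, String name, String value) {
        Map<String, String> cache = columns.computeIfAbsent(column, c -> new HashMap<>());
        Map<String, Set<String>> index = valueIndex.computeIfAbsent(column, c -> new HashMap<>());
        String old = cache.put(name, value);
        if (old != null) {
            index.get(old).remove(name); // drop the stale index entry
        }
        index.computeIfAbsent(value, v -> new HashSet<>()).add(name);
    }

    // Names whose value in the given column equals the given value.
    Set<String> lookup(String column, String value) {
        Map<String, Set<String>> index = valueIndex.get(column);
        if (index == null || !index.containsKey(value)) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(index.get(value));
    }

    // AND of equality conditions: one lookup per column, then intersect.
    Set<String> lookupAll(Map<String, String> conditions) {
        Set<String> result = null;
        for (Map.Entry<String, String> e : conditions.entrySet()) {
            Set<String> names = lookup(e.getKey(), e.getValue());
            if (result == null) {
                result = new HashSet<>(names);
            } else {
                result.retainAll(names);
            }
        }
        return result == null ? Collections.<String>emptySet() : result;
    }

    public static void main(String[] args) {
        ColumnCacheSketch caches = new ColumnCacheSketch();
        caches.put("hobby", "ann", "music");
        caches.put("hobby", "bob", "music");
        caches.put("location", "ann", "Oslo");
        caches.put("location", "bob", "Rome");
        Map<String, String> query = new HashMap<>();
        query.put("hobby", "music");
        query.put("location", "Oslo");
        System.out.println(caches.lookupAll(query)); // prints [ann]
    }
}
```

The intersection step is the part that "complicates the query a bit": each
extra column in a filter adds one more lookup and one more set operation, and
OR conditions would need a union instead.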


On 2019-11-21 at 12:17, c c wrote:
> Yes, we may add more columns in the future. Do you mean creating one
> index per column, or an index on multiple columns? Also, the values in
> some columns do not vary much. So having many indexes is not efficient:
> it will cost a lot of RAM and decrease update and insert performance
> (this table may be updated in real time). So we think simply traversing a
> collection in memory is good, and a scalable cache would remove the RAM
> limit and make filtering quicker.

Re: Does ignite suite for large data search without index?

Posted by c c <ye...@gmail.com>.

Yes, we may add more columns in the future. Do you mean creating one index
per column, or an index on multiple columns? Also, the values in some
columns do not vary much. So having many indexes is not efficient: it will
cost a lot of RAM and decrease update and insert performance (this table may
be updated in real time). So we think simply traversing a collection in
memory is good, and a scalable cache would remove the RAM limit and make
filtering quicker.

On Thu, Nov 21, 2019 at 7:06 PM Mikael <mi...@telia.com> wrote:

> Hi!
>
> Are the queries limited to something like "select name from ... where
> hobby=x and location=y...", or do you need more complex queries?
>
> If the columns are fixed at 15, I don't see why you could not create 15
> indices. It would use lots of RAM, and I don't think it's the best
> solution either, but it should work.
>
> Is it fixed at 15 columns, or will you have to add more columns in the
> future?

Re: Does ignite suite for large data search without index?

Posted by Mikael <mi...@telia.com>.

Hi!

Are the queries limited to something like "select name from ... where
hobby=x and location=y...", or do you need more complex queries?

If the columns are fixed at 15, I don't see why you could not create 15
indices. It would use lots of RAM, and I don't think it's the best solution
either, but it should work.

Is it fixed at 15 columns, or will you have to add more columns in the
future?

On 2019-11-21 at 10:56, c c wrote:

> Hi, Mikael
>      Thank you very much for your reply!
>      The data looks like this:
>      member [name, location, age, gender, hobby, level, credits,
> expense ...]
>      We need to filter data by arbitrary combinations of fields, so
> creating indexes is not of much use. We thought traversing all the data
> in memory would work better.
>      We can keep all the data in RAM, but the data may grow
> progressively, and a single node is not scalable. So we plan to use a
> distributed memory cache.
>      We store the data off heap and entirely in RAM with the default
> Ignite serialization. We just create the table, populate it with the
> default Ignite configuration, and query by SQL (one node, 4 million
> records).
>      Is there any way to improve query performance?

Re: Does ignite suite for large data search without index?

Posted by c c <ye...@gmail.com>.

Hi, Mikael
     Thank you very much for your reply!
     The data looks like this:
     member [name, location, age, gender, hobby, level, credits, expense
...]
     We need to filter data by arbitrary combinations of fields, so creating
indexes is not of much use. We thought traversing all the data in memory
would work better.
     We can keep all the data in RAM, but the data may grow progressively,
and a single node is not scalable. So we plan to use a distributed memory
cache.
     We store the data off heap and entirely in RAM with the default Ignite
serialization. We just create the table, populate it with the default Ignite
configuration, and query by SQL (one node, 4 million records).
     Is there any way to improve query performance?

On Thu, Nov 21, 2019 at 5:02 PM Mikael <mi...@telia.com> wrote:

> Hi!
>
> The comparison is not of much use. With Ignite it is not just a matter of
> searching a list: there is serialization/deserialization and other things
> to consider that will make it slower than a simple list search. A linear
> search on an Ignite cache depends on how you store the data (off heap/on
> heap, in RAM/partially on disk, type of serialization and so on).
>
> If you cannot keep all data in RAM, you are going to need some index to
> do a fast lookup; there is no way around it.
>
> If you can have all the data in RAM, why do you need Ignite? Do you have
> some other requirements that Ignite meets? Otherwise it might be simpler
> to just use a list in RAM and go with that.
>
> Is memory a limitation (cluster or single node)? If not, could you
> explain why it is difficult to create an index on the data?
>
> Could you explain what type of data it is? Maybe it is possible to
> arrange the data in some other way to improve everything.
>
> Did you test with a single node or a cluster of nodes? With more nodes
> you can improve performance, as any search can be split up between the
> nodes. Still, some kind of index will help a lot.
>
> Mikael

Re: Does ignite suite for large data search without index?

Posted by Mikael <mi...@telia.com>.

Hi!

The comparison is not of much use. With Ignite it is not just a matter of
searching a list: there is serialization/deserialization and other things to
consider that will make it slower than a simple list search. A linear search
on an Ignite cache depends on how you store the data (off heap/on heap, in
RAM/partially on disk, type of serialization and so on).

If you cannot keep all data in RAM, you are going to need some index to do a
fast lookup; there is no way around it.

If you can have all the data in RAM, why do you need Ignite? Do you have
some other requirements that Ignite meets? Otherwise it might be simpler to
just use a list in RAM and go with that.

Is memory a limitation (cluster or single node)? If not, could you explain
why it is difficult to create an index on the data?

Could you explain what type of data it is? Maybe it is possible to arrange
the data in some other way to improve everything.

Did you test with a single node or a cluster of nodes? With more nodes you
can improve performance, as any search can be split up between the nodes.
Still, some kind of index will help a lot.

Mikael
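
The point about splitting a search between nodes can be illustrated on a
single JVM, with threads standing in for nodes. This is a rough sketch under
that assumption: PartitionedScanSketch and its scan method are hypothetical
names, the data is sliced by position rather than by any real partitioning
scheme, and no network or serialization cost is modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

final class PartitionedScanSketch {
    // Cuts the data into up to `parts` slices, scans each slice on its own
    // thread, and merges the partial results in slice order.
    static <T> List<T> scan(List<T> data, int parts, Predicate<T> filter) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<List<T>>> partials = new ArrayList<>();
            int chunk = (data.size() + parts - 1) / parts; // ceiling division
            for (int start = 0; start < data.size(); start += chunk) {
                final List<T> slice = data.subList(start, Math.min(start + chunk, data.size()));
                partials.add(pool.submit(() -> {
                    List<T> hits = new ArrayList<>();
                    for (T item : slice) {
                        if (filter.test(item)) {
                            hits.add(item);
                        }
                    }
                    return hits;
                }));
            }
            List<T> merged = new ArrayList<>();
            for (Future<List<T>> partial : partials) {
                merged.addAll(partial.get()); // waits for that slice to finish
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("scan interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scan failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }
        System.out.println(scan(data, 4, x -> x % 250 == 0)); // prints [0, 250, 500, 750]
    }
}
```

Each worker still performs a full linear scan of its own slice, so adding
workers divides the scan time at best; it does not replace what an index
provides.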

On 2019-11-21 at 08:49, c c wrote:
> Hi,
>      We have a table with about 30 million records and 15 fields. We need
> to implement a function that lets users filter records by any combination
> of 12 of those fields (one, two, three, or more of them) with very low
> latency. It is difficult to create indexes for this. We understand Ignite
> to be an in-memory data grid cache, so we tested it with 4 million
> records (one node) without creating an index. It took about 5 seconds to
> find a record matching a filter condition on one field. We also tested
> simply traversing a Java List (10 million elements) with 3 filter
> conditions, which took about 0.1 second. We would like to know whether
> Ignite suits this use case. Thanks very much.
>