Posted to dev@atlas.apache.org by Verdan Mahmood <ve...@gmail.com> on 2019/08/14 17:10:11 UTC
Help needed with Production Configurations
Hello Community,
We recently deployed Atlas with Solr in our production environment and
have around 30k hive tables, plus a couple of custom entity types.
DSL queries are pretty slow, taking an average of 15 seconds each
time.
Our Solr is pretty efficient and works well with all kinds of queries
from the Solr UI.
Are there any configurations that you fine-tune to get the
most out of Atlas?
Best,
*Verdan Mahmood*
Re: Help needed with Production Configurations
Posted by Ashutosh Mestry <am...@cloudera.com>.
Both of these queries will do in-memory scans, so the performance penalty grows with the volume of data.
You could get better performance with Basic Search, which in turn uses the fulltext indexes. The only catch is that you will have to do some processing yourself.
~ ashutosh
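A minimal sketch of what moving the wildcard query to Basic Search might look like, assuming the Atlas v2 REST endpoint POST /api/atlas/v2/search/basic and its entityFilters criteria. The helper names basic_search_payload and top_by_score are hypothetical, and the type and attribute names (Table, name, description, popularityScore) are taken from the queries quoted below. The "processing yourself" part is assumed here to mean ordering the returned entity headers on the client.

```python
import json
from typing import Any, Dict, List, Optional


def basic_search_payload(type_name: str, term: str, limit: int = 10,
                         attributes: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build a Basic Search request body matching `term` in name OR description.

    Hypothetical helper: the body is meant to be POSTed to
    /api/atlas/v2/search/basic by whatever HTTP client you already use.
    """
    return {
        "typeName": type_name,
        "excludeDeletedEntities": True,
        "limit": limit,
        "offset": 0,
        # Extra attributes to return on each entity header (assumption:
        # they exist on the searched type).
        "attributes": attributes or [],
        "entityFilters": {
            "condition": "OR",
            "criterion": [
                {"attributeName": "name", "operator": "contains",
                 "attributeValue": term},
                {"attributeName": "description", "operator": "contains",
                 "attributeValue": term},
            ],
        },
    }


def top_by_score(entities: List[Dict[str, Any]],
                 attr: str = "popularityScore", n: int = 10) -> List[Dict[str, Any]]:
    """Order entity headers by a numeric attribute on the client, highest first.

    Entities that lack the attribute sort last.
    """
    def score(entity: Dict[str, Any]) -> float:
        value = entity.get("attributes", {}).get(attr)
        return value if value is not None else float("-inf")

    return sorted(entities, key=score, reverse=True)[:n]


if __name__ == "__main__":
    # Print the body that would be sent for the wildcard search.
    print(json.dumps(basic_search_payload("Table", "sales"), indent=2))
```

Whether this is faster in practice depends on which attributes are indexed, so it is worth timing against the DSL version on real data.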
From: Verdan Mahmood <ve...@gmail.com>
Date: Wednesday, August 14, 2019 at 10:39 AM
To: "dev@atlas.apache.org" <de...@atlas.apache.org>, Ashutosh Mestry <am...@cloudera.com>
Cc: "user@atlas.apache.org" <us...@atlas.apache.org>
Subject: Re: Help needed with Production Configurations
We are using Atlas 2.0
We have the following relations:
Table(DataSet):
hive_table(Table):
metadata = Metadata
Metadata:
popularityScore
One of our queries is:
FROM Table SELECT metadata.__guid ORDERBY popularityScore desc LIMIT 10
Another one is for the wildcard search:
Table from Table where name like '*{query_term}*' or description like '*{query_term}*'
Any suggestions on how to improve those? Should we use the fulltext/freetext searches?
Best,
Verdan Mahmood
On Wed, Aug 14, 2019 at 7:22 PM Ashutosh Mestry <am...@cloudera.com.invalid> wrote:
What version are you using?
DSL queries mostly do not use indexes, so a well-configured Solr may not be of much value. The pre-1.0 version of DSL is not very efficient in terms of performance.
What kind of queries do you have? Can you post some examples?
A few things you could try:
- Increase Atlas' memory. DSL uses in-memory filtering for certain kinds of queries. Additional memory may help.
- Increase the number of shards in Solr (default is 1; adding shards will help improve throughput). You will have to use Solr API calls to add shards. The num_shards property is used only during the very first start.
- Analyze your queries and see if you can use Basic Search, since it is optimized to use Solr primarily.
~ ashutosh
On 8/14/19, 10:10 AM, "Verdan Mahmood" <ve...@gmail.com> wrote:
Hello Community,
We recently deployed Atlas with Solr in our production environment and
have around 30k hive tables, plus a couple of custom entity types.
DSL queries are pretty slow, taking an average of 15 seconds each
time.
Our Solr is pretty efficient and works well with all kinds of queries
from the Solr UI.
Are there any configurations that you fine-tune to get the
most out of Atlas?
Best,
*Verdan Mahmood*
Re: Help needed with Production Configurations
Posted by Verdan Mahmood <ve...@gmail.com>.
We are using Atlas 2.0
We have the following relations:
*Table(DataSet):*
*hive_table(Table):*
metadata = Metadata
*Metadata:*
popularityScore
One of our queries is:
FROM Table SELECT metadata.__guid ORDERBY popularityScore desc LIMIT 10
Another one is for the wildcard search:
Table from Table where name like '*{query_term}*' or description like
'*{query_term}*'
Any suggestions on how to improve those? Should we use the fulltext/freetext
searches?
Best,
*Verdan Mahmood*
On Wed, Aug 14, 2019 at 7:22 PM Ashutosh Mestry
<am...@cloudera.com.invalid> wrote:
> What version are you using?
>
> DSL queries mostly do not use indexes, so a well-configured Solr may
> not be of much value. The pre-1.0 version of DSL is not very
> efficient in terms of performance.
>
> What kind of queries do you have? Can you post some examples?
>
> A few things you could try:
> - Increase Atlas' memory. DSL uses in-memory filtering for certain kinds of
> queries. Additional memory may help.
> - Increase the number of shards in Solr (default is 1; adding shards will help
> improve throughput). You will have to use Solr API calls to add shards.
> The num_shards property is used only during the very first start.
> - Analyze your queries and see if you can use Basic Search, since it is
> optimized to use Solr primarily.
>
> ~ ashutosh
>
>
> On 8/14/19, 10:10 AM, "Verdan Mahmood" <ve...@gmail.com> wrote:
>
> Hello Community,
>
> We recently deployed Atlas with Solr in our production environment and
> have around 30k hive tables, plus a couple of custom entity types.
> DSL queries are pretty slow, taking an average of 15 seconds each
> time.
> Our Solr is pretty efficient and works well with all kinds of queries
> from the Solr UI.
>
> Are there any configurations that you fine-tune to get the
> most out of Atlas?
>
>
> Best,
> *Verdan Mahmood*
>
>
Re: Help needed with Production Configurations
Posted by Ashutosh Mestry <am...@cloudera.com.INVALID>.
What version are you using?
DSL queries mostly do not use indexes, so a well-configured Solr may not be of much value. The pre-1.0 version of DSL is not very efficient in terms of performance.
What kind of queries do you have? Can you post some examples?
A few things you could try:
- Increase Atlas' memory. DSL uses in-memory filtering for certain kinds of queries. Additional memory may help.
- Increase the number of shards in Solr (default is 1; adding shards will help improve throughput). You will have to use Solr API calls to add shards. The num_shards property is used only during the very first start.
- Analyze your queries and see if you can use Basic Search, since it is optimized to use Solr primarily.
~ ashutosh
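For the shard suggestion above, one possible reading is to split an existing shard through the Solr Collections API. The sketch below only builds the request URL. The helper name split_shard_url is hypothetical, and the Solr base URL, the collection name vertex_index, and the shard name shard1 are assumptions about a default Atlas/SolrCloud setup, so check them against your cluster before calling anything.

```python
from urllib.parse import urlencode


def split_shard_url(solr_base: str, collection: str, shard: str = "shard1") -> str:
    """Build a Collections API URL that asks Solr to split one shard in two.

    Hypothetical helper: it only formats the URL; issuing the request (and
    waiting for the split to finish) is left to the caller.
    """
    query = urlencode({
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
    })
    return f"{solr_base.rstrip('/')}/admin/collections?{query}"


if __name__ == "__main__":
    # Assumed collection names for the Atlas indexes; adjust to your setup.
    for name in ("vertex_index", "edge_index", "fulltext_index"):
        print(split_shard_url("http://localhost:8983/solr", name))
```

Splitting is an expensive operation on a live index, so it is probably best tried on a non-production copy first.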
On 8/14/19, 10:10 AM, "Verdan Mahmood" <ve...@gmail.com> wrote:
Hello Community,
We recently deployed Atlas with Solr in our production environment and
have around 30k hive tables, plus a couple of custom entity types.
DSL queries are pretty slow, taking an average of 15 seconds each
time.
Our Solr is pretty efficient and works well with all kinds of queries
from the Solr UI.
Are there any configurations that you fine-tune to get the
most out of Atlas?
Best,
*Verdan Mahmood*