Posted to user@atlas.apache.org by Verdan Mahmood <ve...@gmail.com> on 2019/08/14 17:10:11 UTC

Help needed with Production Configurations

Hello Community,

We recently deployed Atlas with Solr in our production environment. We
have around 30k Hive tables, along with a couple of custom entity types.
DSL queries are quite slow, taking an average of 15 seconds each time.
Solr itself performs well: all kinds of queries run quickly from the
Solr UI.

Are there any configurations you fine-tune to get the most out of
Atlas?


Best,
*Verdan Mahmood*

Re: Help needed with Production Configurations

Posted by Ashutosh Mestry <am...@cloudera.com>.
Both of these queries will do in-memory scans, which carry a performance penalty that grows with the volume of data.

You could get better performance from Basic Search, which in turn uses the full-text indexes. The only catch is that you will have to do some of the processing yourself.

~ ashutosh
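Following the Basic Search suggestion above, here is a minimal Python sketch of what that could look like for the two queries in this thread. It rests on a few assumptions. The endpoint is the Atlas v2 REST `POST /api/atlas/v2/search/basic`. The `SearchParameters` field names (`typeName`, `entityFilters`, `attributes`, ...) and the `contains` operator are as I understand Atlas 2.x and should be checked against your version. The helper names, host and credentials are hypothetical. The sort and limit from `ORDERBY popularityScore desc LIMIT 10` are done client side, which is the "processing yourself" part.

```python
import base64
import heapq
import json
import urllib.request
from typing import Any, Dict, Iterable, List

# Assumption: Atlas REST v2 Basic Search endpoint (POST, JSON body).
BASIC_SEARCH_PATH = "/api/atlas/v2/search/basic"


def wildcard_search_params(term: str, type_name: str = "Table",
                           limit: int = 100, offset: int = 0) -> Dict[str, Any]:
    """Basic Search body approximating the DSL
    from Table where name like '*term*' or description like '*term*'."""
    return {
        "typeName": type_name,
        "excludeDeletedEntities": True,
        "limit": limit,
        "offset": offset,
        "attributes": ["name", "description"],
        "entityFilters": {
            "condition": "OR",
            "criterion": [
                {"attributeName": "name", "operator": "contains", "attributeValue": term},
                {"attributeName": "description", "operator": "contains", "attributeValue": term},
            ],
        },
    }


def post_basic_search(base_url: str, user: str, password: str,
                      params: Dict[str, Any]) -> Dict[str, Any]:
    """POST a SearchParameters body and return the decoded JSON result."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        base_url.rstrip("/") + BASIC_SEARCH_PATH,
        data=json.dumps(params).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def top_n_by_attribute(pages: Iterable[Dict[str, Any]], attr: str,
                       n: int = 10) -> List[Dict[str, Any]]:
    """Client-side stand-in for `ORDERBY attr desc LIMIT n` over search results.

    Entity headers without the attribute are skipped. The result is only the
    true top n if `pages` covers every candidate entity.
    """
    headers = (e for page in pages for e in page.get("entities") or [])
    scored = (e for e in headers if (e.get("attributes") or {}).get(attr) is not None)
    return heapq.nlargest(n, scored, key=lambda e: e["attributes"][attr])
```

For the popularity query, one option is to page through a Basic Search on the `Metadata` type with `"attributes": ["popularityScore"]`, increasing `offset` by `limit` each time, and feed the pages to `top_n_by_attribute`. As far as I know, attribute filters are only pushed down to Solr when the attribute is indexed; otherwise Atlas filters them in memory, so it is worth checking how `name` and `description` are defined.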


From: Verdan Mahmood <ve...@gmail.com>
Date: Wednesday, August 14, 2019 at 10:39 AM
To: "dev@atlas.apache.org" <de...@atlas.apache.org>, Ashutosh Mestry <am...@cloudera.com>
Cc: "user@atlas.apache.org" <us...@atlas.apache.org>
Subject: Re: Help needed with Production Configurations

We are using Atlas 2.0

We have the following relations:

Table(DataSet):

hive_table(Table):
metadata = Metadata

Metadata:
popularityScore

One of our queries is:

FROM Table SELECT metadata.__guid ORDERBY popularityScore desc LIMIT 10

Another one is for the wildcard search:

from Table where name like '*{query_term}*' or description like '*{query_term}*'

Any suggestions on how to improve these? Should we use the full-text/free-text searches?


Best,
Verdan Mahmood



On Wed, Aug 14, 2019 at 7:22 PM Ashutosh Mestry <am...@cloudera.com.invalid> wrote:
What version are you using?

DSL queries mostly do not use indexes, so a well-tuned Solr configuration may not be of much value. The pre-1.0 version of DSL is not very efficient in terms of performance.

What kinds of queries do you have? Can you post some examples?

A few things you could try:
- Increase Atlas' memory. DSL uses in-memory filtering for certain kinds of queries, so additional memory may help.
- Increase the number of shards in Solr (the default is 1; adding shards will help improve throughput). You will have to use Solr API calls to add shards. The num_shards property is only used during the very first start.
- Analyze your queries and see if you can use Basic Search, since it is optimized to use Solr primarily.

~ ashutosh

Re: Help needed with Production Configurations

Posted by Verdan Mahmood <ve...@gmail.com>.
We are using Atlas 2.0

We have the following relations:

*Table(DataSet):*

*hive_table(Table):*
metadata = Metadata

*Metadata:*
popularityScore

One of our queries is:

FROM Table SELECT metadata.__guid ORDERBY popularityScore desc LIMIT 10

Another one is for the wildcard search:

from Table where name like '*{query_term}*' or description like '*{query_term}*'

Any suggestions on how to improve these? Should we use the full-text/free-text
searches?
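To reproduce these timings outside a UI, both DSL queries can be sent to the v2 DSL endpoint, `GET /api/atlas/v2/search/dsl`, with `query`, `limit` and `offset` parameters. Below is a small standard-library sketch. `dsl_search_url` and `wildcard_dsl` are hypothetical helpers, the host is a placeholder, and the wildcard query is written as `from Table where ...`. The main point is that the query text, with its spaces, quotes and `*` wildcards, has to be URL-encoded.

```python
from urllib.parse import urlencode

# Assumption: Atlas REST v2 DSL search endpoint; the base URL is a placeholder.
DSL_PATH = "/api/atlas/v2/search/dsl"


def wildcard_dsl(term: str) -> str:
    """Hypothetical helper: fill the {query_term} placeholder of the wildcard query."""
    return (f"from Table where name like '*{term}*' "
            f"or description like '*{term}*'")


def dsl_search_url(base_url: str, query: str, limit: int = 10, offset: int = 0) -> str:
    """Build a GET URL; urlencode escapes spaces, quotes and '*' in the DSL text."""
    params = urlencode({"query": query, "limit": limit, "offset": offset})
    return f"{base_url.rstrip('/')}{DSL_PATH}?{params}"


print(dsl_search_url("http://atlas-host:21000", wildcard_dsl("sales")))
```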


Best,
*Verdan Mahmood*

Re: Help needed with Production Configurations

Posted by Ashutosh Mestry <am...@cloudera.com.INVALID>.
What version are you using?

DSL queries mostly do not use indexes, so a well-tuned Solr configuration may not be of much value. The pre-1.0 version of DSL is not very efficient in terms of performance.

What kinds of queries do you have? Can you post some examples?

A few things you could try:
- Increase Atlas' memory. DSL uses in-memory filtering for certain kinds of queries, so additional memory may help.
- Increase the number of shards in Solr (the default is 1; adding shards will help improve throughput). You will have to use Solr API calls to add shards. The num_shards property is only used during the very first start.
- Analyze your queries and see if you can use Basic Search, since it is optimized to use Solr primarily.
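For the shard suggestion, one way to add shards to an existing SolrCloud collection is the Collections API `SPLITSHARD` action. It splits a shard into two sub-shards (named `shard1_0` and `shard1_1` when splitting `shard1`) and marks the parent inactive once they are active. The sketch below only builds the request URLs. It assumes Atlas's default collection names and a placeholder Solr host, and `split_shard_url` is a hypothetical helper. SPLITSHARD applies to collections using the default `compositeId` router. With one replica per shard, the sub-shards initially live on the same node as the parent, so spreading them across nodes (for example with `ADDREPLICA`/`DELETEREPLICA`) is a separate step.

```python
from urllib.parse import urlencode

# Assumption: Atlas's default Solr collection names; confirm with action=LIST.
ATLAS_COLLECTIONS = ("vertex_index", "edge_index", "fulltext_index")


def split_shard_url(solr_url: str, collection: str,
                    shard: str = "shard1", async_id: str = "") -> str:
    """Build a Solr Collections API SPLITSHARD call that splits `shard` in two."""
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    if async_id:
        # Run in the background; poll with action=REQUESTSTATUS&requestid=<async_id>.
        params["async"] = async_id
    return f"{solr_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Print one call per collection; send them with curl or any HTTP client.
for name in ATLAS_COLLECTIONS:
    print(split_shard_url("http://solr-host:8983/solr", name, async_id=f"split-{name}"))
```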

~ ashutosh