
Posted to dev@calcite.apache.org by Feng Zhu <we...@gmail.com> on 2019/10/29 02:28:57 UTC

[DISCUSSION] Cache Optimization in JdbcSchema

Hi all,
We have made some optimizations in practice, but I'm not sure whether the
community needs this kind of change, because it makes the code more
complex.

Currently, JdbcSchema caches all JdbcTables in tableMap (i.e.,
ImmutableMap<String, JdbcTable> tableMap).

In our production environment, there are more than 3,000 data sources,
and correspondingly more than 3,000 JdbcSchemas, each of which may contain
over 10,000 tables. Consequently, the table maps occupy nearly 10 GB of
memory, putting great pressure on the server.

We encode each <catalogName, schemaName, tableTypeName> tuple as a unique
Integer and simplify the table map to <String, Integer>. Given the Integer,
we can look up the tuple and construct the JdbcTable dynamically. As a
result, the cached table maps occupy only about 800 MB of memory.

Best,
DonnyZone
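The encoding described in the opening post can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the patch discussed in the thread (which is not included here): `EncodedTableMap`, `TableKey`, and `LazyTable` are hypothetical names, and `LazyTable` merely stands in for Calcite's `JdbcTable`. Each distinct tuple is stored once and numbered; the per-schema map keeps only table name to id, and the table object is rebuilt on every lookup instead of being cached.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of a table map that stores only table name -> tuple id.
 * All names here are hypothetical; this is not Calcite's JdbcSchema code.
 */
public class EncodedTableMap {
  /** The <catalogName, schemaName, tableTypeName> tuple shared by many tables. */
  record TableKey(String catalogName, String schemaName, String tableTypeName) {}

  /** Hypothetical stand-in for JdbcTable, rebuilt on each lookup. */
  record LazyTable(String tableName, TableKey key) {}

  // Dictionary: each distinct tuple is stored once and numbered 0, 1, 2, ...
  private final Map<TableKey, Integer> idsByKey = new HashMap<>();
  private final List<TableKey> keysById = new ArrayList<>();

  // Replaces Map<String, JdbcTable>: one boxed Integer per table name.
  private final Map<String, Integer> tableMap = new HashMap<>();

  /** Returns the tuple's id, assigning the next free id the first time it is seen. */
  private int encode(TableKey key) {
    return idsByKey.computeIfAbsent(key, k -> {
      keysById.add(k);
      return keysById.size() - 1;
    });
  }

  /** Registers a table, e.g. while scanning the JDBC metadata for a schema. */
  public void put(String tableName, String catalogName, String schemaName,
      String tableTypeName) {
    tableMap.put(tableName, encode(new TableKey(catalogName, schemaName, tableTypeName)));
  }

  /** Rebuilds the table object from its tuple; returns null for unknown names. */
  public LazyTable get(String tableName) {
    Integer id = tableMap.get(tableName);
    return id == null ? null : new LazyTable(tableName, keysById.get(id));
  }

  /** Number of distinct tuples held in the dictionary. */
  public int distinctKeys() {
    return keysById.size();
  }

  public static void main(String[] args) {
    EncodedTableMap map = new EncodedTableMap();
    map.put("orders", "sales_db", "public", "TABLE");
    map.put("customers", "sales_db", "public", "TABLE");
    map.put("order_summary", "sales_db", "public", "VIEW");
    System.out.println(map.distinctKeys()); // 2: three tables share two tuples
    System.out.println(map.get("orders"));
  }
}
```

In this sketch every lookup allocates a fresh table object, trading a little CPU and garbage for not keeping one object per table alive. If the number of distinct tuples is small (say, a handful of table types under one catalog and schema), the boxed ids also fall in the -128 to 127 range that `Integer.valueOf` caches, so the map values are shared objects.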

Re: [DISCUSSION] Cache Optimization in JdbcSchema

Posted by Julian Hyde <jh...@apache.org>.

How much benefit is that massive cache giving you? If it’s not giving much benefit, maybe we should not be caching so much. Maybe in your case caching just the current schema (or the most recent 2 or 3 schemas) would be a better strategy.

As y’all know, I’m always in favor of removing caches. Or at least getting them to prove their worth.

Julian
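Julian's alternative, caching only the current schema or the most recent two or three, amounts to a small least-recently-used (LRU) cache. A minimal Java sketch, assuming one cache entry per data source and a hypothetical loader function that builds that data source's table map; `RecentSchemaCache` is an illustrative name, not a Calcite API. It relies on `java.util.LinkedHashMap` in access order, which evicts through `removeEldestEntry`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Bounded LRU cache keeping only the most recently used entries (hypothetical helper). */
public class RecentSchemaCache<K, V> {
  private final Function<K, V> loader;
  private final Map<K, V> entries;

  public RecentSchemaCache(int maxEntries, Function<K, V> loader) {
    this.loader = loader;
    // accessOrder = true: iteration runs from least to most recently accessed.
    this.entries = new LinkedHashMap<>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the bound is exceeded.
        return size() > maxEntries;
      }
    };
  }

  /** Returns the cached value, loading it (and possibly evicting another) on a miss. */
  public V get(K key) {
    return entries.computeIfAbsent(key, loader);
  }

  /** Checks presence without changing the access order. */
  public boolean isCached(K key) {
    return entries.containsKey(key);
  }

  public static void main(String[] args) {
    // Hypothetical loader standing in for building a data source's table map.
    RecentSchemaCache<String, String> cache =
        new RecentSchemaCache<>(2, name -> "tables of " + name);
    cache.get("ds1");
    cache.get("ds2");
    cache.get("ds1"); // touch ds1, so ds2 is now least recently used
    cache.get("ds3"); // exceeds the bound of 2 and evicts ds2
    System.out.println(cache.isCached("ds1") + " " + cache.isCached("ds2")); // true false
  }
}
```

This sketch is not thread-safe; a shared server-side cache would need synchronization or a library cache such as Guava's `CacheBuilder` with `maximumSize`. Whether it helps depends on how often queries switch between data sources, which is the "prove their worth" question raised above.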


> On Oct 28, 2019, at 11:50 PM, Feng Zhu <we...@gmail.com> wrote:
> 
> Thanks. I will open a JIRA issue for discussion, with a design doc and a
> testing report.

Re: [DISCUSSION] Cache Optimization in JdbcSchema

Posted by Feng Zhu <we...@gmail.com>.

Thanks. I will open a JIRA issue for discussion, with a design doc and a
testing report.

Danny Chan <yu...@gmail.com> wrote on Tue, Oct 29, 2019 at 12:11 PM:

> Sounds very attractive. Could you provide an intuitive design doc to
> illustrate how it works? Then we can review the design ;)
>
> Best,
> Danny Chan

Re: [DISCUSSION] Cache Optimization in JdbcSchema

Posted by Danny Chan <yu...@gmail.com>.

Sounds very attractive. Could you provide an intuitive design doc to illustrate how it works? Then we can review the design ;)

Best,
Danny Chan

Re: [DISCUSSION] Cache Optimization in JdbcSchema

Posted by Haisheng Yuan <h....@alibaba-inc.com>.

I think this will definitely help.
The code is already complicated; we can add more comments and docs to make it clear. Nothing is more attractive than saving 90% of the memory.

- Haisheng
