Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <de...@db.apache.org> on 2005/11/15 10:14:29 UTC

[jira] Closed: (DERBY-704) Large page cache kills initial performance

     [ http://issues.apache.org/jira/browse/DERBY-704?page=all ]
     
Knut Anders Hatlen closed DERBY-704:
------------------------------------


Fixed in revision 344270.

> Large page cache kills initial performance
> ------------------------------------------
>
>          Key: DERBY-704
>          URL: http://issues.apache.org/jira/browse/DERBY-704
>      Project: Derby
>         Type: Bug
>   Components: Services, Performance
>     Versions: 10.1.1.0, 10.2.0.0, 10.1.2.0, 10.1.1.1, 10.1.1.2, 10.1.2.1, 10.1.3.0, 10.1.2.2
>  Environment: All platforms
>     Reporter: Knut Anders Hatlen
>     Assignee: Knut Anders Hatlen
>      Fix For: 10.2.0.0
>  Attachments: DERBY-704.diff, cpu-usage.png, derbyall_report.txt, throughput.png
>
> When the page cache is large, performance drops while the page
> cache is being filled. As soon as the page cache is full, the
> throughput increases. In the period with low performance, the CPU
> usage is high; when the performance increases, the CPU usage is
> lower.
> This behaviour is caused by the algorithm for finding free slots in
> the page cache. If there are invalid pages in the page cache, the
> cache will be scanned to find one of those pages. However, when
> multiple clients access the database, the invalid pages are often
> already taken. This means that the entire page cache will be
> scanned, but no free invalid page will be found. Since the scan of
> the page cache is synchronized on the cache manager, all other
> threads that want to access the page cache have to wait. When the
> page cache is large, this kills the performance.
> When the page cache is full, this is not a problem, as there will be
> no invalid pages.
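
As a rough sketch of the pattern described above (the names below are
hypothetical, not Derby's actual Clock code), the scan works roughly
like this:

    import java.util.ArrayList;
    import java.util.List;

    // Hedged sketch of the pattern in the bug report; the names are
    // hypothetical and the real Clock cache differs in detail.
    class PageCacheSketch {
        static class Entry {
            boolean valid;  // false until the page is initialized
            boolean kept;   // true while claimed by the creating client
        }

        private final List<Entry> holders = new ArrayList<Entry>();

        // The entire O(cache-size) scan runs while holding the cache
        // monitor, so every other thread that needs the cache blocks
        // for the whole scan -- a scan that often finds nothing,
        // because concurrent clients already claimed the invalid pages.
        synchronized Entry findInvalidEntry() {
            for (Entry e : holders) {
                if (!e.valid && !e.kept) {
                    e.kept = true;   // claim the slot
                    return e;
                }
            }
            return null;             // scanned everything, found nothing
        }
    }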

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


Re: [jira] Closed: (DERBY-704) Large page cache kills initial performance

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Mike Matrigali <mi...@sbcglobal.net> writes:

> Thanks, that makes sense. I just wasn't thinking about the
> startup costs, or that growing the cache creates invalid pages in the
> same way that "shrinking" does in the case of a drop table.
> I tend to ignore startup, wait for the
> steady state, and concentrate on performance there. Great that
> you found this issue.

The problem was more that the steady state was never reached. Not sure
"steady" is the right word for the state that was being reached, though...

> The clock algorithm is an area that may be ripe for improvement
> (or, probably better, a completely new cache factory),
> especially when dealing with very large caches. The cache may
> also be the first place to look at using the new, more concurrent
> data structures provided by Java.  I would expect the current
> design to scale reasonably well on 1, 2 and maybe 4 processors,
> but it may see problems after that.  I would expect the most gain
> to come first from the buffer manager, next the lock manager, and
> then the various other caches (statement cache, open file cache).

I imagine it would be relatively easy to implement and test prototypes
of new cache managers in Derby. Implementing the CacheFactory and
CacheManager interfaces should be enough, I think. Experimenting with
concurrent hash tables, multiple (prioritized) LRUs and other caching
strategies would certainly be an interesting task, and I too believe
there's a lot to gain in this area.
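
A minimal sketch of such a prototype, assuming a made-up two-method
interface rather than Derby's real CacheFactory/CacheManager, could
build on the concurrent collections in java.util.concurrent:

    import java.util.concurrent.ConcurrentHashMap;

    // Hedged sketch of a prototype cache manager; the interface is
    // hypothetical and much smaller than Derby's CacheManager.
    class PrototypeCache<K, V> {
        private final ConcurrentHashMap<K, V> items =
            new ConcurrentHashMap<K, V>();

        // Lookups need no global monitor, unlike the synchronized
        // scan in the Clock implementation.
        V find(K key) {
            return items.get(key);
        }

        // Atomically inserts the entry unless another client created
        // it first, in which case the existing entry wins.
        V insert(K key, V value) {
            V previous = items.putIfAbsent(key, value);
            return previous != null ? previous : value;
        }

        void remove(K key) {
            items.remove(key);
        }
    }

A real prototype would still need a replacement policy on top of this;
the concurrent map only removes the global monitor from lookups and
inserts.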

-- 
Knut Anders


Re: [jira] Closed: (DERBY-704) Large page cache kills initial performance

Posted by Mike Matrigali <mi...@sbcglobal.net>.
Thanks, that makes sense. I just wasn't thinking about the
startup costs, or that growing the cache creates invalid pages in the
same way that "shrinking" does in the case of a drop table.
I tend to ignore startup, wait for the
steady state, and concentrate on performance there. Great that
you found this issue.


The clock algorithm is an area that may be ripe for improvement
(or, probably better, a completely new cache factory),
especially when dealing with very large caches. The cache may
also be the first place to look at using the new, more concurrent
data structures provided by Java.  I would expect the current
design to scale reasonably well on 1, 2 and maybe 4 processors,
but it may see problems after that.  I would expect the most gain
to come first from the buffer manager, next the lock manager, and
then the various other caches (statement cache, open file cache).
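
For reference, the clock (second-chance) policy under discussion works
roughly as in this textbook sketch (not Derby's Clock class, which is
far more involved):

    import java.util.ArrayList;
    import java.util.List;

    // Textbook clock (second-chance) replacement, for reference only.
    class ClockSketch<V> {
        static class Frame<P> {
            P page;
            boolean referenced;  // set on access, cleared by the hand
        }

        private final List<Frame<V>> frames = new ArrayList<Frame<V>>();
        private int hand = 0;

        ClockSketch(int size) {
            for (int i = 0; i < size; i++) {
                frames.add(new Frame<V>());
            }
        }

        // Sweep the hand until a frame whose reference bit is clear is
        // found; recently used frames get a "second chance" instead of
        // being evicted immediately.
        synchronized Frame<V> findVictim() {
            while (true) {
                Frame<V> f = frames.get(hand);
                hand = (hand + 1) % frames.size();
                if (f.referenced) {
                    f.referenced = false;  // second chance
                } else {
                    return f;              // evict this frame
                }
            }
        }
    }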

Knut Anders Hatlen wrote:
> [...]


Re: [jira] Closed: (DERBY-704) Large page cache kills initial performance

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Mike Matrigali <mi...@sbcglobal.net> writes:

> Also I was wondering if you had any idea where the invalid pages
> were coming from?

The invalid pages come from Clock.growCache(), where they are
created but not initialized (hence invalid). The synchronization
policy for Clock allows (to some extent) interleaving, so when there
is more than one client, clients might see invalid pages that do not
belong to them. These pages are, however, marked with keepForCreate
and cannot be taken by any client other than the one that created
them.
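
In other words, the growth path behaves roughly like the following
sketch (hypothetical names; Clock.growCache() differs in detail):

    import java.util.ArrayList;
    import java.util.List;

    // Hedged sketch of the growth path; names are hypothetical.
    class GrowSketch {
        static class Entry {
            boolean valid;  // false: allocated but not yet initialized
            boolean kept;   // true: keepForCreate, reserved for creator
        }

        private final List<Entry> holders = new ArrayList<Entry>();

        // A new slot enters the cache invalid and already claimed, so
        // a concurrent client scanning for invalid pages can see it
        // but may not take it; with many clients growing the cache,
        // full scans therefore often come up empty.
        synchronized Entry growCache() {
            Entry e = new Entry();
            e.kept = true;      // keepForCreate
            holders.add(e);
            return e;           // the creator initializes the page,
                                // then marks it valid
        }
    }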

> The original problem with invalid pages was
> from drop tables, where applications didn't like us using
> all the pages in the cache when they grew a single table
> to 1/2 the cache size and then dropped it.  So the invalid case
> was sort of special and wasn't expected to happen very often.
> The usual case should be the cache growing to full and then LRU'ing
> old pages to make way for new ones.
>
> Maybe there are some temp tables involved in your testing?

No temp tables are involved. The load is really simple, with
transactions consisting of one single-row select (using primary key),
three single-row updates (using primary key) and one insert.
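
For illustration, one such transaction might look like this JDBC
sketch (the table, columns and key values are invented; the actual
benchmark schema is not given in this thread):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Hedged sketch of the described load; schema and values are
    // invented for illustration.
    class LoadSketch {
        void runTransaction(Connection c, int id, int newId)
                throws SQLException {
            c.setAutoCommit(false);
            PreparedStatement sel = c.prepareStatement(
                "SELECT val FROM accounts WHERE id = ?");
            PreparedStatement upd = c.prepareStatement(
                "UPDATE accounts SET val = val + 1 WHERE id = ?");
            PreparedStatement ins = c.prepareStatement(
                "INSERT INTO accounts (id, val) VALUES (?, 0)");

            sel.setInt(1, id);              // one single-row select
            ResultSet rs = sel.executeQuery();
            rs.next();
            rs.close();

            for (int i = 0; i < 3; i++) {   // three single-row updates
                upd.setInt(1, id + i);
                upd.executeUpdate();
            }

            ins.setInt(1, newId);           // one insert
            ins.executeUpdate();

            sel.close();
            upd.close();
            ins.close();
            c.commit();
        }
    }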

> I thought you were doing a TPC-C type of benchmark, so I am not
> sure where the invalid pages are coming from.
>
> Your work on big caches is very interesting; historically, not much
> work has been done in this area.
>
> Mike Matrigali wrote:
>
>> Just got a chance to look at the patch; it would have been nice if
>> some of the great comments that are in the bug description had made
>> it into the code changes.

You are right, I should have put more comments into the code. I'll see
what I can do. Thanks for looking at the patch.

-- 
Knut Anders


Re: [jira] Closed: (DERBY-704) Large page cache kills initial performance

Posted by Øystein Grøvlen <Oy...@Sun.COM>.
Mike Matrigali wrote:
> Also I was wondering if you had any idea where the invalid pages
> were coming from?  The original problem with invalid pages was
> from drop tables, where applications didn't like us using
> all the pages in the cache when they grew a single table
> to 1/2 the cache size and then dropped it.  So the invalid case
> was sort of special and wasn't expected to happen very often.
> The usual case should be the cache growing to full and then LRU'ing
> old pages to make way for new ones.
> 

I thought that when new cache items are added, the new pages will be 
invalid until they are used for the first time.  That is why this is 
only a problem while the cache is growing.

--
Øystein

Re: [jira] Closed: (DERBY-704) Large page cache kills initial performance

Posted by Mike Matrigali <mi...@sbcglobal.net>.
Also I was wondering if you had any idea where the invalid pages
were coming from?  The original problem with invalid pages was
from drop tables, where applications didn't like us using
all the pages in the cache when they grew a single table
to 1/2 the cache size and then dropped it.  So the invalid case
was sort of special and wasn't expected to happen very often.
The usual case should be the cache growing to full and then LRU'ing
old pages to make way for new ones.

Maybe there are some temp tables involved in your testing?

I thought you were doing a TPC-C type of benchmark, so I am not
sure where the invalid pages are coming from.

Your work on big caches is very interesting; historically, not much
work has been done in this area.

Mike Matrigali wrote:

> Just got a chance to look at the patch; it would have been nice if
> some of the great comments that are in the bug description had made
> it into the code changes.
> 
> Knut Anders Hatlen (JIRA) wrote:
>> [...]

Re: [jira] Closed: (DERBY-704) Large page cache kills initial performance

Posted by Mike Matrigali <mi...@sbcglobal.net>.
Just got a chance to look at the patch; it would have been nice if
some of the great comments that are in the bug description had made
it into the code changes.

Knut Anders Hatlen (JIRA) wrote:

> [...]