You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-issues@jackrabbit.apache.org by "Julian Reschke (JIRA)" <ji...@apache.org> on 2017/10/06 11:36:00 UTC

[jira] [Commented] (OAK-6789) RDB: RevisionGC performance on Oracle

    [ https://issues.apache.org/jira/browse/OAK-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194472#comment-16194472 ] 

Julian Reschke commented on OAK-6789:
-------------------------------------

Further tests show that this is caused by the way we delete in batches, and the way Oracle handles these deletes.

The version garbage collection deletes in chunks of 450 documents, each identified by ID and with a condition on it's modification date (to prevent deletion of documents that have changed since the collection phase).

In {{RDBDocumentStore}} we break these down to chunks of 64 documents, and generate a single DELETE statement like this:
{noformat}
delete from nodes where id='x1' and modified=1 or id='x2' and modified=2  ...
{noformat}

Tests show that Oracle behaves sanely as long as we do not put more than 8 conditions into a single statement. Checking the execution plan for 16 conditions:
{noformat}
-----------------------------------------------------------------------------------------------

| Id  | Operation                     | Name          | Rows  | Bytes | Cost (%CPU)| Time     |

-----------------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT              |               |     1 |   122 |    25(0)| 00:00:01 |

|   1 |  DELETE                       | NODES         |       |       |   |          |

|   2 |   INLIST ITERATOR             |               |       |       |   |          |

|*  3 |    TABLE ACCESS BY INDEX ROWID| NODES         |     1 |   122 |    25(0)| 00:00:01 |

|*  4 |     INDEX UNIQUE SCAN         | SYS_C00175322 |     8 |       |    19(0)| 00:00:01 |

-----------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ID"='x1' AND "MODIFIED"=1 OR "ID"='x10' AND "MODIFIED"=10 OR "ID"
='x11'

              AND "MODIFIED"=11 OR "ID"='x12' AND "MODIFIED"=12 OR "ID"='x13' AN
D "MODIFIED"=13 OR

              "ID"='x14' AND "MODIFIED"=14 OR "ID"='x15' AND "MODIFIED"=15 OR "I
D"='x16' AND

              "MODIFIED"=16 OR "ID"='x2' AND "MODIFIED"=2 OR "ID"='x3' AND "MODI
FIED"=3 OR "ID"='x4'

              AND "MODIFIED"=4 OR "ID"='x5' AND "MODIFIED"=5 OR "ID"='x6' AND "M
ODIFIED"=6 OR

              "ID"='x7' AND "MODIFIED"=7 OR "ID"='x8' AND "MODIFIED"=8 OR "ID"='
x9' AND "MODIFIED"=9)

   4 - access("ID"='x1' OR "ID"='x10' OR "ID"='x11' OR "ID"='x12' OR "ID"='x13'
OR

              "ID"='x14' OR "ID"='x15' OR "ID"='x16' OR "ID"='x2' OR "ID"='x3' O
R "ID"='x4' OR

              "ID"='x5' OR "ID"='x6' OR "ID"='x7' OR "ID"='x8' OR "ID"='x9')
{noformat}

where "SYS_C00175322" is the primary unique index on ID.

For 8 conditions:

{noformat}
-----------------------------------------------------------------------------------------------

| Id  | Operation                     | Name          | Rows  | Bytes | Cost (%CPU)| Time     |

-----------------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT              |               |     1 |   122 |    14(0)| 00:00:01 |

|   1 |  DELETE                       | NODES         |       |       |   |          |

|   2 |   INLIST ITERATOR             |               |       |       |   |          |

|*  3 |    TABLE ACCESS BY INDEX ROWID| NODES         |     1 |   122 |    14(0)| 00:00:01 |

|*  4 |     INDEX UNIQUE SCAN         | SYS_C00175322 |     4 |       |    11(0)| 00:00:01 |

-----------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ID"='x1' AND "MODIFIED"=1 OR "ID"='x2' AND "MODIFIED"=2 OR "ID"='
x3'

              AND "MODIFIED"=3 OR "ID"='x4' AND "MODIFIED"=4 OR "ID"='x5' AND "M
ODIFIED"=5 OR

              "ID"='x6' AND "MODIFIED"=6 OR "ID"='x7' AND "MODIFIED"=7 OR "ID"='
x8' AND "MODIFIED"=8)

   4 - access("ID"='x1' OR "ID"='x2' OR "ID"='x3' OR "ID"='x4' OR "ID"='x5' OR
              "ID"='x6' OR "ID"='x7' OR "ID"='x8')

{noformat}


> RDB: RevisionGC performance on Oracle
> -------------------------------------
>
>                 Key: OAK-6789
>                 URL: https://issues.apache.org/jira/browse/OAK-6789
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: rdbmk
>            Reporter: Julian Reschke
>            Assignee: Julian Reschke
>
> In RevisionGC on Oracle, performance of deletes is bad. Using RevisionGCTest:
> {noformat}
> VersionGCStats{ignoredGCDueToCheckPoint=false, canceled=false, deletedDocGCCount=92092 (of which leaf: 92000), updateResurrectedGCCount=0, splitDocGCCount=1, intermediateSplitDocGCCount=0, iterationCount=2, timeActive=7.429 min, timeToCollectDeletedDocs=1394 ms, timeToCheckDeletedDocs=250.4 ms, timeToSortDocIds=374.0 us, timeTakenToUpdateResurrectedDocs=17.00 us, timeTakenToDeleteDeletedDocs=7.399 min, timeTakenToCollectAndDeleteSplitDocs=97.95 ms}
> Performed RevisionGC in 7,429 min
> {noformat}
> Compared with DB2:
> {noformat}
> VersionGCStats{ignoredGCDueToCheckPoint=false, canceled=false, deletedDocGCCount=96096 (of which leaf: 96000), updateResurrectedGCCount=0, splitDocGCCount=1, intermediateSplitDocGCCount=0, iterationCount=2, timeActive=8.240 s, timeToCollectDeletedDocs=1780 ms, timeToCheckDeletedDocs=259.7 ms, timeToSortDocIds=237.0 us, timeTakenToUpdateResurrectedDocs=19.00 us, timeTakenToDeleteDeletedDocs=4.552 s, timeTakenToCollectAndDeleteSplitDocs=685.4 ms}
> Performed RevisionGC in 8,243 s
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)