You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-issues@jackrabbit.apache.org by "Julian Reschke (JIRA)" <ji...@apache.org> on 2017/10/06 11:51:00 UTC

[jira] [Comment Edited] (OAK-6789) RDB: RevisionGC performance on Oracle

    [ https://issues.apache.org/jira/browse/OAK-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194484#comment-16194484 ] 

Julian Reschke edited comment on OAK-6789 at 10/6/17 11:50 AM:
---------------------------------------------------------------

Further tests show that it's indeed the condition on MODIFIED that triggers the behavior.

Removing it (breaking the API contract...) fixes the performance problem. So does adding an index on MODIFIED:

{noformat}
------------------------------------------------------------------------------------------------------

| Id  | Operation                             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT                      |              |     1 |   122 |  20   (0)| 00:00:01 |

|   1 |  DELETE                               | NODES        |       |       |          |          |

|   2 |   INLIST ITERATOR                     |              |       |       |          |          |

|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NODES        |     1 |   122 |  20   (0)| 00:00:01 |

|*  4 |     INDEX RANGE SCAN                  | MODIFIED_IDX |     8 |       |  18   (0)| 00:00:01 |

------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ID"='x1' AND "MODIFIED"=1 OR "ID"='x10' AND "MODIFIED"=10 OR "ID"
='x11' AND

              "MODIFIED"=11 OR "ID"='x12' AND "MODIFIED"=12 OR "ID"='x13' AND "M
ODIFIED"=13 OR "ID"='x14'

              AND "MODIFIED"=14 OR "ID"='x15' AND "MODIFIED"=15 OR "ID"='x16' AN
D "MODIFIED"=16 OR

              "ID"='x2' AND "MODIFIED"=2 OR "ID"='x3' AND "MODIFIED"=3 OR "ID"='
x4' AND "MODIFIED"=4 OR

              "ID"='x5' AND "MODIFIED"=5 OR "ID"='x6' AND "MODIFIED"=6 OR "ID"='
x7' AND "MODIFIED"=7 OR

              "ID"='x8' AND "MODIFIED"=8 OR "ID"='x9' AND "MODIFIED"=9)
   4 - access("MODIFIED"=1 OR "MODIFIED"=10 OR "MODIFIED"=11 OR "MODIFIED"=12 OR


              "MODIFIED"=13 OR "MODIFIED"=14 OR "MODIFIED"=15 OR "MODIFIED"=16 O
R "MODIFIED"=2 OR

              "MODIFIED"=3 OR "MODIFIED"=4 OR "MODIFIED"=5 OR "MODIFIED"=6 OR "M
ODIFIED"=7 OR "MODIFIED"=8

              OR "MODIFIED"=9)
{noformat}

Note that this switches step 4 to an index scan on MODIFIED_IDX rather than the primary key index. This fixes the performance in the test case, but might be undesirable in large repos where many rows have the same modified date.

It would be interesting to understand why Oracle prefers the MODIFIED_IDX over the primrary key index, when present. FWWI; it would be possible to add a hint to the query to force Oracle to use a specific index (not tried over JDBC, through).

Finally for completeness: the plan when we have a composite index on ID and MODIFIED:

{noformat}
| Id  | Operation          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT   |                 |     1 |   122 |    19   (0)| 00:00:01 |

|   1 |  DELETE            | NODES           |       |       |            |     |

|   2 |   INLIST ITERATOR  |                 |       |       |            |     |

|*  3 |    INDEX RANGE SCAN| ID_MODIFIED_IDX |     1 |   122 |    19   (0)| 00:00:01 |

--------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(("ID"='x1' AND "MODIFIED"=1 OR "ID"='x10' AND "MODIFIED"=10 OR
              "ID"='x11' AND "MODIFIED"=11 OR "ID"='x12' AND "MODIFIED"=12 OR "I
D"='x13'

              AND "MODIFIED"=13 OR "ID"='x14' AND "MODIFIED"=14 OR "ID"='x15' AN
D

              "MODIFIED"=15 OR "ID"='x16' AND "MODIFIED"=16 OR "ID"='x2' AND "MO
DIFIED"=2

              OR "ID"='x3' AND "MODIFIED"=3 OR "ID"='x4' AND "MODIFIED"=4 OR "ID
"='x5' AND

              "MODIFIED"=5 OR "ID"='x6' AND "MODIFIED"=6 OR "ID"='x7' AND "MODIF
IED"=7 OR

              "ID"='x8' AND "MODIFIED"=8 OR "ID"='x9' AND "MODIFIED"=9))
{noformat}

which allows to process the whole condition with a single index lookup, but might have undesirable effects elsewhere.



was (Author: reschke):
Further tests show that it's indeed the condition on MODIFIED that triggers the behavior.

Removing it (breaking the API contract...) fixes the performance problem. So does adding an index on MODIFIED:

{noformat}
------------------------------------------------------------------------------------------------------

| Id  | Operation                             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT                      |              |     1 |   122 |  20   (0)| 00:00:01 |

|   1 |  DELETE                               | NODES        |       |       |          |          |

|   2 |   INLIST ITERATOR                     |              |       |       |          |          |

|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NODES        |     1 |   122 |  20   (0)| 00:00:01 |

|*  4 |     INDEX RANGE SCAN                  | MODIFIED_IDX |     8 |       |  18   (0)| 00:00:01 |

------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ID"='x1' AND "MODIFIED"=1 OR "ID"='x10' AND "MODIFIED"=10 OR "ID"
='x11' AND

              "MODIFIED"=11 OR "ID"='x12' AND "MODIFIED"=12 OR "ID"='x13' AND "M
ODIFIED"=13 OR "ID"='x14'

              AND "MODIFIED"=14 OR "ID"='x15' AND "MODIFIED"=15 OR "ID"='x16' AN
D "MODIFIED"=16 OR

              "ID"='x2' AND "MODIFIED"=2 OR "ID"='x3' AND "MODIFIED"=3 OR "ID"='
x4' AND "MODIFIED"=4 OR

              "ID"='x5' AND "MODIFIED"=5 OR "ID"='x6' AND "MODIFIED"=6 OR "ID"='
x7' AND "MODIFIED"=7 OR

              "ID"='x8' AND "MODIFIED"=8 OR "ID"='x9' AND "MODIFIED"=9)
   4 - access("MODIFIED"=1 OR "MODIFIED"=10 OR "MODIFIED"=11 OR "MODIFIED"=12 OR


              "MODIFIED"=13 OR "MODIFIED"=14 OR "MODIFIED"=15 OR "MODIFIED"=16 O
R "MODIFIED"=2 OR

              "MODIFIED"=3 OR "MODIFIED"=4 OR "MODIFIED"=5 OR "MODIFIED"=6 OR "M
ODIFIED"=7 OR "MODIFIED"=8

              OR "MODIFIED"=9)
{noformat}

Note that this switches step 4 to an index scan on MODIFIED_IDX rather than the primary key index. This fixes the performance in the test case, but might be undesirable in large repos where many rows have the same modified date.

It would be interesting to understand why Oracle prefers the MODIFIED_IDX over the primrary key index, when present. FWWI; it would be possible to add a hint to the query to force Oracle to use a specific index (not tried over JDBC, through).

Finally for completeness: the plan when we have a composite index on ID and MODIFIED:

{noformat}
| Id  | Operation          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------

|   0 | DELETE STATEMENT   |                 |     1 |   122 |    19   (0)| 00:00:01 |

|   1 |  DELETE            | NODES           |       |       |            |     |

|   2 |   INLIST ITERATOR  |                 |       |       |            |     |

|*  3 |    INDEX RANGE SCAN| ID_MODIFIED_IDX |     1 |   122 |    19   (0)| 00:00:01 |

--------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(("ID"='x1' AND "MODIFIED"=1 OR "ID"='x10' AND "MODIFIED"=10 OR
              "ID"='x11' AND "MODIFIED"=11 OR "ID"='x12' AND "MODIFIED"=12 OR "I
D"='x13'

              AND "MODIFIED"=13 OR "ID"='x14' AND "MODIFIED"=14 OR "ID"='x15' AN
D

              "MODIFIED"=15 OR "ID"='x16' AND "MODIFIED"=16 OR "ID"='x2' AND "MO
DIFIED"=2

              OR "ID"='x3' AND "MODIFIED"=3 OR "ID"='x4' AND "MODIFIED"=4 OR "ID
"='x5' AND

              "MODIFIED"=5 OR "ID"='x6' AND "MODIFIED"=6 OR "ID"='x7' AND "MODIF
IED"=7 OR

              "ID"='x8' AND "MODIFIED"=8 OR "ID"='x9' AND "MODIFIED"=9))
{noformat}

which allows to process the whole condition with a single index lookup, but might hav undesirable effects elsewhere.


> RDB: RevisionGC performance on Oracle
> -------------------------------------
>
>                 Key: OAK-6789
>                 URL: https://issues.apache.org/jira/browse/OAK-6789
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: rdbmk
>            Reporter: Julian Reschke
>            Assignee: Julian Reschke
>
> In RevisionGC on Oracle, performance of deletes is bad. Using RevisionGCTest:
> {noformat}
> VersionGCStats{ignoredGCDueToCheckPoint=false, canceled=false, deletedDocGCCount=92092 (of which leaf: 92000), updateResurrectedGCCount=0, splitDocGCCount=1, intermediateSplitDocGCCount=0, iterationCount=2, timeActive=7.429 min, timeToCollectDeletedDocs=1394 ms, timeToCheckDeletedDocs=250.4 ms, timeToSortDocIds=374.0 us, timeTakenToUpdateResurrectedDocs=17.00 us, timeTakenToDeleteDeletedDocs=7.399 min, timeTakenToCollectAndDeleteSplitDocs=97.95 ms}
> Performed RevisionGC in 7,429 min
> {noformat}
> Compared with DB2:
> {noformat}
> VersionGCStats{ignoredGCDueToCheckPoint=false, canceled=false, deletedDocGCCount=96096 (of which leaf: 96000), updateResurrectedGCCount=0, splitDocGCCount=1, intermediateSplitDocGCCount=0, iterationCount=2, timeActive=8.240 s, timeToCollectDeletedDocs=1780 ms, timeToCheckDeletedDocs=259.7 ms, timeToSortDocIds=237.0 us, timeTakenToUpdateResurrectedDocs=19.00 us, timeTakenToDeleteDeletedDocs=4.552 s, timeTakenToCollectAndDeleteSplitDocs=685.4 ms}
> Performed RevisionGC in 8,243 s
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)