
Posted to dev@jackrabbit.apache.org by "Marcel Reutegger (JIRA)" <ji...@apache.org> on 2005/10/18 17:43:44 UTC

[jira] Created: (JCR-257) Use separate index for jcr:system tree

Use separate index for jcr:system tree
--------------------------------------

         Key: JCR-257
         URL: http://issues.apache.org/jira/browse/JCR-257
     Project: Jackrabbit
        Type: Improvement
    Reporter: Marcel Reutegger
 Assigned to: Marcel Reutegger 
    Priority: Minor
     Fix For: 1.0


Currently each workspace index also includes index data for repository-wide content (e.g. version nodes under jcr:system). This approach has several drawbacks:

- indexing is duplicated and does not scale when using a lot of workspaces
- workspaces cannot be 'put to sleep' when they are not actively used.

The repository should have an additional index for system data, covering versioning data and the node type representation in content: basically everything under /jcr:system.

Queries issued on a workspace will then use two indexes to execute the query: the workspace index and the system index.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Commented: (JCR-257) Use separate index for jcr:system tree

Posted by "Przemo Pakulski (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/JCR-257?page=comments#action_12370821 ] 

Przemo Pakulski commented on JCR-257:
-------------------------------------

I checked r386604 and it looks like it doesn't work as expected.

Even if the 'repository wide system index that contains /jcr:system tree' is not configured, the following nodes are still indexed: versionLabels, versionStorage, versionHistory, version, frozenNode.

Additionally, the index is duplicated across all workspaces, which leads to huge index sizes and performance/memory problems, especially when many workspaces are used.

Interestingly, if I remove all index folders and restart the repository, all indexes are rebuilt without the mentioned nodes, and the indexes are then much smaller.

[jira] Resolved: (JCR-257) Use separate index for jcr:system tree

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/JCR-257?page=all ]
     
Marcel Reutegger resolved JCR-257:
----------------------------------

    Resolution: Fixed

Also updated repository.xml files in contrib projects.

[jira] Commented: (JCR-257) Use separate index for jcr:system tree

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/JCR-257?page=comments#action_12360905 ] 

Marcel Reutegger commented on JCR-257:
--------------------------------------

Separated the index as proposed. There is now one repository wide system index that contains /jcr:system tree. In addition to this 'global' index, there are still the workspace indexes as before. Queries are executed on both indexes and return results from both.

Separating the indexes also makes it possible to disable indexing of versions: simply do not configure a system index at the repository level.
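To make the split concrete, here is a minimal, hypothetical Java model of it. The names (SplitIndexSketch, ToyIndex, index, query) are invented for illustration and are not Jackrabbit's query API. The sketch assumes that a node belongs to the system index exactly when its path is /jcr:system or lies below it, and that a workspace query returns the union of hits from that workspace's own index and the shared system index. Constructing it with `false` mimics a repository without a system SearchIndex: content under /jcr:system is then simply not indexed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical toy model (not Jackrabbit code): one index per workspace plus
 * an optional repository-wide index for everything under /jcr:system.
 */
public class SplitIndexSketch {

    /** Toy inverted index: term -> node paths containing that term. */
    static final class ToyIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String path, String term) {
            postings.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(path);
        }

        Set<String> search(String term) {
            return postings.getOrDefault(term, Collections.emptySet());
        }
    }

    /** Shared by all workspaces; empty when no system index is configured. */
    private final Optional<ToyIndex> systemIndex;
    private final Map<String, ToyIndex> workspaceIndexes = new HashMap<>();

    SplitIndexSketch(boolean systemIndexConfigured) {
        this.systemIndex = systemIndexConfigured ? Optional.of(new ToyIndex()) : Optional.empty();
    }

    /** Routes a node either to the shared system index or to its workspace index. */
    void index(String workspace, String path, String term) {
        if (path.equals("/jcr:system") || path.startsWith("/jcr:system/")) {
            // Indexed once for the whole repository, or not at all.
            systemIndex.ifPresent(idx -> idx.add(path, term));
        } else {
            workspaceIndexes.computeIfAbsent(workspace, w -> new ToyIndex()).add(path, term);
        }
    }

    /** A workspace query consults its own index and then the system index. */
    List<String> query(String workspace, String term) {
        Set<String> hits = new LinkedHashSet<>();
        ToyIndex ws = workspaceIndexes.get(workspace);
        if (ws != null) {
            hits.addAll(ws.search(term));
        }
        systemIndex.ifPresent(idx -> hits.addAll(idx.search(term)));
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        SplitIndexSketch repo = new SplitIndexSketch(true);
        repo.index("default", "/content/a", "foo");
        repo.index("default", "/jcr:system/jcr:versionStorage/v1", "foo");
        System.out.println(repo.query("default", "foo"));
    }
}
```

Because the system index is shared, a version node is indexed once regardless of how many workspaces exist, which is the scalability concern raised in the original description.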

Important note: this causes a minor backward compatibility issue. Existing configurations do not have a system search index configured at the repository level and will therefore no longer index versions. This means queries will return versions of nodes that were checked in before this change, but not versions created by check-ins after it. Apart from that, Jackrabbit works just fine. If you need to search versions of nodes, see below for how.

Migration instructions:
- add a SearchIndex element at the end of the repository configuration. See jackrabbit/src/main/config/repository.xml for an example
- delete index folders in all your workspace directories
- restart Jackrabbit (it will re-index the workspaces and the jcr:system tree)
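The second migration step can be scripted. The helper below is a minimal, hypothetical sketch (DeleteWorkspaceIndexes and deleteIndexFolders are invented names, not part of Jackrabbit). It assumes each workspace has its own directory under a common parent (for example <repository home>/workspaces/<name>) and keeps its search index in an `index` subfolder, as with a workspace SearchIndex path such as ${wsp.home}/index. Check the path parameter in your own workspace configuration first, stop the repository, and keep a backup: the code deletes files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: removes the "index" folder of every workspace directory. */
public class DeleteWorkspaceIndexes {

    /** Deletes <workspacesDir>/<workspace>/index for each workspace; returns what was removed. */
    static List<Path> deleteIndexFolders(Path workspacesDir) throws IOException {
        List<Path> deleted = new ArrayList<>();
        try (DirectoryStream<Path> workspaces =
                     Files.newDirectoryStream(workspacesDir, p -> Files.isDirectory(p))) {
            for (Path ws : workspaces) {
                // ASSUMPTION: the workspace index lives in a subfolder named "index".
                Path index = ws.resolve("index");
                if (Files.isDirectory(index)) {
                    deleteRecursively(index);
                    deleted.add(index);
                }
            }
        }
        return deleted;
    }

    /** Deletes a directory tree, children before their parents. */
    static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: DeleteWorkspaceIndexes <workspaces-dir>");
            return;
        }
        for (Path p : deleteIndexFolders(Paths.get(args[0]))) {
            System.out.println("deleted " + p);
        }
    }
}
```

Only the per-workspace index folders are touched; workspace configuration and content stay in place, so the restart in the last step can rebuild the indexes.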

svn revision: 357961

[jira] Commented: (JCR-257) Use separate index for jcr:system tree

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/JCR-257?page=comments#action_12371109 ] 

Marcel Reutegger commented on JCR-257:
--------------------------------------

Paths of events that originate from the version storage are wrong and are therefore no longer filtered correctly. This problem was probably introduced when fixing JCR-141.

Test cases should be extended to check paths of version events.

[jira] Updated: (JCR-257) Use separate index for jcr:system tree

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/JCR-257?page=all ]

Jukka Zitting updated JCR-257:
------------------------------

    Fix Version:     (was: 1.0)
        Version: 0.9
                 1.0

[jira] Reopened: (JCR-257) Use separate index for jcr:system tree

Posted by "Marcel Reutegger (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/JCR-257?page=all ]
     
Marcel Reutegger reopened JCR-257:
----------------------------------

