Posted to dev@jackrabbit.apache.org by "Alex Parvulescu (Updated) (JIRA)" <ji...@apache.org> on 2011/11/24 10:50:41 UTC

[jira] [Updated] (JCR-2906) Multivalued property sorted by last/random value

     [ https://issues.apache.org/jira/browse/JCR-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Parvulescu updated JCR-2906:
---------------------------------

    Attachment: JCR-2906.patch

Really good analysis, thanks for pointing out where the problem is!

The problem is not whether the JCR spec defines sorting on a multi-valued property (MVP). The problem is that the sort behavior is not stable when dealing with MVPs.

As Paul correctly pointed out, whenever an MVP is present, the value in the cache gets overwritten by the last value found by the Lucene term query. So an MVP is in fact represented in the sort by just one of its values, and which one can apparently change at runtime (that is easily reproducible by running the attached test a few times).
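
To make the overwrite concrete, here is a minimal, self-contained sketch. It is not the Jackrabbit code: the class name LastValueWinsSketch, the fillCache method and the plain arrays standing in for the term/document enumeration are all assumptions made for illustration. It only shows that when each posting is written into a single per-document slot, the value that survives depends on the order in which the postings arrive.

```java
import java.util.Arrays;

// Illustrative only: a simplified stand-in for a per-document value cache.
final class LastValueWinsSketch {

    // One cache slot per document. Each posting (docs[i], values[i]) is written
    // into its document's slot, replacing whatever was stored there before.
    static String[] fillCache(int maxDoc, int[] docs, String[] values) {
        String[] cache = new String[maxDoc];
        for (int i = 0; i < docs.length; i++) {
            cache[docs[i]] = values[i]; // last write wins
        }
        return cache;
    }

    public static void main(String[] args) {
        // Records from the issue: doc 0 = "aaaa", doc 1 = "cccc", doc 2 = "dddd", "bbbb".
        // Same postings, delivered in two different (assumed) orders.
        String[] orderA = fillCache(3, new int[] {0, 1, 2, 2},
                new String[] {"aaaa", "cccc", "dddd", "bbbb"});
        String[] orderB = fillCache(3, new int[] {0, 2, 1, 2},
                new String[] {"aaaa", "bbbb", "cccc", "dddd"});
        System.out.println(Arrays.toString(orderA));
        System.out.println(Arrays.toString(orderB));
    }
}
```

In the first call doc 2 ends up represented by "bbbb", which yields the order reported in the issue. In the second call it ends up represented by "dddd".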

The solution is to use the position info that comes via Lucene's TermPositions. It contains the term's position within the current document, which allows us to use it as an index for MVPs.
The downside is that the Comparables have to support arrays as well as simple values, so I've added a class (ComparableArray) that simply delegates compareTo calls to the inner array of Comparables. This way all the query languages (XPath, SQL and SQL2) sort MVPs in a similar way.
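
A rough sketch of the idea follows. It is not the code from the attached patch: MvpSortSketch, ComparableArraySketch and fillCache are hypothetical names, plain arrays stand in for the Lucene enumeration, and positions are assumed to be zero-based and dense for each document. Each value is stored at the slot given by its position, and the resulting arrays are compared element by element (a shorter array sorts first when it is a prefix of the longer one, which is just one possible choice).

```java
import java.util.Arrays;

// Illustrative only: position-indexed values plus an array-aware Comparable.
final class MvpSortSketch {

    // Hypothetical stand-in for a ComparableArray-like class: compares element
    // by element and treats the shorter array as smaller on a shared prefix.
    static final class ComparableArraySketch implements Comparable<ComparableArraySketch> {
        private final String[] values;

        ComparableArraySketch(String[] values) {
            this.values = values;
        }

        @Override
        public int compareTo(ComparableArraySketch other) {
            int shared = Math.min(values.length, other.values.length);
            for (int i = 0; i < shared; i++) {
                int c = values[i].compareTo(other.values[i]);
                if (c != 0) {
                    return c;
                }
            }
            return Integer.compare(values.length, other.values.length);
        }

        @Override
        public String toString() {
            return Arrays.toString(values);
        }
    }

    // Stores each value at the index given by its position within the document,
    // so the order in which postings arrive no longer affects the result.
    // Assumes positions are zero-based and have no gaps for any document.
    static ComparableArraySketch[] fillCache(int maxDoc, int[] docs, int[] positions,
                                             String[] values) {
        String[][] perDoc = new String[maxDoc][0];
        for (int i = 0; i < docs.length; i++) {
            String[] slot = perDoc[docs[i]];
            if (positions[i] >= slot.length) {
                // Grow the document's array so the position fits.
                slot = Arrays.copyOf(slot, positions[i] + 1);
                perDoc[docs[i]] = slot;
            }
            slot[positions[i]] = values[i];
        }
        ComparableArraySketch[] cache = new ComparableArraySketch[maxDoc];
        for (int d = 0; d < maxDoc; d++) {
            cache[d] = new ComparableArraySketch(perDoc[d]);
        }
        return cache;
    }

    public static void main(String[] args) {
        // Records from the issue: doc 0 = "aaaa", doc 1 = "cccc", doc 2 = "dddd", "bbbb".
        ComparableArraySketch[] cache = fillCache(3,
                new int[] {0, 2, 1, 2},
                new int[] {0, 1, 0, 0},
                new String[] {"aaaa", "bbbb", "cccc", "dddd"});
        Arrays.sort(cache);
        System.out.println(Arrays.toString(cache));
    }
}
```

With the records from the issue, doc 2 is kept as ["dddd", "bbbb"] regardless of arrival order, so it sorts after "aaaa" and "cccc".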



Attaching patch.

                
> Multivalued property sorted by last/random value
> ------------------------------------------------
>
>                 Key: JCR-2906
>                 URL: https://issues.apache.org/jira/browse/JCR-2906
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: indexing
>    Affects Versions: 2.2
>         Environment: Windows 7, Sun JDK 1.6.0_23
>            Reporter: Paul Lysak
>              Labels: multivalued, sort
>         Attachments: JCR-2906.patch
>
>
> Sorting on a multivalued property may produce incorrect results because sorting is performed only by the last value of the multivalued property.
> Steps to reproduce:
> 1. Create a multivalued field in the repository. Example from a nodetypes file:
> <propertyDefinition name="MyProperty" requiredType="String" autoCreated="false" mandatory="false"
>    onParentVersion="COPY" protected="false" multiple="true">
> 2. Create a few records so that all records except one contain a single value for MyProperty, and one record contains
> a first value which is greater than that of any other record and a second value which is somewhere in the middle. Here is an example:
> 1st record: "aaaa"
> 2nd record: "cccc"
> 3rd record: "dddd", "bbbb"
> 3. Run some query which sorts by MyProperty. Example of an XPath query:
> //*[...here are some criteria...] order by @MyProperty ascending
> The query returns the documents in this order:
> "aaaa"
> "dddd", "bbbb"
> "cccc"
> which is not the expected order (the expected order is the same as the order in which they were entered, since "aaaa" < "cccc" and "cccc" < "dddd")
> After some digging I found out that this happens because the method
> org.apache.jackrabbit.core.query.lucene.SharedFieldCache.getValueIndex
> (called from org.apache.jackrabbit.core.query.lucene.SharedFieldSortComparator.SimpleScoreDocComparator constructor)
> returns only the last Comparable of the document. Here it overwrites the previous value:
> retArray[termDocs.doc()] = getValue(value, type);
> I tried to concatenate comparables (just to check if it would work for my case):
> 	if(retArray[termDocs.doc()] == null) {
> 		retArray[termDocs.doc()] = getValue(value, type);
> 	} else {
> 		retArray[termDocs.doc()] =
> 				retArray[termDocs.doc()] + " " + getValue(value, type);
> 	}
> But it didn't work well either: TermEnum does not return terms in the same order as Jackrabbit returns the values of a multivalued field
> (for example, ["qwer", "asdf"] may become ["asdf", "qwer"]). So simple concatenation doesn't help.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira