Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2017/01/11 09:38:58 UTC
[jira] [Comment Edited] (OAK-5369) Lucene Property Index: Syntax Error, cannot parse

    [ https://issues.apache.org/jira/browse/OAK-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15817803#comment-15817803 ] 

Thomas Mueller edited comment on OAK-5369 at 1/11/17 9:38 AM:
--------------------------------------------------------------

The Lucene index uses the Lucene StandardQueryParser to parse the raw "contains" query (LucenePropertyIndex.tokenToQuery). Some pre-processing in rewriteQueryText escapes the characters {{':' , '/', '!', '&', '|', '='}}, but "-" and "+" are never escaped. According to the Lucene documentation, they are [special characters|http://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#+]. 

The Lucene StandardQueryParser is generated using JavaCC, which makes it hard to debug. I ran some tests and found that '\+' and '\-' don't need to be escaped if they are alone or part of a term; for example, "\+", "\-", "hello\+world", "hello\-world", "h\+", "h\-", "\-h\+", "\-h\-" are all fine. However, tokens that start with "\-\-", "\+\-", "\-\+", or "\+\+" are for some reason _not_ fine, and throw the Syntax Error reported in this issue. 

Also not fine are "\-(", "\+)", "\+{", "\+^", "\+"", "\+\~", but those are more clearly syntax errors. I think combinations of "\+" and "\-" are quite common and should be supported.
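
To illustrate the kind of pre-processing that could handle these combinations, here is a minimal sketch. It is not the actual Oak fix, and the class and method names are hypothetical. It assumes tokens are separated by whitespace and that a backslash is the escape character, as described in the Lucene query syntax documentation. It only escapes a run of two or more "+" / "-" characters at the start of a token, and leaves single operators and operators inside a term untouched. Whether the escaped text is then accepted by the StandardQueryParser would still need to be verified against the Lucene version in use.

```java
// Hypothetical helper, not part of Oak or Lucene.
final class PlusMinusEscapeSketch {

    private PlusMinusEscapeSketch() {
    }

    /**
     * Escapes each '+' and '-' in a leading run of two or more such
     * characters at the start of a whitespace-separated token.
     * Single leading operators and operators inside a term are kept as-is.
     */
    static String escapeLeadingOperatorRuns(String queryText) {
        int n = queryText.length();
        StringBuilder out = new StringBuilder(n + 8);
        int i = 0;
        while (i < n) {
            char c = queryText.charAt(i);
            boolean tokenStart = i == 0
                    || Character.isWhitespace(queryText.charAt(i - 1));
            if (tokenStart && isOperator(c)) {
                // Find the end of the run of '+' / '-' characters.
                int end = i;
                while (end < n && isOperator(queryText.charAt(end))) {
                    end++;
                }
                boolean escape = end - i >= 2;
                for (int k = i; k < end; k++) {
                    if (escape) {
                        out.append('\\');
                    }
                    out.append(queryText.charAt(k));
                }
                i = end;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isOperator(char c) {
        return c == '+' || c == '-';
    }

    public static void main(String[] args) {
        // The query text from this issue.
        System.out.println(escapeLeadingOperatorRuns("hello -- world"));
    }
}
```

This sketch deliberately ignores quoted phrases and the other problematic combinations listed above (such as "+" followed by "(" or "^"), so it would need to be extended before being used for real queries.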


was (Author: tmueller):
The Lucene index uses Lucene StandardQueryParser to parse the raw "contains" query (LucenePropertyIndex.tokenToQuery). There is some pre-processing done to escape characters {{':' , '/', '!', '&', '|', '='}} in rewriteQueryText, but "-" and "+" are never escaped. According to the Lucene documentation, they are [special characters|http://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#+]. 

The Lucene StandardQueryParser is generated using JavaCC, which makes it hard to debug. I made some tests and found out '+' and '-' doesn't need to be escaped if they are alone or part of a term, for example "+", "-", "hello+world", "hello-world", "h+", "h-", "-h+", "-h-" are all fine. However, tokens that start with "--", "+-", "-+", or "++" for some reason are _not_ fine, and throw Syntax Error as above. 

Also not fine are "-(", "+)", "+{", "+^", "+"", "+~", but those are more clearly syntax errors. I think combinations of "+" and "-" are quite common and should be supported.

> Lucene Property Index: Syntax Error, cannot parse
> -------------------------------------------------
>
>                 Key: OAK-5369
>                 URL: https://issues.apache.org/jira/browse/OAK-5369
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.5.18, 1.6, 1.4.13
>
>
> The following query throws an exception in Apache Lucene:
> {noformat}
> /jcr:root//*[jcr:contains(., 'hello -- world')]
> 22.12.2016 16:42:54.511 *WARN* [qtp1944702753-3846] org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex query via org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex@1c0006db failed.
> java.lang.RuntimeException: INVALID_SYNTAX_CANNOT_PARSE: Syntax Error, cannot parse hello -- world:  
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.tokenToQuery(LucenePropertyIndex.java:1450)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.tokenToQuery(LucenePropertyIndex.java:1418)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.access$900(LucenePropertyIndex.java:180)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$3.visitTerm(LucenePropertyIndex.java:1353)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$3.visit(LucenePropertyIndex.java:1307)
> 	at org.apache.jackrabbit.oak.query.fulltext.FullTextContains.accept(FullTextContains.java:63)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.getFullTextQuery(LucenePropertyIndex.java:1303)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.getLuceneRequest(LucenePropertyIndex.java:791)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.access$300(LucenePropertyIndex.java:180)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$1.loadDocs(LucenePropertyIndex.java:375)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$1.computeNext(LucenePropertyIndex.java:317)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$1.computeNext(LucenePropertyIndex.java:306)
> 	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> 	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor$1.hasNext(LucenePropertyIndex.java:1571)
> 	at com.google.common.collect.Iterators$7.computeNext(Iterators.java:645)
> 	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> 	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> 	at org.apache.jackrabbit.oak.spi.query.Cursors$PathCursor.hasNext(Cursors.java:205)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex$LucenePathCursor.hasNext(LucenePropertyIndex.java:1595)
> 	at org.apache.jackrabbit.oak.query.ast.SelectorImpl.next(SelectorImpl.java:420)
> 	at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.fetchNext(QueryImpl.java:828)
> 	at org.apache.jackrabbit.oak.query.QueryImpl$RowIterator.hasNext(QueryImpl.java:853)
> 	at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.fetch(QueryResultImpl.java:98)
> 	at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl$1.<init>(QueryResultImpl.java:94)
> 	at org.apache.jackrabbit.oak.jcr.query.QueryResultImpl.getRows(QueryResultImpl.java:78)
> Caused by: org.apache.lucene.queryparser.flexible.standard.parser.ParseException: Syntax Error, cannot parse hello -- world:  
> 	at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.generateParseException(StandardSyntaxParser.java:1054)
> 	at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.jj_consume_token(StandardSyntaxParser.java:936)
> 	at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.Clause(StandardSyntaxParser.java:486)
> 	at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.ModClause(StandardSyntaxParser.java:303)
> 	at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.ConjQuery(StandardSyntaxParser.java:234)
> 	at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.DisjQuery(StandardSyntaxParser.java:204)
> 	at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.Query(StandardSyntaxParser.java:186)
> 	at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.TopLevelQuery(StandardSyntaxParser.java:147)
> 	at org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.parse(StandardSyntaxParser.java:65)
> 	at org.apache.lucene.queryparser.flexible.core.QueryParserHelper.parse(QueryParserHelper.java:250)
> 	at org.apache.lucene.queryparser.flexible.standard.StandardQueryParser.parse(StandardQueryParser.java:168)
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LucenePropertyIndex.tokenToQuery(LucenePropertyIndex.java:1448)
> 	... 63 common frames omitted
> {noformat}


