Posted to dev@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2004/03/02 16:34:59 UTC

Re: Question regarding escaped sequence

I have a feeling that query escaping really is broken in Lucene.
Try running the class below like this:

prompt> java Escaper '+string' '\+string'

I get:

$ java Escaper '+string' '\+string'
0: +string
1: \+string
QUERY: \+string
HITS: 0

That should give me 1 hit, shouldn't it?

import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.analysis.*;
import org.apache.lucene.search.*;
import org.apache.lucene.index.*;
import org.apache.lucene.store.*;
import org.apache.lucene.document.*;

public class Escaper
{
    public static void main(String[] args) throws Exception
    {
        System.out.println("0: " + args[0]);
        System.out.println("1: " + args[1]);
        
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
        Document doc = new Document();
        doc.add(Field.Text("text", args[0]));
        writer.addDocument(doc);
        writer.optimize();
        writer.close();

        QueryParser qp = new QueryParser("text", new WhitespaceAnalyzer());
        Query q = qp.parse(args[1]);
        System.out.println("QUERY: " + q.toString("text"));

        IndexSearcher searcher = new IndexSearcher(dir);
        Hits hits = searcher.search(q);
        System.out.println("HITS: " + hits.length());
        searcher.close();
    }
}

Thanks,
Otis


--- Jean-Francois Halleux <ha...@skynet.be> wrote:
> Hello,
> 
> 	in TestQueryParser, method testEscaped(), I see the following:
> 
> ...
> assertQueryEquals("\\+blah", a, "\\+blah");
> assertQueryEquals("\\(blah", a, "\\(blah");
> 
> assertQueryEquals("\\-blah", a, "\\-blah");
> assertQueryEquals("\\!blah", a, "\\!blah");
> assertQueryEquals("\\{blah", a, "\\{blah");
> assertQueryEquals("\\}blah", a, "\\}blah");
> ...
> 
> is this really the expected behavior? Shouldn't \\-blah be
> interpreted
> as -blah and \\!blah as !blah ?
> 
> Thanks,
> 
> Jean-Francois Halleux
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

Re: Queries with only non required terms: not as OR?

Posted by Paul Elschot <pa...@xs4all.nl>.

Doug,

On Wednesday 03 March 2004 18:47, Doug Cutting wrote:
> Paul Elschot wrote:
> > I read a bit into the source code and I found this comment at
> > BooleanQuery.scorer():
> >
> > // Also, at this point a
> > // BooleanScorer cannot be embedded in a ConjunctionScorer, as the hits
> > // from a BooleanScorer are not always sorted by document number (sigh)
> > // and hence BooleanScorer cannot implement skipTo() correctly, which is
> > // required by ConjunctionScorer.
> >
> > The test function I used assumes that documents will be collected in
> > order. Could this be the source of the problem?
>
> It could be.

I'll make the test search in the array of doc nrs that it receives now.
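
A minimal sketch of such an order-independent check, assuming modern Java
collections rather than the Lucene 1.x test harness. The class name
UnorderedHitCheck and its methods are hypothetical; only the
collect(int, float) shape mirrors HitCollector:

```java
import java.util.Set;
import java.util.TreeSet;

/** Collects doc numbers in any order and compares them to the expected set. */
final class UnorderedHitCheck {
    private final Set<Integer> expected = new TreeSet<>();
    private final Set<Integer> seen = new TreeSet<>();

    UnorderedHitCheck(int[] expectedDocNrs) {
        for (int d : expectedDocNrs) expected.add(d);
    }

    /** Same shape as HitCollector.collect(int, float), but makes no ordering assumption. */
    void collect(int docNr, float score) {
        if (score <= 0.0f) throw new AssertionError("non-positive score for doc " + docNr);
        if (!expected.contains(docNr)) throw new AssertionError("unexpected doc " + docNr);
        if (!seen.add(docNr)) throw new AssertionError("doc " + docNr + " collected twice");
    }

    /** True once every expected doc has been collected exactly once. */
    boolean allFound() {
        return seen.equals(expected);
    }

    public static void main(String[] args) {
        UnorderedHitCheck check = new UnorderedHitCheck(new int[] {0, 3});
        check.collect(3, 0.5f); // deliberately out of doc-number order
        check.collect(0, 0.7f);
        System.out.println(check.allFound());
    }
}
```

The checks on unexpected and duplicate docs replace the positional
comparison against expectedDocNrs[totalMatched] used in the posted test.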

> I only realized recently that BooleanScorer does some local reordering
> of document numbers passed to the HitCollector.  There's no easy fix.

I assume it works correctly, so why fix it, except for speed?

> When I get a chance I intend to rewrite BooleanScorer to fix this and to
> correctly implement skipTo().  The result will be somewhat slower for

You might find the previously posted test code useful as a test case for
that. It's nice to see a possible real use for this :) even though I was doing
something wrong.

> some queries, especially those with a large number of optional terms,
> but will sometimes be faster when it's nested in other queries, and
> skipTo() can be leveraged.  I would like to get to this in the next few

When the two cases can be distinguished, you might try and leave the current
method in for the large number of optional terms.
I like speed, and I guess I'm not the only one. 
Also, with the term vectors in CVS one might expect more queries with optional
terms resulting from relevance feedback methods.

> weeks, and then make a 1.4 RC1 release.  The fix will take a few days'
> work.  If I can find someone to fund the work it may happen sooner.
> Right now other projects have higher priority for me.

Lucene is moving fast enough for me...

Thanks a lot,
Paul.



Re: Queries with only non required terms: not as OR?

Posted by Doug Cutting <cu...@apache.org>.

Paul Elschot wrote:
> I read a bit into the source code and I found this comment at 
> BooleanQuery.scorer():
> 
> // Also, at this point a
> // BooleanScorer cannot be embedded in a ConjunctionScorer, as the hits
> // from a BooleanScorer are not always sorted by document number (sigh)
> // and hence BooleanScorer cannot implement skipTo() correctly, which is
> // required by ConjunctionScorer.
> 
> The test function I used assumes that documents will be collected in 
> order. Could this be the source of the problem?

It could be.

I only realized recently that BooleanScorer does some local reordering 
of document numbers passed to the HitCollector.  There's no easy fix.

When I get a chance I intend to rewrite BooleanScorer to fix this and to 
correctly implement skipTo().  The result will be somewhat slower for 
some queries, especially those with a large number of optional terms, 
but will sometimes be faster when it's nested in other queries, and 
skipTo() can be leveraged.  I would like to get to this in the next few
weeks, and then make a 1.4 RC1 release.  The fix will take a few days'
work.  If I can find someone to fund the work it may happen sooner.
Right now other projects have higher priority for me.

Doug
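
A minimal sketch of why a conjunction needs its inputs sorted by document
number, assuming a toy iterator over an int array. SortedDocs and intersect
are hypothetical names and are not Lucene's Scorer or ConjunctionScorer;
only the skipTo() idea is taken from the discussion above:

```java
import java.util.ArrayList;
import java.util.List;

final class SkipToSketch {
    /** Toy doc-id iterator over an array that is expected to be sorted ascending. */
    static final class SortedDocs {
        private final int[] docs;
        private int pos = -1;

        SortedDocs(int[] docs) { this.docs = docs; }

        boolean next() { return ++pos < docs.length; }

        /** Advances to the first doc >= target; only correct if docs is sorted. */
        boolean skipTo(int target) {
            if (pos < 0) pos = 0;
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length;
        }

        int doc() { return docs[pos]; }
    }

    /** Leapfrog intersection: each side skips ahead to the other's current doc. */
    static List<Integer> intersect(SortedDocs a, SortedDocs b) {
        List<Integer> out = new ArrayList<>();
        if (!a.next() || !b.next()) return out;
        while (true) {
            if (a.doc() == b.doc()) {
                out.add(a.doc());
                if (!a.next() || !b.next()) return out;
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) return out;
            } else {
                if (!b.skipTo(a.doc())) return out;
            }
        }
    }

    public static void main(String[] args) {
        int[] left = {0, 3, 5, 9};
        // Sorted input: both common docs are found.
        System.out.println(intersect(new SortedDocs(left), new SortedDocs(new int[] {3, 4, 9}))); // [3, 9]
        // Locally reordered input (4 before 3): doc 3 is silently skipped.
        System.out.println(intersect(new SortedDocs(left), new SortedDocs(new int[] {4, 3, 9}))); // [9]
    }
}
```

The second call shows the failure mode in miniature: once an iterator has
been asked to skip past a document number, an out-of-order occurrence of a
smaller number can no longer be matched.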


Re: Queries with only non required terms: not as OR?

Posted by Paul Elschot <pa...@xs4all.nl>.

Hi,

I read a bit into the source code and I found this comment at 
BooleanQuery.scorer():

// Also, at this point a
// BooleanScorer cannot be embedded in a ConjunctionScorer, as the hits
// from a BooleanScorer are not always sorted by document number (sigh)
// and hence BooleanScorer cannot implement skipTo() correctly, which is
// required by ConjunctionScorer.

The test function I used assumes that documents will be collected in 
order. Could this be the source of the problem?

Paul.


On Tuesday 02 March 2004 21:48, Paul Elschot wrote:
> Hello,
>
> I'm trying to implement a query language with, among others, AND and OR
> operators for Lucene. I can get the AND operator to work
> by mapping it to a BooleanQuery with only required terms.
> However, when I try to implement the OR operator by
> mapping it to a BooleanQuery with non-required terms,
> documents that match the AND-like query:
>
> +word1 +word2
>
> do not match the OR-like query:
>
> word1 word2
>
> This happens to the very first doc in the test database
> below.
>
> Strangely enough, this behaviour is not inconsistent with
> the documentation for BooleanQuery for non-required
> terms. From the API Javadoc: "required means
> that documents which do not match this sub-query will
> not match the boolean query."
>
> (I'm using the WhitespaceAnalyzer for the queries and for
> indexing the test db.)
>
> I can only assume that I am doing something wrong.
> What is the right way to implement 'all docs that have
> at least one of' (i.e. OR-like) queries in Lucene?
>
> Thanks,
> Paul
>
>
...



Queries with only non required terms: not as OR?

Posted by Paul Elschot <pa...@xs4all.nl>.

Hello,

I'm trying to implement a query language with, among others, AND and OR
operators for Lucene. I can get the AND operator to work
by mapping it to a BooleanQuery with only required terms.
However, when I try to implement the OR operator by
mapping it to a BooleanQuery with non-required terms,
documents that match the AND-like query:

+word1 +word2

do not match the OR-like query:

word1 word2

This happens to the very first doc in the test database
below.

Strangely enough, this behaviour is not inconsistent with
the documentation for BooleanQuery for non-required
terms. From the API Javadoc: "required means
that documents which do not match this sub-query will
not match the boolean query."

(I'm using the WhitespaceAnalyzer for the queries and for
indexing the test db.)

I can only assume that I am doing something wrong.
What is the right way to implement 'all docs that have
at least one of' (i.e. OR-like) queries in Lucene?
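
For reference, the matching rule this question expects can be sketched as a
toy model. It assumes the usual reading of boolean clauses (all required
terms must be present; when there are no required terms, at least one
optional term must be) and ignores prohibited terms and scoring. The class
and method names are hypothetical and nothing here calls Lucene:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class BooleanSemanticsSketch {
    /** Toy matching rule for required and optional terms (no prohibited terms, no scores). */
    static boolean matches(Set<String> docTerms, String[] required, String[] optional) {
        for (String t : required) {
            if (!docTerms.contains(t)) return false; // a missing required term rejects the doc
        }
        if (required.length > 0) return true;        // optional terms would then only affect ranking
        for (String t : optional) {
            if (docTerms.contains(t)) return true;   // OR-like: one optional term is enough
        }
        return false;
    }

    /** Returns the indexes of the whitespace-tokenized docs that match. */
    static List<Integer> search(String[] docs, String[] required, String[] optional) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            Set<String> terms = new HashSet<>(Arrays.asList(docs[i].split("\\s+")));
            if (matches(terms, required, optional)) hits.add(i);
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] docs = {
            "word1 word2 word3",
            "word4 word5",
            "ord1 ord2 ord3",
            "orda1 orda2 orda3 word2 worda3",
            "a c e a b c"
        };
        String[] none = new String[0];
        System.out.println(search(docs, new String[] {"word1", "word2"}, none)); // [0]
        System.out.println(search(docs, none, new String[] {"word1", "word2"})); // [0, 3]
    }
}
```

Under this model doc 0 matches both the AND-like and the OR-like query,
which is the result the test cases in the P.S. expect.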

Thanks,
Paul


P.S. The source code for the test is inline below. It can be
put in the file
src/test/org/apache/lucene/TestSearchL.java
of a cvs working copy after which 'ant test' shows
two passing test cases for AND (test03And..), and
two failing test cases for OR (test04Or..).
The other tests have been disabled by changing
their name from test... to tst.., these pass normally
when enabled.
The code was derived from TestSearch.java, and I left
out the licence here for brevity:


package org.apache.lucene;

import junit.framework.TestCase;
import junit.framework.TestSuite;
import junit.textui.TestRunner;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.queryParser.QueryParser;

public class TestSearchL extends TestCase {
    public static void main(String args[]) {
        TestRunner.run(new TestSuite(TestSearchL.class));
    }

    final String fieldName = "contents";

    String[] docs1 = {
        "word1 word2 word3",
        "word4 word5",
        "ord1 ord2 ord3",
        "orda1 orda2 orda3 word2 worda3",
        "a c e a b c"
    };

    Directory dBase1 = createDb(docs1);

    public void normalTest1(String query, int[] expdnrs) throws Exception {
        new NormalQueryTest( query, expdnrs, dBase1, docs1).doTest();
    }

    public void tst02Terms01() throws Exception {
        int[] expdnrs = {0}; normalTest1( "word1", expdnrs);
    }
    public void tst02Terms02() throws Exception {
        int[] expdnrs = {0, 1, 3}; normalTest1( "word*", expdnrs);
    }
    public void tst02Terms03() throws Exception {
        int[] expdnrs = {2}; normalTest1( "ord2", expdnrs);
    }
    public void tst02Terms04() throws Exception {
        int[] expdnrs = {}; normalTest1( "gnork*", expdnrs);
    }
    public void tst02Terms05() throws Exception {
        int[] expdnrs = {0, 1, 3}; normalTest1( "wor*", expdnrs);
    }
    public void tst02Terms06() throws Exception {
        int[] expdnrs = {}; normalTest1( "ab", expdnrs);
    }

    public void test03And01() throws Exception {
        int[] expdnrs = {0}; normalTest1( "+word1 +word2", expdnrs);
    }
    public void test03And02() throws Exception {
        int[] expdnrs = {3}; normalTest1( "+word* +ord*", expdnrs);
    }

    public void test04Or01() throws Exception {
        int[] expdnrs = {0, 3};	normalTest1( "word1 word2", expdnrs);
    }
    public void test04Or02() throws Exception {
        int[] expdnrs = {0, 1, 2, 3}; normalTest1( "word* ord*", expdnrs);
    }

    class NormalQueryTest {
	String queryText;
	final int[] expectedDocNrs;
	Directory dBase;
	String[] docs;

	NormalQueryTest(String qt, int[] expdnrs, Directory db, String[] documents) {
	    queryText = qt;
	    expectedDocNrs = expdnrs;
	    dBase = db;
	    docs = documents;
	}

	public void doTest() throws Exception {
	    Analyzer analyzer = new WhitespaceAnalyzer();
	    QueryParser parser = new QueryParser(fieldName, analyzer);
	    Query query = parser.parse(queryText);

	    System.out.println("QueryL: " + queryText);
	    System.out.println("ParsedL: " + query.toString());
	    TestCollector tc = new TestCollector();
	    Searcher searcher = new IndexSearcher(dBase);
	    try {
		searcher.search(query, tc);
	    } finally {
		searcher.close();
	    }
	    tc.checkNrHits();
	}

	class TestCollector extends HitCollector {
	    int totalMatched;

	    TestCollector() { totalMatched = 0; }

	    public void collect(int docNr, float score) {
		System.out.println(docNr + " '" + docs[docNr] + "': " + score);
		assertTrue(queryText + ": positive score", score > 0.0);
		assertTrue(queryText + ": too many hits", totalMatched < expectedDocNrs.length);
		assertEquals(queryText + ": doc nr for hit " + totalMatched, expectedDocNrs[totalMatched], docNr);
		totalMatched++;
	    }

	    void checkNrHits() { assertEquals(queryText + ": nr of hits", expectedDocNrs.length, totalMatched); }
	}
    }

    private Directory createDb(String[] docs) {
	try {
	    Directory directory = new RAMDirectory();
	    Analyzer analyzer = new WhitespaceAnalyzer();
	    IndexWriter writer = new IndexWriter(directory, analyzer, true);
	    for (int j = 0; j < docs.length; j++) {
		Document d = new Document();
		d.add(Field.Text(fieldName, docs[j]));
		writer.addDocument(d);
	    }
	    writer.close();
	    return directory;
	} catch (java.io.IOException ioe) {
	    throw new Error(ioe);
	}
    }
}




Re: Question regarding escaped sequence

Posted by Otis Gospodnetic <ot...@yahoo.com>.

OK. I'll take a look at those patches again, as soon as I find time.

Otis

--- Erik Hatcher <er...@ehatchersolutions.com> wrote:
> I'm just as confused by QueryParser and character escaping as the
> next 
> guy :)
> 
> Jean-Francois' patches seemed fine to me, although if I remember 
> correctly there were lots of patches all merged together.  I'm wary
> of 
> applying too many things all at once.  Maybe I'm wrong about the 
> patches though.
> 
> 	Erik
> 
> 
> On Mar 2, 2004, at 2:33 PM, Otis Gospodnetic wrote:
> 
> > Yes, I'm aware of the patch.  I was looking at it today, and then
> your
> > old email below.  The patch assumes that the existing code and even
> > unit tests have a bug, and had it all along, which sounds amazing,
> so I
> > want to double-check with somebody on lucene-dev who knows
> QueryParser
> > and escaping issues better than me....Erik? :)
> >
> > Once we resolve this, I'll apply your patch, if the unit tests and
> the
> > code it tests really are buggy.
> >
> > \\-Otis
> >
> >
> > --- Jean-Francois Halleux <ha...@skynet.be> wrote:
> >> I fixed the escaping bug and others in the patch I submitted for
> Bug
> >> 24665:
> >> "[PATCH] Query parser doesn't handle escaped field names"
> >>
> >> I think the fix was clean. I traced it to an image token returned
> by
> >> JavaCC
> >> still containing the escaped char. I included several tests as
> well
> >> if I
> >> remember well.
> >>
> >> This patch never got applied, don't know why.
> >>
> >>
> >> KR,
> >>
> >> Jean-Francois Halleux
> >>
> >> -----Original Message-----
> >> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> >> Sent: mardi 2 mars 2004 16:35
> >> To: Lucene Developers List; halleux.jf@skynet.be
> >> Subject: Re: Question regarding escaped sequence
> >>
> >>
> >> I have a feeling that query escaping really is broken in Lucene.
> >> Try running the class below like this:
> >>
> >> prompt> java Escaper '+string' '\+string'
> >>
> >> I get:
> >>
> >> $ java Escaper '+string' '\+string'
> >> 0: +string
> >> 1: \+string
> >> QUERY: \+string
> >> HITS: 0
> >>
> >> That should give me 1 hit, shouldn't it?
> >>
> >> import org.apache.lucene.queryParser.QueryParser;
> >> import org.apache.lucene.analysis.*;
> >> import org.apache.lucene.search.*;
> >> import org.apache.lucene.index.*;
> >> import org.apache.lucene.store.*;
> >> import org.apache.lucene.document.*;
> >>
> >> public class Escaper
> >> {
> >>     public static void main(String[] args) throws Exception
> >>     {
> >>         System.out.println("0: " + args[0]);
> >>         System.out.println("1: " + args[1]);
> >>
> >>         Directory dir = new RAMDirectory();
> >>         IndexWriter writer = new IndexWriter(dir, new
> >> WhitespaceAnalyzer(), true);
> >>         Document doc = new Document();
> >>         doc.add(Field.Text("text", args[0]));
> >>         writer.addDocument(doc);
> >>         writer.optimize();
> >>         writer.close();
> >>
> >>         QueryParser qp = new QueryParser("text", new
> >> WhitespaceAnalyzer());
> >>         Query q = qp.parse(args[1]);
> >>         System.out.println("QUERY: " + q.toString("text"));
> >>
> >>         IndexSearcher searcher = new IndexSearcher(dir);
> >>         Hits hits = searcher.search(q);
> >>         System.out.println("HITS: " + hits.length());
> >>         searcher.close();
> >>     }
> >> }
> >>
> >> Thanks,
> >> Otis
> >>
> >>
> >> --- Jean-Francois Halleux <ha...@skynet.be> wrote:
> >>> Hello,
> >>>
> >>> 	in TestQueryParser, method testEscaped(), I see the following:
> >>>
> >>> ...
> >>> assertQueryEquals("\\+blah", a, "\\+blah");
> >>> assertQueryEquals("\\(blah", a, "\\(blah");
> >>>
> >>> assertQueryEquals("\\-blah", a, "\\-blah");
> >>> assertQueryEquals("\\!blah", a, "\\!blah");
> >>> assertQueryEquals("\\{blah", a, "\\{blah");
> >>> assertQueryEquals("\\}blah", a, "\\}blah");
> >>> ...
> >>>
> >>> is this really the expected behavior? Shouldn't \\-blah be
> >>> interpreted
> >>> as -blah and \\!blah as !blah ?
> >>>
> >>> Thanks,
> >>>
> >>> Jean-Francois Halleux
> >>>
> >>>
> >>>



Re: Question regarding escaped sequence

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

I'm just as confused by QueryParser and character escaping as the next 
guy :)

Jean-Francois' patches seemed fine to me, although if I remember 
correctly there were lots of patches all merged together.  I'm wary of
applying too many things all at once.  Maybe I'm wrong about the 
patches though.

	Erik


On Mar 2, 2004, at 2:33 PM, Otis Gospodnetic wrote:

> Yes, I'm aware of the patch.  I was looking at it today, and then your
> old email below.  The patch assumes that the existing code and even
> unit tests have a bug, and had it all along, which sounds amazing, so I
> want to double-check with somebody on lucene-dev who knows QueryParser
> and escaping issues better than me....Erik? :)
>
> Once we resolve this, I'll apply your patch, if the unit tests and the
> code it tests really are buggy.
>
> \\-Otis
>
>
> --- Jean-Francois Halleux <ha...@skynet.be> wrote:
>> I fixed the escaping bug and others in the patch I submitted for Bug
>> 24665:
>> "[PATCH] Query parser doesn't handle escaped field names"
>>
>> I think the fix was clean. I traced it to an image token returned by
>> JavaCC
>> still containing the escaped char. I included several tests as well
>> if I
>> remember well.
>>
>> This patch never got applied, don't know why.
>>
>>
>> KR,
>>
>> Jean-Francois Halleux
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
>> Sent: mardi 2 mars 2004 16:35
>> To: Lucene Developers List; halleux.jf@skynet.be
>> Subject: Re: Question regarding escaped sequence
>>
>>
>> I have a feeling that query escaping really is broken in Lucene.
>> Try running the class below like this:
>>
>> prompt> java Escaper '+string' '\+string'
>>
>> I get:
>>
>> $ java Escaper '+string' '\+string'
>> 0: +string
>> 1: \+string
>> QUERY: \+string
>> HITS: 0
>>
>> That should give me 1 hit, shouldn't it?
>>
>> import org.apache.lucene.queryParser.QueryParser;
>> import org.apache.lucene.analysis.*;
>> import org.apache.lucene.search.*;
>> import org.apache.lucene.index.*;
>> import org.apache.lucene.store.*;
>> import org.apache.lucene.document.*;
>>
>> public class Escaper
>> {
>>     public static void main(String[] args) throws Exception
>>     {
>>         System.out.println("0: " + args[0]);
>>         System.out.println("1: " + args[1]);
>>
>>         Directory dir = new RAMDirectory();
>>         IndexWriter writer = new IndexWriter(dir, new
>> WhitespaceAnalyzer(), true);
>>         Document doc = new Document();
>>         doc.add(Field.Text("text", args[0]));
>>         writer.addDocument(doc);
>>         writer.optimize();
>>         writer.close();
>>
>>         QueryParser qp = new QueryParser("text", new
>> WhitespaceAnalyzer());
>>         Query q = qp.parse(args[1]);
>>         System.out.println("QUERY: " + q.toString("text"));
>>
>>         IndexSearcher searcher = new IndexSearcher(dir);
>>         Hits hits = searcher.search(q);
>>         System.out.println("HITS: " + hits.length());
>>         searcher.close();
>>     }
>> }
>>
>> Thanks,
>> Otis
>>
>>
>> --- Jean-Francois Halleux <ha...@skynet.be> wrote:
>>> Hello,
>>>
>>> 	in TestQueryParser, method testEscaped(), I see the following:
>>>
>>> ...
>>> assertQueryEquals("\\+blah", a, "\\+blah");
>>> assertQueryEquals("\\(blah", a, "\\(blah");
>>>
>>> assertQueryEquals("\\-blah", a, "\\-blah");
>>> assertQueryEquals("\\!blah", a, "\\!blah");
>>> assertQueryEquals("\\{blah", a, "\\{blah");
>>> assertQueryEquals("\\}blah", a, "\\}blah");
>>> ...
>>>
>>> is this really the expected behavior? Shouldn't \\-blah be
>>> interpreted
>>> as -blah and \\!blah as !blah ?
>>>
>>> Thanks,
>>>
>>> Jean-Francois Halleux
>>>
>>>
>>>



RE: Question regarding escaped sequence

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Yes, I'm aware of the patch.  I was looking at it today, and then your
old email below.  The patch assumes that the existing code and even
unit tests have a bug, and had it all along, which sounds amazing, so I
want to double-check with somebody on lucene-dev who knows QueryParser
and escaping issues better than me....Erik? :)

Once we resolve this, I'll apply your patch, if the unit tests and the
code it tests really are buggy.

\\-Otis


--- Jean-Francois Halleux <ha...@skynet.be> wrote:
> I fixed the escaping bug and others in the patch I submitted for Bug
> 24665:
> "[PATCH] Query parser doesn't handle escaped field names"
> 
> I think the fix was clean. I traced it to an image token returned by
> JavaCC
> still containing the escaped char. I included several tests as well
> if I
> remember well.
> 
> This patch never got applied, don't know why.
> 
> 
> KR,
> 
> Jean-Francois Halleux
> 
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: mardi 2 mars 2004 16:35
> To: Lucene Developers List; halleux.jf@skynet.be
> Subject: Re: Question regarding escaped sequence
> 
> 
> I have a feeling that query escaping really is broken in Lucene.
> Try running the class below like this:
> 
> prompt> java Escaper '+string' '\+string'
> 
> I get:
> 
> $ java Escaper '+string' '\+string'
> 0: +string
> 1: \+string
> QUERY: \+string
> HITS: 0
> 
> That should give me 1 hit, shouldn't it?
> 
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.analysis.*;
> import org.apache.lucene.search.*;
> import org.apache.lucene.index.*;
> import org.apache.lucene.store.*;
> import org.apache.lucene.document.*;
> 
> public class Escaper
> {
>     public static void main(String[] args) throws Exception
>     {
>         System.out.println("0: " + args[0]);
>         System.out.println("1: " + args[1]);
> 
>         Directory dir = new RAMDirectory();
>         IndexWriter writer = new IndexWriter(dir, new
> WhitespaceAnalyzer(), true);
>         Document doc = new Document();
>         doc.add(Field.Text("text", args[0]));
>         writer.addDocument(doc);
>         writer.optimize();
>         writer.close();
> 
>         QueryParser qp = new QueryParser("text", new
> WhitespaceAnalyzer());
>         Query q = qp.parse(args[1]);
>         System.out.println("QUERY: " + q.toString("text"));
> 
>         IndexSearcher searcher = new IndexSearcher(dir);
>         Hits hits = searcher.search(q);
>         System.out.println("HITS: " + hits.length());
>         searcher.close();
>     }
> }
> 
> Thanks,
> Otis
> 
> 
> --- Jean-Francois Halleux <ha...@skynet.be> wrote:
> > Hello,
> >
> > 	in TestQueryParser, method testEscaped(), I see the following:
> >
> > ...
> > assertQueryEquals("\\+blah", a, "\\+blah");
> > assertQueryEquals("\\(blah", a, "\\(blah");
> >
> > assertQueryEquals("\\-blah", a, "\\-blah");
> > assertQueryEquals("\\!blah", a, "\\!blah");
> > assertQueryEquals("\\{blah", a, "\\{blah");
> > assertQueryEquals("\\}blah", a, "\\}blah");
> > ...
> >
> > is this really the expected behavior? Shouldn't \\-blah be
> > interpreted
> > as -blah and \\!blah as !blah ?
> >
> > Thanks,
> >
> > Jean-Francois Halleux
> >
> >
> >



RE: Question regarding escaped sequence

Posted by Jean-Francois Halleux <ha...@skynet.be>.

I fixed the escaping bug and others in the patch I submitted for Bug 24665:
"[PATCH] Query parser doesn't handle escaped field names"

I think the fix was clean. I traced it to an image token returned by JavaCC
still containing the escaped char. I included several tests as well, if I
remember correctly.
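
A minimal sketch of the kind of post-processing this implies, assuming the
token image arrives as a plain String such as \+string. The helper name
stripEscapes is hypothetical; this is an illustration of the idea, not the
code from the patch attached to Bug 24665:

```java
final class UnescapeSketch {
    /** Drops each escaping backslash and keeps the character that follows it. */
    static String stripEscapes(String image) {
        StringBuilder out = new StringBuilder(image.length());
        for (int i = 0; i < image.length(); i++) {
            char c = image.charAt(i);
            if (c == '\\' && i + 1 < image.length()) {
                out.append(image.charAt(++i)); // keep the escaped character literally
            } else {
                out.append(c);                 // ordinary character, or a lone trailing backslash
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In Java source, "\\+string" is the 8-character string \+string.
        System.out.println(stripEscapes("\\+string")); // +string
        System.out.println(stripEscapes("\\-blah"));   // -blah
        System.out.println(stripEscapes("a\\\\b"));    // a\b
    }
}
```

With the backslash removed, the term built from \+string is +string, which
is what WhitespaceAnalyzer indexed in the Escaper example quoted below.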

This patch never got applied, don't know why.


KR,

Jean-Francois Halleux

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: mardi 2 mars 2004 16:35
To: Lucene Developers List; halleux.jf@skynet.be
Subject: Re: Question regarding escaped sequence


I have a feeling that query escaping really is broken in Lucene.
Try running the class below like this:

prompt> java Escaper '+string' '\+string'

I get:

$ java Escaper '+string' '\+string'
0: +string
1: \+string
QUERY: \+string
HITS: 0

That should give me 1 hit, shouldn't it?

import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.analysis.*;
import org.apache.lucene.search.*;
import org.apache.lucene.index.*;
import org.apache.lucene.store.*;
import org.apache.lucene.document.*;

public class Escaper
{
    public static void main(String[] args) throws Exception
    {
        System.out.println("0: " + args[0]);
        System.out.println("1: " + args[1]);

        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new
WhitespaceAnalyzer(), true);
        Document doc = new Document();
        doc.add(Field.Text("text", args[0]));
        writer.addDocument(doc);
        writer.optimize();
        writer.close();

        QueryParser qp = new QueryParser("text", new
WhitespaceAnalyzer());
        Query q = qp.parse(args[1]);
        System.out.println("QUERY: " + q.toString("text"));

        IndexSearcher searcher = new IndexSearcher(dir);
        Hits hits = searcher.search(q);
        System.out.println("HITS: " + hits.length());
        searcher.close();
    }
}

Thanks,
Otis


--- Jean-Francois Halleux <ha...@skynet.be> wrote:
> Hello,
>
> 	in TestQueryParser, method testEscaped(), I see the following:
>
> ...
> assertQueryEquals("\\+blah", a, "\\+blah");
> assertQueryEquals("\\(blah", a, "\\(blah");
>
> assertQueryEquals("\\-blah", a, "\\-blah");
> assertQueryEquals("\\!blah", a, "\\!blah");
> assertQueryEquals("\\{blah", a, "\\{blah");
> assertQueryEquals("\\}blah", a, "\\}blah");
> ...
>
> is this really the expected behavior? Shouldn't \\-blah be
> interpreted
> as -blah and \\!blah as !blah ?
>
> Thanks,
>
> Jean-Francois Halleux
>
>




Re: Question regarding escaped sequence

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

Otis,

Just a public "thanks!" for applying these patches.  If there were more 
hours in a day I'd have been more proactive with it myself.

	Erik


On Mar 3, 2004, at 7:33 AM, Otis Gospodnetic wrote:

> I closed the latter two, but the first one is a JMeter bug.
> Thanks for your work, I think this fix will make several people happy!
>
> Otis
>
> --- Jean-Francois Halleux <ha...@skynet.be> wrote:
>> Otis, you can probably close bugs 16370, 11636, and 14665 as well.
>>
>> Have a look at those too.
>>
>> KR,
>>
>> Jeff
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
>> Sent: Wednesday, March 3, 2004 12:19
>> To: Lucene Developers List
>> Subject: Re: Question regarding escaped sequence
>>
>>
>> This indeed fixes the bug that the code further below demonstrates, so
>> I'm committing it.
>>
>> http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
>>
>> Otis
>>
>> --- Otis Gospodnetic <ot...@yahoo.com> wrote:
>>> I have a feeling that query escaping really is broken in Lucene.
>>> Try running the class below like this:
>>>
>>> prompt> java Escaper '+string' '\+string'
>>>
>>> I get:
>>>
>>> $ java Escaper '+string' '\+string'
>>> 0: +string
>>> 1: \+string
>>> QUERY: \+string
>>> HITS: 0
>>>
>>> That should give me 1 hit, shouldn't it?
>>>
>>> import org.apache.lucene.queryParser.QueryParser;
>>> import org.apache.lucene.analysis.*;
>>> import org.apache.lucene.search.*;
>>> import org.apache.lucene.index.*;
>>> import org.apache.lucene.store.*;
>>> import org.apache.lucene.document.*;
>>>
>>> public class Escaper
>>> {
>>>     public static void main(String[] args) throws Exception
>>>     {
>>>         System.out.println("0: " + args[0]);
>>>         System.out.println("1: " + args[1]);
>>>
>>>         Directory dir = new RAMDirectory();
>>>         IndexWriter writer = new IndexWriter(dir, new
>>> WhitespaceAnalyzer(), true);
>>>         Document doc = new Document();
>>>         doc.add(Field.Text("text", args[0]));
>>>         writer.addDocument(doc);
>>>         writer.optimize();
>>>         writer.close();
>>>
>>>         QueryParser qp = new QueryParser("text", new
>>> WhitespaceAnalyzer());
>>>         Query q = qp.parse(args[1]);
>>>         System.out.println("QUERY: " + q.toString("text"));
>>>
>>>         IndexSearcher searcher = new IndexSearcher(dir);
>>>         Hits hits = searcher.search(q);
>>>         System.out.println("HITS: " + hits.length());
>>>         searcher.close();
>>>     }
>>> }
>>>
>>> Thanks,
>>> Otis
>>>
>>>
>>> --- Jean-Francois Halleux <ha...@skynet.be> wrote:
>>>> Hello,
>>>>
>>>> 	in TestQueryParser, method testEscaped(), I see the following:
>>>>
>>>> ...
>>>> assertQueryEquals("\\+blah", a, "\\+blah");
>>>> assertQueryEquals("\\(blah", a, "\\(blah");
>>>>
>>>> assertQueryEquals("\\-blah", a, "\\-blah");
>>>> assertQueryEquals("\\!blah", a, "\\!blah");
>>>> assertQueryEquals("\\{blah", a, "\\{blah");
>>>> assertQueryEquals("\\}blah", a, "\\}blah");
>>>> ...
>>>>
>>>> Is this really the expected behavior? Shouldn't \\-blah be interpreted
>>>> as -blah and \\!blah as !blah?
>>>>
>>>> Thanks,
>>>>
>>>> Jean-Francois Halleux

RE: Question regarding escaped sequence

Posted by Otis Gospodnetic <ot...@yahoo.com>.

I closed the latter two, but the first one is a JMeter bug.
Thanks for your work, I think this fix will make several people happy!

Otis

--- Jean-Francois Halleux <ha...@skynet.be> wrote:
> Otis, you can probably close bugs 16370, 11636, and 14665 as well.
> 
> Have a look at those too.
> 
> KR,
> 
> Jeff
> 
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Wednesday, March 3, 2004 12:19
> To: Lucene Developers List
> Subject: Re: Question regarding escaped sequence
> 
> 
> This indeed fixes the bug that the code further below demonstrates, so
> I'm committing it.
> 
> http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
> 
> Otis
> 
> --- Otis Gospodnetic <ot...@yahoo.com> wrote:
> > I have a feeling that query escaping really is broken in Lucene.
> > Try running the class below like this:
> > 
> > prompt> java Escaper '+string' '\+string'
> > 
> > I get:
> > 
> > $ java Escaper '+string' '\+string'
> > 0: +string
> > 1: \+string
> > QUERY: \+string
> > HITS: 0
> > 
> > That should give me 1 hit, shouldn't it?
> > 
> > import org.apache.lucene.queryParser.QueryParser;
> > import org.apache.lucene.analysis.*;
> > import org.apache.lucene.search.*;
> > import org.apache.lucene.index.*;
> > import org.apache.lucene.store.*;
> > import org.apache.lucene.document.*;
> > 
> > public class Escaper
> > {
> >     public static void main(String[] args) throws Exception
> >     {
> >         System.out.println("0: " + args[0]);
> >         System.out.println("1: " + args[1]);
> >         
> >         Directory dir = new RAMDirectory();
> >         IndexWriter writer = new IndexWriter(dir, new
> > WhitespaceAnalyzer(), true);
> >         Document doc = new Document();
> >         doc.add(Field.Text("text", args[0]));
> >         writer.addDocument(doc);
> >         writer.optimize();
> >         writer.close();
> > 
> >         QueryParser qp = new QueryParser("text", new
> > WhitespaceAnalyzer());
> >         Query q = qp.parse(args[1]);
> >         System.out.println("QUERY: " + q.toString("text"));
> > 
> >         IndexSearcher searcher = new IndexSearcher(dir);
> >         Hits hits = searcher.search(q);
> >         System.out.println("HITS: " + hits.length());
> >         searcher.close();
> >     }
> > }
> > 
> > Thanks,
> > Otis
> > 
> > 
> > --- Jean-Francois Halleux <ha...@skynet.be> wrote:
> > > Hello,
> > > 
> > > 	in TestQueryParser, method testEscaped(), I see the following:
> > > 
> > > ...
> > > assertQueryEquals("\\+blah", a, "\\+blah");
> > > assertQueryEquals("\\(blah", a, "\\(blah");
> > > 
> > > assertQueryEquals("\\-blah", a, "\\-blah");
> > > assertQueryEquals("\\!blah", a, "\\!blah");
> > > assertQueryEquals("\\{blah", a, "\\{blah");
> > > assertQueryEquals("\\}blah", a, "\\}blah");
> > > ...
> > > 
> > > Is this really the expected behavior? Shouldn't \\-blah be interpreted
> > > as -blah and \\!blah as !blah?
> > > 
> > > Thanks,
> > > 
> > > Jean-Francois Halleux

RE: Question regarding escaped sequence

Posted by Jean-Francois Halleux <ha...@skynet.be>.

Otis, you can probably close bugs 16370, 11636, and 14665 as well.

Have a look at those too.

KR,

Jeff

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Wednesday, March 3, 2004 12:19
To: Lucene Developers List
Subject: Re: Question regarding escaped sequence


This indeed fixes the bug that the code further below demonstrates, so
I'm committing it.

http://issues.apache.org/bugzilla/show_bug.cgi?id=24665

Otis

--- Otis Gospodnetic <ot...@yahoo.com> wrote:
> I have a feeling that query escaping really is broken in Lucene.
> Try running the class below like this:
> 
> prompt> java Escaper '+string' '\+string'
> 
> I get:
> 
> $ java Escaper '+string' '\+string'
> 0: +string
> 1: \+string
> QUERY: \+string
> HITS: 0
> 
> That should give me 1 hit, shouldn't it?
> 
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.analysis.*;
> import org.apache.lucene.search.*;
> import org.apache.lucene.index.*;
> import org.apache.lucene.store.*;
> import org.apache.lucene.document.*;
> 
> public class Escaper
> {
>     public static void main(String[] args) throws Exception
>     {
>         System.out.println("0: " + args[0]);
>         System.out.println("1: " + args[1]);
>         
>         Directory dir = new RAMDirectory();
>         IndexWriter writer = new IndexWriter(dir, new
> WhitespaceAnalyzer(), true);
>         Document doc = new Document();
>         doc.add(Field.Text("text", args[0]));
>         writer.addDocument(doc);
>         writer.optimize();
>         writer.close();
> 
>         QueryParser qp = new QueryParser("text", new
> WhitespaceAnalyzer());
>         Query q = qp.parse(args[1]);
>         System.out.println("QUERY: " + q.toString("text"));
> 
>         IndexSearcher searcher = new IndexSearcher(dir);
>         Hits hits = searcher.search(q);
>         System.out.println("HITS: " + hits.length());
>         searcher.close();
>     }
> }
> 
> Thanks,
> Otis
> 
> 
> --- Jean-Francois Halleux <ha...@skynet.be> wrote:
> > Hello,
> > 
> > 	in TestQueryParser, method testEscaped(), I see the following:
> > 
> > ...
> > assertQueryEquals("\\+blah", a, "\\+blah");
> > assertQueryEquals("\\(blah", a, "\\(blah");
> > 
> > assertQueryEquals("\\-blah", a, "\\-blah");
> > assertQueryEquals("\\!blah", a, "\\!blah");
> > assertQueryEquals("\\{blah", a, "\\{blah");
> > assertQueryEquals("\\}blah", a, "\\}blah");
> > ...
> > 
> > Is this really the expected behavior? Shouldn't \\-blah be interpreted
> > as -blah and \\!blah as !blah?
> > 
> > Thanks,
> > 
> > Jean-Francois Halleux

Re: Question regarding escaped sequence

Posted by Otis Gospodnetic <ot...@yahoo.com>.

This indeed fixes the bug that the code further below demonstrates, so
I'm committing it.

http://issues.apache.org/bugzilla/show_bug.cgi?id=24665

Otis

--- Otis Gospodnetic <ot...@yahoo.com> wrote:
> I have a feeling that query escaping really is broken in Lucene.
> Try running the class below like this:
> 
> prompt> java Escaper '+string' '\+string'
> 
> I get:
> 
> $ java Escaper '+string' '\+string'
> 0: +string
> 1: \+string
> QUERY: \+string
> HITS: 0
> 
> That should give me 1 hit, shouldn't it?
> 
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.analysis.*;
> import org.apache.lucene.search.*;
> import org.apache.lucene.index.*;
> import org.apache.lucene.store.*;
> import org.apache.lucene.document.*;
> 
> public class Escaper
> {
>     public static void main(String[] args) throws Exception
>     {
>         System.out.println("0: " + args[0]);
>         System.out.println("1: " + args[1]);
>         
>         Directory dir = new RAMDirectory();
>         IndexWriter writer = new IndexWriter(dir, new
> WhitespaceAnalyzer(), true);
>         Document doc = new Document();
>         doc.add(Field.Text("text", args[0]));
>         writer.addDocument(doc);
>         writer.optimize();
>         writer.close();
> 
>         QueryParser qp = new QueryParser("text", new
> WhitespaceAnalyzer());
>         Query q = qp.parse(args[1]);
>         System.out.println("QUERY: " + q.toString("text"));
> 
>         IndexSearcher searcher = new IndexSearcher(dir);
>         Hits hits = searcher.search(q);
>         System.out.println("HITS: " + hits.length());
>         searcher.close();
>     }
> }
> 
> Thanks,
> Otis
> 
> 
> --- Jean-Francois Halleux <ha...@skynet.be> wrote:
> > Hello,
> > 
> > 	in TestQueryParser, method testEscaped(), I see the following:
> > 
> > ...
> > assertQueryEquals("\\+blah", a, "\\+blah");
> > assertQueryEquals("\\(blah", a, "\\(blah");
> > 
> > assertQueryEquals("\\-blah", a, "\\-blah");
> > assertQueryEquals("\\!blah", a, "\\!blah");
> > assertQueryEquals("\\{blah", a, "\\{blah");
> > assertQueryEquals("\\}blah", a, "\\}blah");
> > ...
> > 
> > Is this really the expected behavior? Shouldn't \\-blah be interpreted
> > as -blah and \\!blah as !blah?
> > 
> > Thanks,
> > 
> > Jean-Francois Halleux