You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Chris Fraschetti <fr...@gmail.com> on 2004/09/20 05:44:10 UTC

range and content query

can someone assist me in building or deny the possibility of combing a
range query and a standard query?

say for instance i have two fields i'm searching on... one being the a
field with an epoch date associated with the entry, and the
content....  so how can I make a query to select a range of thos
epochs, as well as search through the content? can it be done in one
query, or do I have to perform a query upon a query, and if so, what
might the syntax look like?

-- 
____________________________
Chris Fraschetti
e fraschetti@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: range and content query

Posted by Chris Fraschetti <fr...@gmail.com>.
very correct you are. changing the format of the numbers when i index
then and when i do the range fixed my problem.. thanks much.


On Mon, 20 Sep 2004 09:08:50 +0200, Morus Walter <mo...@tanto.de> wrote:
> Chris Fraschetti writes:
> > I've more or less figured out the query string required to get a range
> > of docs.. say date[0 TO 10]    assuming my dates are from 1 to 10 (for
> > the sake of this example) ... my query has results that I don't
> > understand. if i do from 0 TO 10, then I only get results matching
> > 0,1,10  ... if i do 0 TO 8, i get all results ... from 0 to 10...   if
> > i do   1 TO 5  ... then i get results 1,2,3,4,5,10  ... very strange.
> >
> that's not strange. Lucene indexes strings and compares strings. Not numbers.
> So the order is
> 1
> 10
> 101
> 11
> 2
> 20
> 21
> 3
> 4
> and so on
> 
> I't up to you to make your number look a way that it will work, e.g.
> use leading '0' to get
> 001
> 002
> 003
> 004
> 010
> 011
> 020
> 021
> ...
> 
> I think there's a page in the wiki about these issues.
> 
> > here is how my query looks...
> > query: +date_field:[1 TO 5]
> >
> > here is how the date was added...
> > Document doc = new Document();
> > doc.add(Field.UnIndexed("arcpath_field", filename));
> > doc.add(Field.Keyword("date_field", date));
> > doc.add(Field.Text("content_field", content));
> > writer.addDocument(doc);
> >
> > I tried Field.Text for the date and also received the same results.
> > Essentially I have a loop to add 11 strings... indexes 0 to 10... and
> > add "doc0", "0", "some text"  for each..  and the results i get as as
> > explained above... any ideas?
> >
> > Here is my simple searching code.. i'm currently not searching for any
> > text... i just want to test the range feature right now....
> >
> > query_string = " +("+DATE_FIELD+":["+start_date+" TO "+end_date+"])";
> > Searcher searcher = new IndexSearcher(index_path);
> > QueryParser parser = new QueryParser(CONTENT_FIELD, new StandardAnalyzer());
> > parser.setOperator(QueryParser.DEFAULT_OPERATOR_OR);
> > Query query = parser.parse(query_string);
> > System.out.println("query: "+query.toString());
> > Hits hits = searcher.search(query);
> >
> 
> It's a bad practice to create search strings that have to be decomposed
> by query parser again, if have the parts already at hand.
> At least in most cases.
> I don't know the details how and when query parser will call the analyzer
> and what standard analyzer does with numbers.
> What does query.toString() output?
> 
> But the main problem seems to be your misunderstanding of searching numbers
> in lucene. They are just strings and are treated by their lexical
> representation not their numeric value.
> 
> Morus
> 



-- 
___________________________________________________
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e fraschetti@gmail.com | http://meteora.cs.usfca.edu

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: range and content query

Posted by Morus Walter <mo...@tanto.de>.
Chris Fraschetti writes:
> I've more or less figured out the query string required to get a range
> of docs.. say date[0 TO 10]    assuming my dates are from 1 to 10 (for
> the sake of this example) ... my query has results that I don't
> understand. if i do from 0 TO 10, then I only get results matching
> 0,1,10  ... if i do 0 TO 8, i get all results ... from 0 to 10...   if
> i do   1 TO 5  ... then i get results 1,2,3,4,5,10  ... very strange.
> 
that's not strange. Lucene indexes strings and compares strings. Not numbers.
So the order is
1
10
101
11
2
20
21
3
4
and so on

I't up to you to make your number look a way that it will work, e.g.
use leading '0' to get
001
002
003
004
010
011
020
021
...

I think there's a page in the wiki about these issues.

> here is how my query looks...
> query: +date_field:[1 TO 5]
> 
> here is how the date was added...
> Document doc = new Document();
> doc.add(Field.UnIndexed("arcpath_field", filename));
> doc.add(Field.Keyword("date_field", date));
> doc.add(Field.Text("content_field", content));
> writer.addDocument(doc);
> 
> I tried Field.Text for the date and also received the same results.
> Essentially I have a loop to add 11 strings... indexes 0 to 10... and
> add "doc0", "0", "some text"  for each..  and the results i get as as
> explained above... any ideas?
> 
> Here is my simple searching code.. i'm currently not searching for any
> text... i just want to test the range feature right now....
> 
> query_string = " +("+DATE_FIELD+":["+start_date+" TO "+end_date+"])";
> Searcher searcher = new IndexSearcher(index_path);
> QueryParser parser = new QueryParser(CONTENT_FIELD, new StandardAnalyzer());
> parser.setOperator(QueryParser.DEFAULT_OPERATOR_OR);
> Query query = parser.parse(query_string);
> System.out.println("query: "+query.toString());
> Hits hits = searcher.search(query);
> 

It's a bad practice to create search strings that have to be decomposed
by query parser again, if have the parts already at hand.
At least in most cases.
I don't know the details how and when query parser will call the analyzer
and what standard analyzer does with numbers.
What does query.toString() output?

But the main problem seems to be your misunderstanding of searching numbers
in lucene. They are just strings and are treated by their lexical 
representation not their numeric value.

Morus

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: range and content query

Posted by Chris Fraschetti <fr...@gmail.com>.
I've more or less figured out the query string required to get a range
of docs.. say date[0 TO 10]    assuming my dates are from 1 to 10 (for
the sake of this example) ... my query has results that I don't
understand. if i do from 0 TO 10, then I only get results matching
0,1,10  ... if i do 0 TO 8, i get all results ... from 0 to 10...   if
i do   1 TO 5  ... then i get results 1,2,3,4,5,10  ... very strange.

here is how my query looks...
query: +date_field:[1 TO 5]

here is how the date was added...
Document doc = new Document();
doc.add(Field.UnIndexed("arcpath_field", filename));
doc.add(Field.Keyword("date_field", date));
doc.add(Field.Text("content_field", content));
writer.addDocument(doc);

I tried Field.Text for the date and also received the same results.
Essentially I have a loop to add 11 strings... indexes 0 to 10... and
add "doc0", "0", "some text"  for each..  and the results i get as as
explained above... any ideas?

Here is my simple searching code.. i'm currently not searching for any
text... i just want to test the range feature right now....

query_string = " +("+DATE_FIELD+":["+start_date+" TO "+end_date+"])";
Searcher searcher = new IndexSearcher(index_path);
QueryParser parser = new QueryParser(CONTENT_FIELD, new StandardAnalyzer());
parser.setOperator(QueryParser.DEFAULT_OPERATOR_OR);
Query query = parser.parse(query_string);
System.out.println("query: "+query.toString());
Hits hits = searcher.search(query);


On Mon, 20 Sep 2004 08:24:17 +0200, Morus Walter <mo...@tanto.de> wrote:
> Chris Fraschetti writes:
> > can someone assist me in building or deny the possibility of combing a
> > range query and a standard query?
> >
> > say for instance i have two fields i'm searching on... one being the a
> > field with an epoch date associated with the entry, and the
> > content....  so how can I make a query to select a range of thos
> > epochs, as well as search through the content? can it be done in one
> > query, or do I have to perform a query upon a query, and if so, what
> > might the syntax look like?
> > 
> if you create the query using the API use a boolean query to combine
> the two basic queries.
> 
> If you use query parser use AND or OR.
> 
> Note that range queries are expanded into boolean queries (OR combined)
> which may be a problem if the number of terms matching the range is too
> big. Depends on your date entries and especially how precise they are.
> Alternatively you might consider using a filter.
> 
> HTH
>        Morus
> 



-- 
___________________________________________________
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e fraschetti@gmail.com | http://meteora.cs.usfca.edu

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: range and content query

Posted by Morus Walter <mo...@tanto.de>.
Chris Fraschetti writes:
> can someone assist me in building or deny the possibility of combing a
> range query and a standard query?
> 
> say for instance i have two fields i'm searching on... one being the a
> field with an epoch date associated with the entry, and the
> content....  so how can I make a query to select a range of thos
> epochs, as well as search through the content? can it be done in one
> query, or do I have to perform a query upon a query, and if so, what
> might the syntax look like?
> 
if you create the query using the API use a boolean query to combine
the two basic queries.

If you use query parser use AND or OR.

Note that range queries are expanded into boolean queries (OR combined)
which may be a problem if the number of terms matching the range is too
big. Depends on your date entries and especially how precise they are.
Alternatively you might consider using a filter.

HTH
	Morus

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org