Posted to java-user@lucene.apache.org by Siraj Haider <si...@jobdiva.com> on 2009/03/11 15:20:01 UTC

search problem when indexed using Field.setOmitTf()

We are having a problem running searches on an index after upgrading to 
2.4 and using the new Field.setOmitTf() function.  The index size has 
been dramatically reduced and even the search performance is better.  But 
searches do not return any results when searching for something that has a 
space in it.

This is how I am running the search:

    Sort sort = new Sort(new SortField("DATECREATED", SortField.STRING, true));
    QueryParser queryParser = new QueryParser("", new WhitespaceAnalyzer());
    Query query = queryParser.parse("SQL SERVER");
    TopFieldDocs tfd = indexSearcher.search(query, null, 9999999, sort);

This query does not return results if the query string has a space, e.g. 
"SQL SERVER".  This behaviour changes if we don't use 
Field.setOmitTf(true) while indexing; the search then returns the right 
results.  Please advise how to achieve a reduced index size by using 
Field.setOmitTf() while still searching for strings with a space between 
words.

thanks


This electronic mail message and any attachments may contain information which is privileged, sensitive and/or otherwise exempt from disclosure under applicable law. The information is intended only for the use of the individual or entity named as the addressee above. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution (electronic or otherwise) or forwarding of, or the taking of any action in reliance on, the contents of this transmission is strictly prohibited. If you have received this electronic transmission in error, please notify us by telephone, facsimile, or e-mail as noted above to arrange for the return of any electronic mail or attachments. Thank You.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: search problem when indexed using Field.setOmitTf()

Posted by Michael McCandless <lu...@mikemccandless.com>.
Siraj Haider wrote:

> Yonik Seeley wrote:
>> On Wed, Mar 11, 2009 at 2:35 PM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>
>>> This is expected: phrase searches will not work when you omitTf.
>>>
>>
>> But why would a phrase query be created?  The code given looks like it
>> should create a boolean query with two terms.
>>
>> Of course, the given code also uses "" as the default field....
>>
>>
> I used "SQL SERVER" just as an example, in order to make it simple.  My
> actual query is like this [S000:"SQL SERVER"] where S000 is the field.

OK, that query does create a PhraseQuery.

> Is there any way to selectively keep the position information and discard
> other stuff like term frequency, payloads, etc.?

Payloads consume no space if you don't use them.  Term frequency can't  
be turned off separately: omitTf drops term frequencies and positions  
together.

Other things to try:

   - Turn off field storing (Field.Store.NO) and turn off term vectors
     (Field.TermVector.NO), if you haven't already.

   - Disabling norms saves a tiny amount of disk space, but you lose
     boosting.

   - Reduce the number of fields on each doc.

   - Run optimize.

Mike


Re: search problem when indexed using Field.setOmitTf()

Posted by Siraj Haider <si...@jobdiva.com>.
Yonik Seeley wrote:
> On Wed, Mar 11, 2009 at 2:35 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>   
>> This is expected: phrase searches will not work when you omitTf.
>>     
>
> But why would a phrase query be created?  The code given looks like it
> should create a boolean query with two terms.
>
> Of course, the given code also uses "" as the default field....
>
>   
I used "SQL SERVER" just as an example, in order to make it simple.  My 
actual query is like this [S000:"SQL SERVER"] where S000 is the field.
Is there any way to selectively keep the position information and discard 
other stuff like term frequency, payloads, etc.?
-siraj



Re: search problem when indexed using Field.setOmitTf()

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Wed, Mar 11, 2009 at 2:35 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> This is expected: phrase searches will not work when you omitTf.

But why would a phrase query be created?  The code given looks like it
should create a boolean query with two terms.

Of course, the given code also uses "" as the default field....
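
To illustrate why an unquoted two-term query would not be expected to break, here is a minimal sketch of postings that keep only document IDs per term, roughly what is left once frequencies and positions are dropped. It is a toy model, not Lucene's index format; the class ToyDocOnlyIndex and its methods are hypothetical names made up for this example, and the OR semantics assume the query parser's default operator.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy postings that keep only document IDs per term (no frequencies, no positions). */
public class ToyDocOnlyIndex {
    // term -> IDs of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Splits the text on whitespace and records which doc each token occurs in. */
    public void add(int docId, String text) {
        for (String token : text.trim().split("\\s+")) {
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    /** OR semantics: documents that contain at least one of the terms. */
    public Set<Integer> anyOf(String... terms) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : terms) {
            hits.addAll(postings.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return hits;
    }
}
```

With documents 1 = "SQL SERVER admin", 2 = "Oracle SERVER" and 3 = "Java developer", anyOf("SQL", "SERVER") can be answered from document IDs alone, so a boolean query of two terms does not depend on positional data.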

-Yonik
http://www.lucidimagination.com

>>       QueryParser queryParser = new QueryParser("", new WhitespaceAnalyzer());
>>       Query query = queryParser.parse("SQL SERVER");


Re: search problem when indexed using Field.setOmitTf()

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK I opened https://issues.apache.org/jira/browse/LUCENE-1561.

Mike



Re: search problem when indexed using Field.setOmitTf()

Posted by Michael McCandless <lu...@mikemccandless.com>.
I agree, the name is cryptic, and we should also strengthen the  
javadocs to explain which searches will not work properly if you use  
it.  I'll open an issue.

Any suggestions for a better name?  omitTermPositions?

Mike

Otis Gospodnetic wrote:

>
> I bet omitTf will be confusing to people.  When I see omitTf I read  
> that as "aha, don't store term frequency".  I don't read that as  
> "don't store term frequency and don't store positional  
> information".  We'll have to document this well or maybe even  
> consider renaming this so it's more self-descriptive.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: search problem when indexed using Field.setOmitTf()

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I bet omitTf will be confusing to people.  When I see omitTf I read that as "aha, don't store term frequency".  I don't read that as "don't store term frequency and don't store positional information".  We'll have to document this well or maybe even consider renaming this so it's more self-descriptive.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





Re: search problem when indexed using Field.setOmitTf()

Posted by Michael McCandless <lu...@mikemccandless.com>.
This is expected: phrase searches will not work when you omitTf.

omitTf means positional information about tokens is not saved in the  
index.  Span queries & phrase queries require that positional  
information to work.
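
As a rough illustration of why positions matter, the sketch below builds a toy positional index and matches a two-word phrase by checking for adjacent positions. It is a teaching model under simple assumptions (whitespace tokenization, exact adjacency), not Lucene's actual data structures; ToyPositionalIndex and its methods are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy positional inverted index; a teaching model, not Lucene's implementation. */
public class ToyPositionalIndex {
    // term -> (docId -> positions at which the term occurs in that doc)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Splits the text on whitespace and records every token's position. */
    public void add(int docId, String text) {
        String[] tokens = text.trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Documents in which `first` is immediately followed by `second`. */
    public Set<Integer> phrase(String first, String second) {
        Set<Integer> hits = new TreeSet<>();
        Map<Integer, List<Integer>> a =
                postings.getOrDefault(first, Collections.<Integer, List<Integer>>emptyMap());
        Map<Integer, List<Integer>> b =
                postings.getOrDefault(second, Collections.<Integer, List<Integer>>emptyMap());
        for (Map.Entry<Integer, List<Integer>> entry : a.entrySet()) {
            List<Integer> secondPositions = b.get(entry.getKey());
            if (secondPositions == null) {
                continue; // the second term never occurs in this document
            }
            for (int p : entry.getValue()) {
                if (secondPositions.contains(p + 1)) { // adjacency test needs positions
                    hits.add(entry.getKey());
                    break;
                }
            }
        }
        return hits;
    }
}
```

Given document 1 = "SQL SERVER admin" and document 2 = "SERVER running SQL", both documents contain both terms, yet only document 1 matches the phrase. The only thing that separates them is the stored positions, which is the information an index built without positions no longer has.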

Mike
