Posted to user@couchdb.apache.org by Rory Franklin <ro...@chillibean.tv> on 2012/06/20 13:05:20 UTC

CouchDB Lucene boost problem

Hi,

We've got a machine with couchdb-lucene 0.9 on it and various machines that use 0.7 and there seems to be a discrepancy between the two versions.

In one of our indexes we are boosting a particular field so that it ranks above others in search results. We are searching for an ID and for linked IDs in other documents, but want the document that matches the ID directly to be returned above the linked documents. The definition looks like this:

ret.add(doc.entry_human_id, {'field':'sort_entry_human_id', 'type' : 'int', 'boost' : 1.5})

On 0.7 this works absolutely fine, but on 0.9 we are seeing this error:

2012-06-20 10:48:04,251 WARN [lia_development] Exiting due to exception.
java.lang.UnsupportedOperationException: You cannot set an index-time boost: norms are omitted for field 'sort_entry_human_id'
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:85)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:276)
        at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:852)
        at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:2167)
        at com.github.rnewson.couchdb.lucene.DatabaseIndexer.handleResponse(DatabaseIndexer.java:393)
        at com.github.rnewson.couchdb.lucene.DatabaseIndexer.handleResponse(DatabaseIndexer.java:83)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:735)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:709)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:700)
        at com.github.rnewson.couchdb.lucene.DatabaseIndexer.run(DatabaseIndexer.java:473)
        at java.lang.Thread.run(Thread.java:680)



Removing the boost and restarting couchdb-lucene resolves the issue, but I'm curious as to what the problem may be.



Thanks,
Rory
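
A minimal sketch of the workaround described above, for readers following along: the numeric sort field is indexed without a 'boost' option, so nothing asks Lucene for an index-time boost on a field whose norms are omitted. The Document class here is a stand-in written only for this example so that it runs under plain Node.js; inside couchdb-lucene the real Document object is supplied to the index function. The name indexEntry, the sample document, and the assumption that entry_human_id is a default-typed field that still accepts a boost are all illustrative, not taken from the poster's actual design document.

```javascript
// Stand-in for the Document object that couchdb-lucene supplies to index
// functions. It only records calls so the sketch can run under plain Node.js.
class Document {
  constructor() {
    this.fields = [];
  }
  add(value, options) {
    this.fields.push({ value: value, options: options || {} });
  }
}

// Hypothetical index function. In a design document this would be an
// anonymous function(doc) { ... }; it is named here so it can be called.
function indexEntry(doc) {
  var ret = new Document();
  // Numeric field used for sorting: no 'boost' option, so no index-time
  // boost is requested on a field whose norms are omitted.
  ret.add(doc.entry_human_id, { field: 'sort_entry_human_id', type: 'int' });
  // Assumption: a default-typed copy of the same ID that keeps its boost.
  ret.add(doc.entry_human_id, { field: 'entry_human_id', boost: 1.5 });
  return ret;
}

var indexed = indexEntry({ entry_human_id: 1220091 });
console.log(JSON.stringify(indexed.fields));
```

As noted later in the thread, an index built before such a change needs to be deleted and rebuilt.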

Re: CouchDB Lucene boost problem

Posted by Rory Franklin <ro...@chillibean.tv>.
Actually, I see that you can boost using the ^ symbol. 


Rory
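
As a rough illustration of the query-time boost mentioned above, the sketch below builds a query string that uses the Lucene query parser's ^ operator to weight direct ID matches above linked ones. The helper name buildBoostedQuery is hypothetical, and it assumes both entry_human_id and linked_entry_human_ids are indexed under those field names. Fields indexed with a numeric type may need additional query syntax that is not shown here, so check the couchdb-lucene README for your version.

```javascript
// Build a Lucene query string that matches an ID in either field and
// boosts direct matches with the query parser's ^ operator.
// Field names are assumptions based on the attribute names in this thread.
function buildBoostedQuery(id, boost) {
  var direct = 'entry_human_id:' + id + '^' + boost;
  var linked = 'linked_entry_human_ids:' + id;
  return direct + ' OR ' + linked;
}

var q = buildBoostedQuery(1220091, 1.5);
console.log(q); // entry_human_id:1220091^1.5 OR linked_entry_human_ids:1220091

// The query must be URL-encoded before it is sent as the q= parameter.
var encoded = encodeURIComponent(q);
console.log(encoded);
```

Because the weight is part of the query rather than the index, it can be tuned without rebuilding the index.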



Re: CouchDB Lucene boost problem

Posted by Rory Franklin <ro...@chillibean.tv>.
The term we are searching for (just a numeric ID) exists in one of two places depending on the document: either as an "entry_human_id", which is the ID of that specific document, or within an array of "linked_entry_human_ids", where the document is linked to another document.

What we want is for the document whose entry_human_id matches the ID to come above any documents that match the ID in linked_entry_human_ids (hence the boost on entry_human_id), but this doesn't seem to be happening.

Would that be better suited to a query-time boost? If so, is there a specific way to boost a term in the query? I cannot see one in the readme.



Rory





Re: CouchDB Lucene boost problem

Posted by Robert Newson <rn...@apache.org>.
There's index-time and query-time boosting; perhaps that explains it?
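
To make the distinction concrete, the classic TF-IDF scoring formula documented for Lucene's Similarity class can be written as below. The symbol names are shorthand chosen for this note, and the formula describes Lucene's default scoring in general, not anything specific to couchdb-lucene.

```latex
\[
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
\sum_{t \in q} \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot
\mathrm{boost}_{\mathrm{query}}(t)\cdot\mathrm{norm}(t,d)
\]
\[
\mathrm{norm}(t,d) = \mathrm{boost}_{\mathrm{doc}}(d)\cdot\mathrm{lengthNorm}\cdot
\prod_{f \in d,\; f \text{ named as } t} \mathrm{boost}_{\mathrm{index}}(f)
\]
```

Under this reading, the boost=1.0 shown in the debug output would be the query-time factor, which stays at 1.0 unless the query sets it. An index-time field boost is folded into norm(t,d) instead, which is consistent with the exception in the original post: when norms are omitted for a field there is nowhere to store that boost.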




Re: CouchDB Lucene boost problem

Posted by Rory Franklin <ro...@chillibean.tv>.
The error is now gone, which is great. Thanks so much for fixing that so quickly!

Looking at the results of the search (with debug on for readability), it doesn't look like the boost is actually being applied. I have a boost on another field (same name, but without the sort_ prefix), and in the BooleanQuery section of the search result it looks like this:

entry_human_id:1220091,boost=1.0

The boost value in the index is actually 1.5. Is the boost not taking effect, or is it that my query doesn't explicitly boost that field?

Rory





Re: CouchDB Lucene boost problem

Posted by Robert Newson <rn...@apache.org>.
Anyway, I fixed it.

You'll need to delete the index already built, though, and I'd love to hear back from you when you try it.

B.



Re: CouchDB Lucene boost problem

Posted by Robert Newson <rn...@apache.org>.
Hi Rory,

It seems the default for numeric fields changed to omit norms between the versions of Lucene used by couchdb-lucene 0.7 and 0.9:

SOLR-3140: https://issues.apache.org/jira/browse/SOLR-3140

Please file a ticket at https://github.com/rnewson/couchdb-lucene/issues. If you supply a patch, it'll happen much faster. :)

B.
