Posted to user@nutch.apache.org by Nutch User - 1 <nu...@gmail.com> on 2011/07/04 09:43:54 UTC

Searching for documents with a certain boost value

Hi.

As I have described here
(http://lucene.472066.n3.nabble.com/URL-redirection-and-zero-scores-td3085311.html)
I have encountered a situation where some of my indexed documents have
zero boost value.

I'd like to know if there's a way to find which ones have zero as
their boost value. I tried a Lucene query in Luke, but it failed.
The query was: boost:"00 00 00 00". (The boost field seems to be
a binary one, so that may have something to do with the problem.)
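
A minimal sketch of what the bytes "00 00 00 00" would mean, assuming
(the thread does not confirm this) that Luke is showing the raw bytes of
a 4-byte big-endian IEEE 754 float. The class name BoostBytes and its
methods are hypothetical helpers, not part of Nutch, Lucene or Luke.

```java
import java.nio.ByteBuffer;

public class BoostBytes {
    // Parse a space-separated hex dump such as "00 00 00 00" into raw bytes.
    static byte[] parseHex(String dump) {
        String[] parts = dump.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return bytes;
    }

    // Assumption: the field holds exactly four bytes encoding a big-endian float.
    static float decode(String dump) {
        byte[] bytes = parseHex(dump);
        if (bytes.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + bytes.length);
        }
        // ByteBuffer reads in big-endian order by default.
        return ByteBuffer.wrap(bytes).getFloat();
    }

    public static void main(String[] args) {
        System.out.println(decode("00 00 00 00")); // prints 0.0
        System.out.println(decode("3F 80 00 00")); // prints 1.0
    }
}
```

Under that assumption the displayed bytes are simply the float 0.0.
Decoding them only explains what Luke displays; it does not by itself
make the field searchable.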

I allowed a leading * in wildcard queries, and url:* returned every
document, as it should. However, boost:* returned none. Can this boost
field even be used as a search criterion?

Best regards,
Nutch User - 1

Re: Searching for documents with a certain boost value

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi,

I am sorry that I am temporarily unable to try to replicate the scenario and
confirm whether I get zero scores in a similar situation. I would, however,
like to add this resource [1], in case you have not seen it yet. I am aware
that it doesn't address the problem directly, but if we start thinking more
about the way scoring is done, then maybe we can get closer to finding out
whether we can search for fields within a document, or documents within an
index, which have a boost value of zero. The reference I include is relevant
specifically to Nutch versions using Lucene. However, as we are referring to
scoring done by the OPIC filter, I'm hoping that the outcome will be
consistent across versions, including those which do not use legacy Lucene.
Can someone please correct me if I am wrong here...

Focussing specifically on your question, it appears that a document field is
not shown if a term was not found in that field, e.g. no score value is
given. This would suggest that we cannot query for it, so my gut instinct is
that we cannot query for a zero value present within these fields. N.B. I
cannot confirm this; I am merely going on the little research I have done
into the OPIC scoring algorithm. It would be nice if someone could confirm
this, or correct me if it is wrong.

[1]
http://wiki.apache.org/nutch/FAQ#How_is_scoring_done_in_Nutch.3F_.28Or.2C_explain_the_.22explain.22_page.3F.29



-- 
*Lewis*