Posted to solr-user@lucene.apache.org by Steven White <sw...@gmail.com> on 2019/07/27 19:40:22 UTC

Ranking

Hi everyone,

I have 2 files like so:

FA has the letter "i" only 2 times, and the file size is 54,246 bytes
FB has the letter "i" 362 times, and the file size is 9,953 bytes

When I search for the letter "i", FB is ranked lower, which confuses me.
I was under the impression that both the number of occurrences of the
term in a document and the document size are factors, so I was expecting
FB to rank higher.  Did I get this right?  If not, what's causing FB to
rank lower?

I'm on Solr 8.1

Thanks

Steven

Re: Ranking

Posted by David Hastings <DH...@wshein.com>.
I can’t imagine this is actually true unless you have a default copy field and “i” is in one of them. Also, the letter “i” is a bizarre test case.
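
For context, here is a minimal sketch of the arithmetic behind that skepticism, assuming the field is scored with BM25 (the default similarity in recent Solr releases) with k1 = 1.2 and b = 0.75. It is not Lucene's actual implementation: the byte sizes from the question stand in for field lengths in tokens, the index is assumed to hold only these two documents, and constant factors and Lucene's lossy length encoding are ignored.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf(freq: float, doc_len: float, avg_doc_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Length-normalized term-frequency component of BM25."""
    return freq / (freq + k1 * (1 - b + b * doc_len / avg_doc_len))


# Assumption: byte sizes are used as a rough proxy for field length in tokens.
docs = {"FA": (2, 54246), "FB": (362, 9953)}

avg_len = sum(length for _, length in docs.values()) / len(docs)
# Assumption: the index contains only these two documents, both matching "i".
idf = bm25_idf(doc_count=len(docs), doc_freq=len(docs))

scores = {
    name: idf * bm25_tf(freq, length, avg_len)
    for name, (freq, length) in docs.items()
}

# Print the documents from highest to lowest score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.4f}")
```

Under these assumptions FB scores higher than FA, so the reported ordering suggests that something other than raw term frequency and length is involved, such as which field is actually being searched.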


Re: Ranking

Posted by Charlie Hull <ch...@flax.co.uk>.
There are also various tools including a Chrome plugin and (my own 
employer's) www.splainer.io that make the debug info a little easier to 
read and understand.

Cheers

Charlie

On 27/07/2019 21:55, Erik Hatcher wrote:
> The details of the scoring can be seen by setting &debug=true
>
>      Erik
>


-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Ranking

Posted by Erik Hatcher <er...@gmail.com>.
The details of the scoring can be seen by setting &debug=true 

    Erik 
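
As an illustration of the suggestion above, here is a small sketch that builds a select request with debug=true. The host, port, collection name ("mycollection"), and field name ("content") are hypothetical placeholders, not details from this thread.

```python
from urllib.parse import urlencode


def build_debug_url(base_url: str, collection: str, query: str) -> str:
    """Build a Solr select URL that asks for scores and debug output."""
    params = {
        "q": query,
        "fl": "id,score",  # return each document's id and score
        "debug": "true",   # include debug information in the response
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical local Solr instance, collection, and field.
url = build_debug_url("http://localhost:8983/solr", "mycollection", "content:i")
print(url)
```

Opening the resulting URL in a browser or with an HTTP client should return the usual results plus a debug section; its "explain" entries break each document's score down into its components.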
