Posted to solr-user@lucene.apache.org by Steven White <sw...@gmail.com> on 2019/07/27 19:40:22 UTC
Ranking
Hi everyone,
I have 2 files like so:
FA has the letter "i" only 2 times, and the file size is 54,246 bytes
FB has the letter "i" 362 times, and the file size is 9,953 bytes
When I search for the letter "i", FB is ranked lower, which confuses me.
I was under the impression that the number of occurrences of the term in
a document and the document's size are both factors in scoring, so I was
expecting FB to rank higher. Did I get this right? If not, what's causing
FB to rank lower?
I'm on Solr 8.1
Thanks
Steven
Re: Ranking
Posted by David Hastings <DH...@wshein.com>.
I can’t imagine this is actually true unless you have a default copy field and the “i” is in one of them. Also, the letter “i” is a bizarre test case.
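For reference, here is a minimal sketch of the textbook BM25 term-frequency component (BM25 is the default similarity in Solr 8), applied to the two files from the question. Several things here are assumptions rather than facts from the thread: the byte sizes are used as a rough stand-in for field length in tokens, the average length is computed from these two files only, and k1=1.2 and b=0.75 are the usual defaults. Lucene's actual implementation may differ in constant factors and in how it encodes length norms, so treat this as an illustration, not as Solr's exact scoring. For a single-term query the idf part is the same for both documents, so only this component decides the order.

```python
def bm25_tf_component(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 term-frequency part, without the idf factor."""
    # Length normalization: documents longer than average are penalized.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Saturating function of tf: more occurrences help, with diminishing returns.
    return tf / (tf + norm)


# Assumption: byte sizes stand in for token counts of the searched field.
fa_len, fb_len = 54246, 9953
avg_len = (fa_len + fb_len) / 2

fa = bm25_tf_component(tf=2, doc_len=fa_len, avg_doc_len=avg_len)
fb = bm25_tf_component(tf=362, doc_len=fb_len, avg_doc_len=avg_len)

print("FA:", fa)
print("FB:", fb)
print("FB ranks higher" if fb > fa else "FA ranks higher")
```

Under these assumptions FB comes out ahead, which matches the expectation in the original question. If Solr ranks FB lower, the cause is likely outside this formula: which field is actually searched, how it is analyzed, or what other clauses the query expands to.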
Re: Ranking
Posted by Charlie Hull <ch...@flax.co.uk>.
There are also various tools including a Chrome plugin and (my own
employer's) www.splainer.io that make the debug info a little easier to
read and understand.
Cheers
Charlie
On 27/07/2019 21:55, Erik Hatcher wrote:
> The details of the scoring can be seen by setting &debug=true
>
> Erik
--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk
Re: Ranking
Posted by Erik Hatcher <er...@gmail.com>.
The details of the scoring can be seen by setting &debug=true
Erik
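A small sketch of what such a request might look like from Python. The host, the collection name "files", and the field name "content" are hypothetical placeholders, not details from this thread; the debug=true parameter is the one mentioned above. The helper only builds the URL and reads the explain section from an already parsed JSON response, so it runs without a Solr server.

```python
from urllib.parse import urlencode


def build_debug_query_url(base_url, collection, query, fields="id,score", rows=10):
    """Build a /select URL that asks Solr to include scoring debug info."""
    params = {
        "q": query,
        "fl": fields,      # return the score next to each document id
        "rows": rows,
        "debug": "true",   # adds a "debug" section to the response
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def explanations(response):
    """Return the per-document score explanations from a parsed JSON response."""
    return response.get("debug", {}).get("explain", {})


# Hypothetical host, collection and field names.
url = build_debug_query_url("http://localhost:8983/solr", "files", "content:i")
print(url)
```

Fetching that URL and passing the parsed JSON to explanations() should give one entry per returned document. Comparing the entries for FA and FB shows which term frequency, field length, and field each score was actually computed from.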