Posted to users@solr.apache.org by Alessandro Benedetti <a....@sease.io> on 2021/08/03 10:20:39 UTC

Re: min_popularity alternative for Solr Relatedness and Semantic Knowledge Graphs

You are very welcome. Feel free to reply in this thread if you have any
updates, open a Jira issue on Apache Solr (tagging me), or send me a direct
message.

Since Apache Solr is an open-source project, any help coming from the
community is welcome, and as a committer, I would be delighted to
facilitate these contributions.

Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Fri, 30 Jul 2021 at 08:33, Kerwin <ke...@gmail.com> wrote:

> Hi Alessandro,
>
> Thank you for taking the time to look into my query. I am still trying
> to understand how the function under computeRelatedness uses the number 30
> and some other numbers. The foreground count would help as an additional
> parameter if it were available. It will take me some time to work on your
> idea, so for now I will continue with what I have.
> Thanks again for your input.
>
> On Mon, Jul 26, 2021 at 8:18 PM Alessandro Benedetti <a.benedetti@sease.io>
> wrote:
>
> > Hi Kerwin,
> > I was taking a look at your question and the
> > *org.apache.solr.search.facet.RelatednessAgg* code; my replies are inline:
> > --------------------------
> > Alessandro Benedetti
> > Apache Lucene/Solr Committer
> > Director, R&D Software Engineer, Search Consultant
> >
> > www.sease.io
> >
> >
> > On Thu, 22 Jul 2021 at 08:27, Kerwin <ke...@gmail.com> wrote:
> >
> > > Hi Solr users,
> > >
> > > I have a question on the relatedness and Semantic Knowledge Graphs
> > > feature in Solr.
> > > While the results are good out of the box, I need some tweaking: the
> > > ability to specify filters or parameters based only on the foreground
> > > count. Right now only the min_popularity parameter is available, and it
> > > applies to both the foreground dataset and the background one.
> >
> > so far so good
> >
> > > The white paper from Trey Grainger and his team mentions that the
> > > z-score is used to calculate the score. As per my understanding, the
> > > z-score assumes a normal distribution and is applicable when the sample
> > > size is greater than 30, which I assume is the foreground count.
> >
> > I don't have time right now to go through the paper, but the only place I
> > found the '30' magic number in the class is within this
> > method: org.apache.solr.search.facet.RelatednessAgg#computeRelatedness
> > It is defined neither as a constant nor as a variable driven by a
> > parameter, so it cannot be changed unless we improve the code.
> >
> > > So I would like to control this value with a parameter or filter. Right
> > > now I am getting the approximate count by doing a reverse calculation on
> > > the foreground popularity and the background size to get the foreground
> > > count. Kindly correct me if my understanding is different from what it
> > > should be.
> > >
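The reverse calculation described above could be done client-side roughly as
follows. This is only a sketch, not code from Solr: it assumes that the
foreground popularity reported for a bucket equals the foreground count
divided by the background size, which is what the description above implies.
The function names and the default threshold of 30 are hypothetical and chosen
for illustration. Because the reported popularity may be rounded, the
recovered count is approximate.

```python
def estimate_foreground_count(foreground_popularity: float,
                              background_size: int) -> int:
    """Approximate the foreground count for one bucket.

    Assumption (not confirmed in this thread):
        foreground_popularity == foreground_count / background_size
    The popularity may be rounded in the response, so the result is
    only an estimate.
    """
    if foreground_popularity < 0 or background_size < 0:
        raise ValueError("popularity and background size must be non-negative")
    return round(foreground_popularity * background_size)


def meets_min_foreground_count(foreground_popularity: float,
                               background_size: int,
                               min_count: int = 30) -> bool:
    """Hypothetical client-side filter on the estimated foreground count."""
    estimated = estimate_foreground_count(foreground_popularity, background_size)
    return estimated >= min_count


if __name__ == "__main__":
    # Illustrative numbers only, not taken from a real Solr response.
    print(estimate_foreground_count(0.00042, 100_000))
    print(meets_min_foreground_count(0.0001, 100_000))
```

Filtering this way happens after Solr has already computed the buckets, so it
only hides results on the client; it does not change how relatedness is
scored.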
> > What I recommend is to take a look at the code references I gave and
> > write a contribution of your own that adds the additional configuration,
> > with an explanation.
> > As a committer, I would be happy to review such work and merge it if it
> > improves the relatedness aggregation (we could take the occasion to also
> > rename some of the variables that do not seem to follow Java naming
> > conventions, e.g. 'min_pop' => minPopularity, etc.).
> > Cheers
> >
>