Posted to solr-user@lucene.apache.org by Scott Stults <ss...@opensourceconnections.com> on 2016/04/12 21:25:23 UTC

Re: [More Like This] Query building

Hi Alessandro,

It's not uncommon for Solr patches to remain uncommitted for months, even
years. In fact some never get merged. Don't let that discourage you!


k/r,
Scott

On Fri, Mar 11, 2016 at 11:49 AM, Alessandro Benedetti <
abenedetti@apache.org> wrote:

> I am starting to feel that it is not that easy to contribute improvements
> or small fixes to Solr (if they are not super interesting to the masses).
> I think this one could be a good improvement to MLT, but I would love to
> discuss it with a committer.
> The patch is attached; it has been there for months...
> Any feedback would be appreciated. I want to contribute, but I need some
> second opinions...
>
> Cheers
>
> On 11 February 2016 at 13:48, Alessandro Benedetti <ab...@apache.org>
> wrote:
>
> > Hi guys,
> > is it possible to get any feedback?
> > Is there any process to speed up bug resolution / discussions?
> > I just want to understand whether the patch is not good enough and I
> > need to improve it, or whether simply no one has taken a look...
> >
> > https://issues.apache.org/jira/browse/LUCENE-6954
> >
> > Cheers
> >
> > On 11 January 2016 at 15:25, Alessandro Benedetti
> > <abenedetti@apache.org> wrote:
> >
> >> Hi guys,
> >> the patch seems fine to me.
> >> I didn't spend much more time on the code, but I checked the tests and
> >> the pre-commit checks.
> >> Let me know,
> >>
> >> Cheers
> >>
> >> On 31 December 2015 at 18:40, Alessandro Benedetti
> >> <abenedetti@apache.org> wrote:
> >>
> >>> https://issues.apache.org/jira/browse/LUCENE-6954
> >>>
> >>> First draft patch available; I will check the tests more thoroughly
> >>> in the new year!
> >>>
> >>> On 29 December 2015 at 13:43, Alessandro Benedetti <
> >>> abenedetti@apache.org> wrote:
> >>>
> >>>> Sure, I will proceed tomorrow with the Jira and the simple patch +
> >>>> tests.
> >>>>
> >>>> In the meantime let's try to collect some additional feedback.
> >>>>
> >>>> Cheers
> >>>>
> >>>> On 29 December 2015 at 12:43, Anshum Gupta <an...@anshumgupta.net>
> >>>> wrote:
> >>>>
> >>>>> Feel free to create a JIRA and put up a patch if you can.
> >>>>>
> >>>>> On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti
> >>>>> <abenedetti@apache.org> wrote:
> >>>>>
> >>>>> > Hi guys,
> >>>>> > While I was exploring the way we build the More Like This query, I
> >>>>> > found a part I am not convinced by:
> >>>>> >
> >>>>> > Let's see how we build the query:
> >>>>> > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
> >>>>> >
> >>>>> > 1) we extract the terms from the interesting fields, adding them to
> >>>>> > a map:
> >>>>> >
> >>>>> > Map<String, Int> termFreqMap = new HashMap<>();
> >>>>> >
> >>>>> > *(we lose the field -> term relation; we no longer know which
> >>>>> > field the term came from!)*
> >>>>> >
> >>>>> > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
> >>>>> >
> >>>>> > 2) we build the queue that will contain the query terms; at this
> >>>>> > point we connect these terms to a field again, but:
> >>>>> >
> >>>>> > ...
> >>>>> >> // go through all the fields and find the largest document frequency
> >>>>> >> String topField = fieldNames[0];
> >>>>> >> int docFreq = 0;
> >>>>> >> for (String fieldName : fieldNames) {
> >>>>> >>   int freq = ir.docFreq(new Term(fieldName, word));
> >>>>> >>   topField = (freq > docFreq) ? fieldName : topField;
> >>>>> >>   docFreq = (freq > docFreq) ? freq : docFreq;
> >>>>> >> }
> >>>>> >> ...
> >>>>> >
> >>>>> >
> >>>>> > We identify topField as the field with the highest document
> >>>>> > frequency for the term.
> >>>>> > Then we build the termQuery:
> >>>>> >
> >>>>> > queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
> >>>>> >
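
To make the effect of the quoted loop concrete, here is a minimal,
self-contained sketch. It does not use Lucene: the document frequencies
live in a plain map, and TopFieldSketch, pickTopField and the sample
numbers are all made up for illustration. It only mirrors the selection
logic quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the selection loop in createQueue; no Lucene involved.
final class TopFieldSketch {

  // docFreqByField maps a field name to the document frequency of ONE term in
  // that field. It stands in for ir.docFreq(new Term(fieldName, word)).
  static String pickTopField(String[] fieldNames, Map<String, Integer> docFreqByField) {
    String topField = fieldNames[0];
    int docFreq = 0;
    for (String fieldName : fieldNames) {
      int freq = docFreqByField.getOrDefault(fieldName, 0);
      if (freq > docFreq) {
        topField = fieldName;
        docFreq = freq;
      }
    }
    return topField;
  }

  public static void main(String[] args) {
    String[] fields = {"description", "facilities"};

    // Assumed numbers: the term is common in description, rare in facilities.
    Map<String, Integer> docFreq = new LinkedHashMap<>();
    docFreq.put("description", 120);
    docFreq.put("facilities", 15);

    // The term is assigned to the field where it is most frequent in the index,
    // regardless of which field it came from in the source document.
    System.out.println(pickTopField(fields, docFreq));
  }
}
```

With these assumed numbers the sketch prints "description", even if the
term was only present in the facilities field of the source document.
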
> >>>>> > In this way we lose a lot of precision.
> >>>>> > I am not sure why we do that.
> >>>>> > I would prefer to keep the relation between terms and fields;
> >>>>> > that could improve the quality of the MLT query a lot.
> >>>>> > If I run MLT on 2 fields, *description* and *facilities* for
> >>>>> > example, it is likely that I want to find documents with similar
> >>>>> > terms in the description and similar terms in the facilities,
> >>>>> > without mixing things up and losing the semantics of the terms.
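
One way to picture "keeping the relation between terms and fields" is a
term frequency map per field instead of a single shared map. The sketch
below is only an illustration under that assumption; it is not taken
from the LUCENE-6954 patch, it does not use Lucene, and
PerFieldTermFreqSketch, countPerField and the sample tokens are
hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one term frequency map per field, no Lucene involved.
final class PerFieldTermFreqSketch {

  // fieldTokens maps a field name to the already-analyzed tokens of that field
  // in the source document. The result maps field -> (term -> frequency).
  static Map<String, Map<String, Integer>> countPerField(Map<String, String[]> fieldTokens) {
    Map<String, Map<String, Integer>> perField = new HashMap<>();
    for (Map.Entry<String, String[]> entry : fieldTokens.entrySet()) {
      Map<String, Integer> termFreq =
          perField.computeIfAbsent(entry.getKey(), k -> new HashMap<>());
      for (String token : entry.getValue()) {
        termFreq.merge(token, 1, Integer::sum);
      }
    }
    return perField;
  }

  public static void main(String[] args) {
    Map<String, String[]> doc = new LinkedHashMap<>();
    doc.put("description", new String[] {"quiet", "hotel", "pool"});
    doc.put("facilities", new String[] {"pool", "spa", "pool"});

    Map<String, Map<String, Integer>> perField = countPerField(doc);
    System.out.println(perField.get("description").get("pool"));
    System.out.println(perField.get("facilities").get("pool"));
  }
}
```

Here "pool" is counted once for description and twice for facilities,
rather than three times in one shared map, so each term could later be
turned into a query clause against the field it actually came from.
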
> >>>>> >
> >>>>> > Let me know your opinion,
> >>>>> >
> >>>>> > Cheers
> >>>>> >
> >>>>> >
> >>>>> > --
> >>>>> > --------------------------
> >>>>> >
> >>>>> > Benedetti Alessandro
> >>>>> > Visiting card : http://about.me/alessandro_benedetti
> >>>>> >
> >>>>> > "Tyger, tyger burning bright
> >>>>> > In the forests of the night,
> >>>>> > What immortal hand or eye
> >>>>> > Could frame thy fearful symmetry?"
> >>>>> >
> >>>>> > William Blake - Songs of Experience -1794 England
> >>>>> >
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Anshum Gupta
> >>>>>



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com