Posted to users@solr.apache.org by Batanun B <ba...@hotmail.com> on 2022/09/08 10:16:15 UTC

MoreLikeThis with externally supplied text, and facets?

Hi,

I'm evaluating whether the MoreLikeThis (mlt) feature of Solr can be useful for our editors when they are creating new content. We want to trigger this before the content has been inserted into the system, so there is no document in Solr that we can use as the basis for the mlt search. We therefore want to use the "externally supplied text" feature, where we provide the article text in the request body of the search. This works great when we use the mlt request handler (/mlt). But we would also like to get facets for this search, and bug SOLR-7883 is stopping us from doing that.

Some people suggest that we use the mlt query parser instead, with our regular request handler (/select). But I can't get that to work together with the "externally supplied text". It gives me the error "Bad contentType for search handler :text/plain".

So, does anyone know how to do a search that uses MoreLikeThis with externally supplied text, and facets at the same time?

Regards
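
For readers following along, here is a minimal sketch of the kind of request described above: the article text is sent as a text/plain body to the /mlt handler, with the MoreLikeThis and facet parameters in the URL. The host, the collection name "articles", and the field names are assumptions made up for illustration. The facet parameters only show where they would go; per the message above, they have no effect on versions affected by SOLR-7883. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_mlt_request(base_url, collection, text, fields, facet_fields):
    """Build (but do not send) a POST to /mlt with the text as the request body."""
    params = [
        ("mlt.fl", ",".join(fields)),  # fields used to find similar documents
        ("mlt.mintf", "1"),            # minimum term frequency in the supplied text
        ("mlt.mindf", "1"),            # minimum document frequency in the index
        ("facet", "true"),
    ]
    params += [("facet.field", f) for f in facet_fields]
    url = f"{base_url}/{collection}/mlt?{urlencode(params)}"
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


# Hypothetical host, collection and field names.
req = build_mlt_request(
    "http://localhost:8983/solr",
    "articles",
    "Draft article text goes here",
    ["title", "body"],
    ["category"],
)
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would send it to a running Solr instance.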


Re: MoreLikeThis with externally supplied text, and facets?

Posted by Alessandro Benedetti <a....@sease.io>.
Automatic pre-processing of documents may be a good fit for an Update
Request Processor.
A while back I contributed one to Apache Solr for document classification:
https://sease.io/2015/07/solr-document-classification-part-1-indexing-time.html

This update request processor uses the Apache Lucene document
classification module, which in turn uses More Like This internally.

I know you may want your editors' feedback in a supervised way, but if
you are heading toward automatic enrichment, take a look: it could serve
as inspiration for your use case.

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benedetti@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>


Re: MoreLikeThis with externally supplied text, and facets?

Posted by Walter Underwood <wu...@wunderwood.org>.
I made this work with 6.x but don’t remember the details, sorry. I think it wanted application/something, maybe the POST format. 

wunder

> On Sep 9, 2022, at 1:38 PM, Mikhail Khludnev <mk...@apache.org> wrote:
> 
> Hold on. JSON query DSL lets you pass quite long content via body. It
> should support {!mlt}. At least it's worth a try.!


Re: Announcing {!mlt_content} Re: MoreLikeThis with externally supplied text, and facets?

Posted by Alessandro Benedetti <a....@sease.io>.
Well done, Mikhail!
I'm a bit busy over the next month, but I'll take a look over Christmas!
It's good to see there's still interest in More Like This (which I really
love).
Cheers

On Sat, 26 Nov 2022 at 16:08, Mikhail Khludnev <mk...@apache.org> wrote:

> Hi,
> I've made {!mlt_content} accepting external content in cloud mode. See
> https://issues.apache.org/jira/browse/SOLR-16420
> It will be released at 9.2. Meanwhile, you can check it in snapshot like
>
> https://ci-builds.apache.org/job/Solr/job/Solr-Artifacts-9x/lastStableBuild/artifact/solr/packaging/build/distributions/
>
> https://ci-builds.apache.org/job/Solr/job/Solr-Artifacts-9x/322/artifact/solr/packaging/build/distributions/solr-9.2.0-jenkins322.tgz
>
> It works on techproducts example like
> q={!mlt_content qf=name mindf=0 mintf=0}SDRAM Unbuffered

Announcing {!mlt_content} Re: MoreLikeThis with externally supplied text, and facets?

Posted by Mikhail Khludnev <mk...@apache.org>.
Hi,
I've added {!mlt_content}, which accepts external content in cloud mode. See
https://issues.apache.org/jira/browse/SOLR-16420
It will be released in 9.2. Meanwhile, you can try it in a snapshot build such as
https://ci-builds.apache.org/job/Solr/job/Solr-Artifacts-9x/lastStableBuild/artifact/solr/packaging/build/distributions/
https://ci-builds.apache.org/job/Solr/job/Solr-Artifacts-9x/322/artifact/solr/packaging/build/distributions/solr-9.2.0-jenkins322.tgz

It works on the techproducts example like this:
q={!mlt_content qf=name mindf=0 mintf=0}SDRAM Unbuffered
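
As a rough sketch of how the query above might be combined with facets, the snippet below builds a form-encoded POST to /select, so that long article text travels in the request body rather than in the URL. The host, the helper name and the facet field "cat" are assumptions for illustration; only the q value is taken from the message. The request is built but not sent.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_mlt_content_request(base_url, collection, text, qf, facet_fields):
    """Build (but do not send) a form-encoded POST to /select using {!mlt_content}."""
    # Local params as in the message above; the supplied text follows them.
    local_params = f"{{!mlt_content qf={qf} mindf=0 mintf=0}}"
    params = [
        ("q", local_params + text),
        ("facet", "true"),
    ]
    params += [("facet.field", f) for f in facet_fields]
    return Request(
        f"{base_url}/{collection}/select",
        data=urlencode(params).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical host and facet field; the collection is the techproducts example.
req = build_mlt_content_request(
    "http://localhost:8983/solr", "techproducts", "SDRAM Unbuffered", "name", ["cat"]
)
decoded = parse_qs(req.data.decode("ascii"))
print(decoded["q"][0])  # {!mlt_content qf=name mindf=0 mintf=0}SDRAM Unbuffered
```

Sending the parameters form-encoded also sidesteps the "Bad contentType" error that a text/plain body triggers on /select.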

On Wed, Sep 28, 2022 at 5:32 PM Mikhail Khludnev <mk...@apache.org> wrote:

> I think https://github.com/apache/solr/pull/1045 is ready. Reviews are
> welcome.


-- 
Sincerely yours
Mikhail Khludnev

Re: MoreLikeThis with externally supplied text, and facets?

Posted by Mikhail Khludnev <mk...@apache.org>.
I think https://github.com/apache/solr/pull/1045 is ready. Reviews are
welcome.

On Tue, Sep 20, 2022 at 12:15 PM Mikhail Khludnev <mk...@apache.org> wrote:

> For reference, the trick above doesn't work now, I'll work on it under
> https://issues.apache.org/jira/browse/SOLR-16420.
> Note, that fix for facet support in /mlt handler will be released under
> 9.1.


-- 
Sincerely yours
Mikhail Khludnev

Re: MoreLikeThis with externally supplied text, and facets?

Posted by Mikhail Khludnev <mk...@apache.org>.
For reference, the trick above (passing the content to {!mlt} via the JSON
query DSL) doesn't work at the moment. I'll work on it under
https://issues.apache.org/jira/browse/SOLR-16420.
Note that the fix for facet support in the /mlt handler will be released in 9.1.

On Fri, Sep 9, 2022 at 2:37 PM Mikhail Khludnev <mk...@apache.org> wrote:

> Hold on. JSON query DSL lets you pass quite long content via body. It
> should support {!mlt}. At least it's worth a try.!


-- 
Sincerely yours
Mikhail Khludnev

Re: MoreLikeThis with externally supplied text, and facets?

Posted by Mikhail Khludnev <mk...@apache.org>.
Hold on. The JSON query DSL lets you pass quite long content in the request
body. It should support {!mlt}. At least it's worth a try.

On Thu, Sep 8, 2022 at 2:53 PM Mikhail Khludnev <mk...@apache.org> wrote:

> Hello Batanun
> I checked {!mlt} source code. It can't swallow external content. I've
> found that Lucene XML parser
> https://lucene.apache.org/core/9_1_0/queryparser/org/apache/lucene/queryparser/xml/CorePlusQueriesParser.html
> is capable to handle <LikeThisQuery>. However, it's diverged and not
> available in Solr out-of-the-box
> https://solr.apache.org/guide/8_8/other-parsers.html#xml-query-parser


-- 
Sincerely yours
Mikhail Khludnev

Re: MoreLikeThis with externally supplied text, and facets?

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello Batanun,
I checked the {!mlt} source code. It can't accept external content. I found
that the Lucene XML parser
https://lucene.apache.org/core/9_1_0/queryparser/org/apache/lucene/queryparser/xml/CorePlusQueriesParser.html
is able to handle <LikeThisQuery>. However, the Solr one has diverged and
<LikeThisQuery> is not available in Solr out of the box:
https://solr.apache.org/guide/8_8/other-parsers.html#xml-query-parser



-- 
Sincerely yours
Mikhail Khludnev