Posted to dev@manifoldcf.apache.org by Julien Massiera <ju...@francelabs.com> on 2022/12/01 14:34:26 UTC

Re: Solr 9.x output connector

Hi Karl,

I put together a quick alpha version of a Solr 9 connector to test, and 
I can confirm that it works with older Solr versions!

HOWEVER, in SolrJ 9 the Solr client has been reimplemented, and it no 
longer makes it easy to customize the httpClient or the way it performs 
requests. This makes it very challenging - at least for me - to port all 
of the custom code for the multipart POST requests, as well as the basic 
and preemptive auth of the current Solr connector! Who knows, maybe with 
this new SolrJ client that custom code has become unnecessary and 
multipart/basic/preemptive auth now works OOTB... Unfortunately, I don't 
have time to test whether those features work OOTB, and I don't have a 
test environment to try them in anyway. Maybe the MCF committers who 
know this Solr-related code could take a look if I commit a final 
version of the connector on a dedicated branch?
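
For reference, "preemptive" basic auth only means that the Authorization 
header is sent on the first request instead of waiting for a 401 
challenge. Below is a minimal sketch of that idea using only the JDK's 
java.net.http classes. It is not the connector's code and does not use 
SolrJ; the class name, the method names, the URL and the credentials are 
all made up for illustration, and whether SolrJ 9 does the equivalent 
OOTB still needs to be checked.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration only: not ManifoldCF or SolrJ code.
public class PreemptiveBasicAuthSketch {

    // Builds the value of the Authorization header for HTTP Basic auth:
    // "Basic " followed by base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Builds (but does not send) a POST request that already carries the
    // credentials, so no 401 round trip is needed before the body is sent.
    static HttpRequest buildUpdateRequest(String solrUpdateUrl, String user,
                                          String password, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(solrUpdateUrl))
                .header("Authorization", basicAuthHeader(user, password))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // URL, core name and credentials are placeholders (assumptions).
        HttpRequest request = buildUpdateRequest(
                "http://localhost:8983/solr/mycore/update",
                "someUser", "somePassword", "[{\"id\":\"doc1\"}]");
        // Show that the header is present before anything is sent.
        System.out.println(
                request.headers().firstValue("Authorization").orElse("<none>"));
    }
}
```

Sending the credentials up front matters most for large multipart 
posts, since otherwise the whole body may have to be sent a second time 
after the server's 401 challenge.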

Julien

On 29/11/2022 22:35, Karl Wright wrote:
> Hi Julien,
>
> Sorry for the delay; I've been under intense pressure at work of late and
> just saw this email now.
>
> Regarding library updates: we should generally go ahead and do those
> FIRST.  There are custom fixes for httpclient checked into the ManifoldCF
> code base so we may need to work a little to get those to build properly.
> But I'm reasonably sure it can be done.  Libraries are backwards compatible
> at the minor version level so all is good there.  When somebody wants to go
> to HttpClient 5, though, we are in trouble.
>
> AFTER that is done we should evaluate whether the 9.x Solr library is
> backwards compatible enough with 8.x to work.  We had to do very little to
> go from 7.x to 8.x, so unless the Solr people suddenly changed their
> philosophy dramatically, it should be possible to do this too.  But we will
> see.
>
> Karl
>
>
> On Tue, Nov 29, 2022 at 9:59 AM Julien Massiera <
> julien.massiera@francelabs.com> wrote:
>
>> Hi Karl,
>>
>> The Solr output connector does not seem to work with Solr 9.x according
>> to our tests. We are going to either update it or develop a new
>> connector, but there is a problem with the required libraries. A Solr
>> 9.x connector will of course need a SolrJ 9.x lib, but it will also
>> require updating the following libs in MCF:
>>
>> - zookeeper from 3.4.10 to >= 3.7.0 (current 3.8.0)
>> - httpcomponent.httpclient.version from 4.5.3 to 4.5.13
>> - httpcomponent.httpcore.version from 4.4.6 to 4.4.15
>> - httpcomponent.httpmime.version from 4.5.3 to 4.5.13
>>
>> Those updates should not cause problems for other connectors in MCF. The
>> real problem concerns the current Solr connector, as I am not sure
>> that an updated version would be compatible with Solr < 9.x.
>> There are also the modified Solr clients using the custom multipart
>> HTTP POST methods, which in my opinion will be troublesome to port to
>> SolrJ 9.x.
>>
>> If I am not wrong, historically those custom clients were developed to
>> avoid errors with Solr's embedded Tika for some documents. But
>> IMHO, it has become a challenge that is not worth the effort: the way to
>> go should be to have the documents processed by Tika BEFORE they are
>> indexed in Solr. Not to mention that the Tika embedded in Solr is too old
>> (1.28.1) and will most certainly be removed someday (as stated in this
>> ticket: https://issues.apache.org/jira/browse/SOLR-13973). Thus, I think
>> it is not worth porting the custom Solr clients to the new connector.
>> Dropping them would ease the creation of the Solr 9 output connector.
>>
>> Whatever happens, if we want to maintain output connectors for different
>> versions of Solr, and IF the Solr 9 output connector is not compatible
>> with previous versions of Solr (still needs to be checked), we'll end up
>> with several versions of the libs in ManifoldCF. To be honest, I do not
>> see a proper way to deal with the library conflicts between the two
>> connectors...
>>
>> What do you think?
>>
>> Regards,
>> Julien
>>
-- 
Julien MASSIERA
Head of Product Development
France Labs – The Search experts
Datafari – Winner of the Big Data 2018 trophy at the Digital Innovation Makers Summit
www.francelabs.com


Re: Solr 9.x output connector

Posted by Karl Wright <da...@gmail.com>.
Feel free to commit your proposed changes to a branch for evaluation.
Karl
