Posted to users@solr.apache.org by Tealdi Paolo <pa...@polito.it> on 2022/03/31 10:51:32 UTC

DIH and import from other core

Hi all,

I'm looking for an alternative to the DIH functionality for ingesting records from one core into another. It's a very useful, simple and quick way to check new configurations.
The new external plugin seems to support only database connections.
Any hints?

Best regards,
Paolo Tealdi

Ing. Paolo Tealdi - Area IT - Politecnico Torino
Telefono/Phone : +39-011-0906714 , FAX : +39-011-0906625
Indirizzo/Address : C.so Duca degli Abruzzi, 24 - 10129 Torino - ITALY - Skype : tealdi.paolo
Please consider your environmental responsibility before printing this e-mail


Re: DIH and import from other core

Posted by Dominique Bejean <do...@eolya.fr>.
Hi,

I suggest taking a look at Apache NiFi, a great multi-threaded ETL tool
that includes plugins to read from or write to Solr.

Dominique

On Thu, 31 Mar 2022 at 13:12, Tealdi Paolo <pa...@polito.it> wrote:

> Hi all,
>
> I'm searching for alternative to DIH functionality for record ingestion
> from one core to another. It's very useful, simple and quick function to
> check for new configurations.
> The new external plugin seems to support only database connection.
> Any hints ?
>
> Best regards,
> Paolo Tealdi
>

Re: DIH and import from other core

Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-03-31 11:11 AM, matthew sporleder wrote:
> commitWithin will help you a ton, even a short one

Yeah. It's a small table and a "playground" Solr index so I haven't 
bothered tuning it... yet.

Dima

Re: DIH and import from other core

Posted by matthew sporleder <ms...@gmail.com>.

> On Mar 31, 2022, at 12:05 PM, dmitri maziuk <dm...@gmail.com> wrote:
> 
> On 2022-03-31 9:29 AM, Tealdi Paolo wrote:
>> Hi Eric
>> Many thanks for the answer.
>> I noticed that reindexcollection seems to be SLOWER than DIH import.
> 
> (Warning: there be python there)
> 
> This is trimmed down from a working script: https://gist.github.com/dmaziuk/57b9c1926578bc10f12c0999c4b7ab53
> 
> It is slower than DIH. It commits every document, that's likely part of it. I think in your case, if both cores reside on the same server, you will have contention and extra slow-down from that -- compared to pulling from one server and pushing to another. So I wouldn't expect it to be blazing fast.
> 
> The part where it pulls IDs from Solr is trivially modified to pull whole records from your source index, so if you can write python, you can adjust it for your use and see how it goes.
> 
> Dima

commitWithin will help you a ton, even a short one
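
As a rough illustration of the commitWithin suggestion, here is a minimal Python sketch (not taken from the gist). It assumes the target core accepts JSON at /solr/<core>/update; the base URL, core name and 10-second window are placeholders, and build_update_request and send are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def build_update_request(base_url, core, docs, commit_within_ms=10000):
    """Build (without sending) a JSON update request that uses commitWithin.

    Solr is asked to make the documents visible within commit_within_ms
    milliseconds, instead of the client committing after every document.
    """
    query = urllib.parse.urlencode({"commitWithin": commit_within_ms})
    url = f"{base_url}/{core}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return Solr's decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder URL and core name; adjust for your installation.
    request = build_update_request(
        "http://localhost:8983/solr", "target_core", [{"id": "1"}]
    )
    print(request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before anything touches the index.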

Re: R: DIH and import from other core

Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-03-31 11:17 AM, Thomas Corthals wrote:

> You can speed that up significantly by sending multiple documents in the
> same request and only committing once:
> https://web.archive.org/web/20170418205443/http://www.raspberry.nl/2011/04/08/solr-update-performance/

Yes: batching multiple documents into a single POST + commit will speed 
it up *a lot*. You'd need to check max_post_size for your server to be 
on the safe side, but I'd expect a couple of gig to be OK these days.
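
A sketch of that batching idea in Python, under a few assumptions: documents are plain dicts, the update URL is a placeholder, and batches and plan_requests are hypothetical helper names. The batch size of 1000 is arbitrary; pick one that keeps each request body under whatever upload limit your server enforces.

```python
import json
from itertools import islice


def batches(docs, batch_size=1000):
    """Yield lists of at most batch_size documents from any iterable."""
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


def plan_requests(update_url, docs, batch_size=1000):
    """Yield (url, body) pairs: one POST per batch, then a single commit.

    Each body is meant to be sent with Content-Type: application/json.
    """
    for chunk in batches(docs, batch_size):
        yield update_url, json.dumps(chunk).encode("utf-8")
    # One explicit commit at the very end instead of one per document.
    yield update_url, json.dumps({"commit": {}}).encode("utf-8")


if __name__ == "__main__":
    docs = ({"id": str(i)} for i in range(5))
    # Placeholder URL; nothing is sent here, the plan is only printed.
    for url, body in plan_requests("http://localhost:8983/solr/target/update", docs, 2):
        print(url, len(body), "bytes")
```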

Dima

Re: R: DIH and import from other core

Posted by Thomas Corthals <th...@klascement.net>.
On Thu, 31 Mar 2022 at 18:05, dmitri maziuk <dm...@gmail.com> wrote:

> On 2022-03-31 9:29 AM, Tealdi Paolo wrote:
> > Hi Eric
> >
> > Many thanks for the answer.
> > I noticed that reindexcollection seems to be SLOWER than DIH import.
>
> (Warning: there be python there)
>
> This is trimmed down from a working script:
> https://gist.github.com/dmaziuk/57b9c1926578bc10f12c0999c4b7ab53
>
> It is slower than DIH. It commits every document, that's likely part of
> it. I think in your case, if both cores reside on the same server, you
> will have contention and extra slow-down from that -- compared to
> pulling from one server and pushing to another. So I wouldn't expect it
> to be blazing fast.
>
> The part where it pulls IDs from Solr is trivially modified to pull
> whole records from your source index, so if you can write python, you
> can adjust it for your use and see how it goes.
>
> Dima
>

You can speed that up significantly by sending multiple documents in the
same request and only committing once:
https://web.archive.org/web/20170418205443/http://www.raspberry.nl/2011/04/08/solr-update-performance/

If you prefer elephants over pythons, have a look at Solarium's BufferedAdd
plugin that does just that:
https://solarium.readthedocs.io/en/latest/plugins/#bufferedadd-plugin

Thomas

Re: R: DIH and import from other core

Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-03-31 9:29 AM, Tealdi Paolo wrote:
> Hi Eric
> 
> Many thanks for the answer.
> I noticed that reindexcollection seems to be SLOWER than DIH import.

(Warning: there be python there)

This is trimmed down from a working script: 
https://gist.github.com/dmaziuk/57b9c1926578bc10f12c0999c4b7ab53

It is slower than DIH. It commits every document, that's likely part of 
it. I think in your case, if both cores reside on the same server, you 
will have contention and extra slow-down from that -- compared to 
pulling from one server and pushing to another. So I wouldn't expect it 
to be blazing fast.

The part where it pulls IDs from Solr is trivially modified to pull 
whole records from your source index, so if you can write python, you 
can adjust it for your use and see how it goes.
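
For readers who do not want to open the gist, here is an independent minimal sketch of the "pull whole records" idea; it is not the gist's code. It assumes the source core's uniqueKey is "id", pages through /select with Solr's cursorMark, and posts each page to the target's /update as one JSON batch with a single commit at the end. The URLs and the helper names (fetch_json, read_all, copy_core) are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url, data=None):
    """GET url, or POST data as JSON when data is given; return parsed JSON."""
    headers = {"Content-Type": "application/json"} if data is not None else {}
    request = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def read_all(select_url, rows=500, unique_key="id", fetch=fetch_json):
    """Yield pages of documents from the source core using cursorMark."""
    cursor = "*"
    while True:
        params = urllib.parse.urlencode({
            "q": "*:*", "fl": "*", "rows": rows, "wt": "json",
            # cursorMark requires a sort that includes the uniqueKey field.
            "sort": f"{unique_key} asc", "cursorMark": cursor,
        })
        result = fetch(f"{select_url}?{params}")
        docs = result["response"]["docs"]
        if docs:
            yield docs
        next_cursor = result["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            return
        cursor = next_cursor


def copy_core(select_url, update_url, rows=500, fetch=fetch_json):
    """Copy every document page by page, commit once, return the doc count."""
    copied = 0
    for page in read_all(select_url, rows, fetch=fetch):
        # Drop _version_ so the target assigns its own version numbers.
        clean = [{k: v for k, v in d.items() if k != "_version_"} for d in page]
        fetch(update_url, data=json.dumps(clean).encode("utf-8"))
        copied += len(page)
    fetch(update_url, data=json.dumps({"commit": {}}).encode("utf-8"))
    return copied
```

Only fields that the source core can return come across this way, so fields that are indexed but not stored will be missing in the target, and stored copyField destinations may need to be excluded via fl to avoid duplicated values.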

Dima

R: DIH and import from other core

Posted by Tealdi Paolo <pa...@polito.it>.
Hi Eric

Many thanks for the answer.
I noticed that reindexcollection seems to be SLOWER than DIH import.

Best regards,
Paolo Tealdi



| -----Original Message-----
| From: Eric Pugh <ep...@opensourceconnections.com>
| Sent: Thursday, 31 March 2022 13:00
| To: users@solr.apache.org
| Subject: Re: DIH and import from other core
| 
| I’ve used the REINDEXCOLLECTION command
| (https://solr.apache.org/guide/8_1/collections-api.html#reindexcollection)
| for testing new configurations…
| 
| It uses Solr Streaming under the covers:
| https://solr.apache.org/guide/8_1/streaming-expressions.html
| 
| 
| 
| > On Mar 31, 2022, at 6:51 AM, Tealdi Paolo <pa...@polito.it> wrote:
| >
| > Hi all,
| >
| > I'm searching for alternative to DIH functionality for record ingestion from
| > one core to another. It's very useful, simple and quick function to check for
| > new configurations.
| > The new external plugin seems to support only database connection.
| > Any hints ?
| >
| > Best regards,
| > Paolo Tealdi
| >


Re: DIH and import from other core

Posted by Eric Pugh <ep...@opensourceconnections.com>.
I’ve used the REINDEXCOLLECTION command (https://solr.apache.org/guide/8_1/collections-api.html#reindexcollection) for testing new configurations…

It uses Solr Streaming under the covers: https://solr.apache.org/guide/8_1/streaming-expressions.html
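
To make the command concrete, here is a small Python sketch that only builds the Collections API URL. It assumes SolrCloud mode (the Collections API works on collections, not standalone cores); the host, collection names and configset name are placeholders, and reindex_url is a hypothetical helper, not part of any Solr client.

```python
import urllib.parse


def reindex_url(base_url, source, target, config_name=None, query="*:*"):
    """Build a REINDEXCOLLECTION URL that copies source into target.

    config_name, when given, names the configset to use for the target,
    which is how a new configuration can be tried out on existing data.
    """
    params = {
        "action": "REINDEXCOLLECTION",
        "name": source,
        "target": target,
        "q": query,
    }
    if config_name:
        params["configName"] = config_name
    return f"{base_url}/admin/collections?{urllib.parse.urlencode(params)}"


if __name__ == "__main__":
    # Placeholder names; open the printed URL or pass it to an HTTP client.
    print(reindex_url("http://localhost:8983/solr", "books", "books_v2",
                      config_name="books_conf_v2"))
```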



> On Mar 31, 2022, at 6:51 AM, Tealdi Paolo <pa...@polito.it> wrote:
> 
> Hi all,
> 
> I'm searching for alternative to DIH functionality for record ingestion from one core to another. It's very useful, simple and quick function to check for new configurations.
> The new external plugin seems to support only database connection.
> Any hints ?
> 
> Best regards,
> Paolo Tealdi
> 

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.