Posted to users@solr.apache.org by Tealdi Paolo <pa...@polito.it> on 2022/03/31 10:51:32 UTC
DIH and import from other core
Hi all,
I'm looking for an alternative to the DIH functionality for ingesting records from one core into another. It's a very useful, simple and quick way to test new configurations.
The new external plugin seems to support only database connections.
Any hints?
Best regards,
Paolo Tealdi
Ing. Paolo Tealdi Area IT - Politecnico Torino
Telefono/Phone : +39-011-0906714 , FAX : +39-011-0906625
Indirizzo/Address : C.so Duca degli Abruzzi, 24 - 10129 Torino - ITALY Skype : tealdi.paolo
Re: DIH and import from other core
Posted by Dominique Bejean <do...@eolya.fr>.
Hi,
I suggest taking a look at Apache NiFi, a great multi-threaded ETL tool
that includes plugins for reading from and writing to Solr.
Dominique
Re: DIH and import from other core
Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-03-31 11:11 AM, matthew sporleder wrote:
>
> commitWithin will help you a ton, even a short one
Yeah. It's a small table and a "playground" Solr index so I haven't
bothered tuning it... yet.
Dima
Re: DIH and import from other core
Posted by matthew sporleder <ms...@gmail.com>.
> On Mar 31, 2022, at 12:05 PM, dmitri maziuk <dm...@gmail.com> wrote:
>
> On 2022-03-31 9:29 AM, Tealdi Paolo wrote:
>> Hi Eric
>> Many thanks for the answer.
>> I noticed that reindexcollection seems to be SLOWER than DIH import.
>
> (Warning: there be python there)
>
> This is trimmed down from a working script: https://gist.github.com/dmaziuk/57b9c1926578bc10f12c0999c4b7ab53
>
> It is slower than DIH. It commits every document, that's likely part of it. I think in your case, if both cores reside on the same server, you will have contention and extra slow-down from that -- compared to pulling from one server and pushing to another. So I wouldn't expect it to be blazing fast.
>
> The part where it pulls IDs from Solr is trivially modified to pull whole records from your source index, so if you can write python, you can adjust it for your use and see how it goes.
>
> Dima
commitWithin will help you a ton, even a short one
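For anyone wondering what that looks like in practice, here is a minimal sketch, not taken from the gist above. It only builds the update request; the host, the core name `target_core` and the field `title_s` are made-up placeholders, and the 10-second window is an arbitrary choice. The idea is to let Solr decide when to commit, within the given number of milliseconds, instead of committing after every document.

```python
import json
import urllib.parse
import urllib.request


def build_update_request(base_url, core, docs, commit_within_ms=10000):
    """Build (but do not send) a JSON update request that uses commitWithin."""
    # commitWithin asks Solr to make the documents visible within N milliseconds,
    # so the client does not have to issue an explicit commit per document.
    query = urllib.parse.urlencode({"commitWithin": commit_within_ms})
    url = f"{base_url}/{core}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values, for illustration only.
request = build_update_request(
    "http://localhost:8983/solr",
    "target_core",
    [{"id": "1", "title_s": "hello"}],
)
print(request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Even a short window lets Solr fold many small updates into one commit.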
Re: R: DIH and import from other core
Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-03-31 11:17 AM, Thomas Corthals wrote:
> You can speed that up significantly by sending multiple documents in the
> same request and only committing once:
> https://web.archive.org/web/20170418205443/http://www.raspberry.nl/2011/04/08/solr-update-performance/
Yes: batching multiple documents into a single POST + commit will speed
it up *a lot*. You'd need to check max_post_size for your server to be
on the safe side, but I'd expect a couple of gig to be OK these days.
Dima
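A rough sketch of the batching idea, under stated assumptions: the batch size of 1000 is arbitrary, `index_in_batches` and `batched` are hypothetical helper names, and the `send` parameter exists only so the logic can be tried without a running Solr. This is not the code from the gist or from the linked article.

```python
import json
import urllib.request
from itertools import islice


def batched(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def _http_post(url, body):
    """Default sender: POST a JSON body to Solr and return the HTTP status."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_in_batches(docs, update_url, batch_size=1000, send=None):
    """POST docs in batches, then commit once. Returns the number of requests sent."""
    send = send or _http_post
    requests_sent = 0
    for chunk in batched(docs, batch_size):
        # One request carries many documents instead of one.
        send(update_url, json.dumps(chunk).encode("utf-8"))
        requests_sent += 1
    # A single explicit commit after all batches have been sent.
    send(update_url, json.dumps({"commit": {}}).encode("utf-8"))
    return requests_sent + 1
```

With 5 documents and a batch size of 2 this sends three document requests plus one commit, instead of five updates and five commits.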
Re: R: DIH and import from other core
Posted by Thomas Corthals <th...@klascement.net>.
On Thu, Mar 31, 2022 at 18:05, dmitri maziuk <dm...@gmail.com> wrote:
> On 2022-03-31 9:29 AM, Tealdi Paolo wrote:
> > Hi Eric
> >
> > Many thanks for the answer.
> > I noticed that reindexcollection seems to be SLOWER than DIH import.
>
> (Warning: there be python there)
>
> This is trimmed down from a working script:
> https://gist.github.com/dmaziuk/57b9c1926578bc10f12c0999c4b7ab53
>
> It is slower than DIH. It commits every document, that's likely part of
> it. I think in your case, if both cores reside on the same server, you
> will have contention and extra slow-down from that -- compared to
> pulling from one server and pushing to another. So I wouldn't expect it
> to be blazing fast.
>
> The part where it pulls IDs from Solr is trivially modified to pull
> whole records from your source index, so if you can write python, you
> can adjust it for your use and see how it goes.
>
> Dima
>
You can speed that up significantly by sending multiple documents in the
same request and only committing once:
https://web.archive.org/web/20170418205443/http://www.raspberry.nl/2011/04/08/solr-update-performance/
If you prefer elephants over pythons, have a look at Solarium's BufferedAdd
plugin that does just that:
https://solarium.readthedocs.io/en/latest/plugins/#bufferedadd-plugin
Thomas
Re: R: DIH and import from other core
Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-03-31 9:29 AM, Tealdi Paolo wrote:
> Hi Eric
>
> Many thanks for the answer.
> I noticed that reindexcollection seems to be SLOWER than DIH import.
(Warning: there be python there)
This is trimmed down from a working script:
https://gist.github.com/dmaziuk/57b9c1926578bc10f12c0999c4b7ab53
It is slower than DIH. It commits every document, that's likely part of
it. I think in your case, if both cores reside on the same server, you
will have contention and extra slow-down from that -- compared to
pulling from one server and pushing to another. So I wouldn't expect it
to be blazing fast.
The part where it pulls IDs from Solr is trivially modified to pull
whole records from your source index, so if you can write python, you
can adjust it for your use and see how it goes.
Dima
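As an illustration of "pull whole records from your source index", here is a hedged sketch that pages through a source core with Solr's cursorMark. It is not the gist's code. It assumes the uniqueKey field is called `id`, and `fetch_page` and `iter_source_docs` are hypothetical names. Note that only stored (or docValues) fields come back from a query, so fields that are indexed but not stored cannot be copied this way.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(select_url, cursor, rows, unique_key="id"):
    """Fetch one page of documents from the source core's /select handler."""
    params = urllib.parse.urlencode({
        "q": "*:*",
        "fl": "*",
        "sort": f"{unique_key} asc",  # cursorMark requires a sort on the uniqueKey
        "rows": rows,
        "cursorMark": cursor,
        "wt": "json",
    })
    with urllib.request.urlopen(f"{select_url}?{params}") as resp:
        return json.load(resp)


def iter_source_docs(select_url, rows=500, fetch=fetch_page):
    """Yield every document from the source core, one page at a time."""
    cursor = "*"  # "*" starts a new cursor
    while True:
        page = fetch(select_url, cursor, rows)
        for doc in page["response"]["docs"]:
            # Drop _version_ so the target core does not treat the add
            # as an optimistic-concurrency update.
            doc.pop("_version_", None)
            yield doc
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            return
        cursor = next_cursor
```

The generator can be fed straight into whatever posts documents to the target core, so the whole source index never has to sit in memory.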
R: DIH and import from other core
Posted by Tealdi Paolo <pa...@polito.it>.
Hi Eric
Many thanks for the answer.
I noticed that reindexcollection seems to be SLOWER than DIH import.
Best regards,
Paolo Tealdi
Re: DIH and import from other core
Posted by Eric Pugh <ep...@opensourceconnections.com>.
I’ve used the REINDEXCOLLECTION command (https://solr.apache.org/guide/8_1/collections-api.html#reindexcollection) for testing new configurations…
It uses Solr Streaming under the covers: https://solr.apache.org/guide/8_1/streaming-expressions.html
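A small sketch of how such a call could be put together, assuming a SolrCloud setup (the Collections API is not available on standalone cores). The collection names `books` and `books_v2`, the configset name `books_conf_v2` and the helper `reindex_collection_url` are invented for illustration; check the linked page for the full parameter list and the requirements on the source schema.

```python
import urllib.parse


def reindex_collection_url(base_url, source, target=None, config_name=None):
    """Build a Collections API URL for REINDEXCOLLECTION (does not call Solr)."""
    params = {"action": "REINDEXCOLLECTION", "name": source}
    if target:
        params["target"] = target  # collection that receives the reindexed docs
    if config_name:
        params["configName"] = config_name  # configset holding the new configuration
    return f"{base_url}/admin/collections?{urllib.parse.urlencode(params)}"


# Placeholder names, for illustration only.
print(reindex_collection_url(
    "http://localhost:8983/solr",
    "books",
    target="books_v2",
    config_name="books_conf_v2",
))
```

Opening the printed URL (or fetching it with curl) would start the reindex on a running cluster.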
_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>