Posted to users@solr.apache.org by Anuj Bhargava <an...@gmail.com> on 2021/03/06 11:38:53 UTC

Importing Data - Solr 8.0

I am having a problem importing data via the DataImportHandler. I have a database
(data_archive) containing records for the last 9 years. I am unable to
import all of the data via the DataImportHandler: Solr stops. However, I can import
the data for the last 12 months, which takes around 20 minutes. How can I import
all the data from the data_archive database once, and then add only the new records
that get added to the data_archive database every day?

The configuration I am using is:

<entity name="data_archive" dataSource="ds8" pk="posting_id"
      query="SELECT * FROM data_archive
        WHERE doc_date BETWEEN (CURDATE() - INTERVAL 12 MONTH) AND CURDATE()"
      deltaImportQuery="SELECT * FROM data_archive
        WHERE ID = '${dataimporter.delta.posting_id}'
        AND doc_date BETWEEN (CURDATE() - INTERVAL 12 MONTH) AND CURDATE()"
      deltaQuery="SELECT posting_id FROM data_archive
        WHERE last_modified > '${dataimporter.last_index_time}'
        AND doc_date BETWEEN (CURDATE() - INTERVAL 12 MONTH) AND CURDATE()">
    </entity>

Could someone please help me modify the configuration above?

Regards,

Anuj

Re: Importing Data - Solr 8.0

Posted by Jörn Franke <jo...@gmail.com>.
You should not use the DataImportHandler. It has been deprecated since Solr 8.6 and is scheduled for removal in Solr 9. Instead, use an external script that reads from the database and pushes the data to Solr in batches via Solr's update API. Alternatively, you can use Apache ManifoldCF, Logstash, or any other open-source or commercial solution.

You will find any error messages in the Solr log.
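
A minimal sketch of such an external script in Python, assuming a DB-API 2.0 connection (sqlite3 here only so the example runs on its own) and a hypothetical core URL http://localhost:8983/solr/data_archive/update. It reads rows with fetchmany() so only one batch is held in memory at a time, posts each batch as a JSON array of documents to the update handler, and then sends a single {"commit": {}} command. Column names are used as Solr field names as-is, which assumes the schema defines matching fields.

```python
import json
import sqlite3
import urllib.request
from typing import Any, Callable, Iterator, List

# Hypothetical Solr core URL; adjust host, port and core name to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/data_archive/update"


def iter_batches(conn, query: str, batch_size: int = 1000) -> Iterator[List[dict]]:
    """Run `query` and yield row dicts, at most `batch_size` at a time."""
    cur = conn.cursor()
    cur.execute(query)
    columns = [d[0] for d in cur.description]
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield [dict(zip(columns, row)) for row in rows]


def post_json(payload: Any) -> None:
    """POST a JSON payload (a list of docs or a command) to the update handler."""
    body = json.dumps(payload, default=str).encode("utf-8")  # str() for dates etc.
    req = urllib.request.Request(
        SOLR_UPDATE_URL, data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        resp.read()


def push_all(conn, query: str, send: Callable[[Any], None] = post_json,
             batch_size: int = 1000) -> int:
    """Send every row returned by `query` in batches, then commit once."""
    total = 0
    for batch in iter_batches(conn, query, batch_size):
        send(batch)           # a JSON array of documents
        total += len(batch)
    send({"commit": {}})      # JSON update command: commit everything sent so far
    return total


if __name__ == "__main__":
    # Stand-in database so the sketch runs without MySQL or Solr;
    # send=print shows the payloads instead of POSTing them.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE data_archive (posting_id INTEGER PRIMARY KEY, title TEXT)")
    db.executemany("INSERT INTO data_archive VALUES (?, ?)",
                   [(i, f"title {i}") for i in range(1, 6)])
    push_all(db, "SELECT * FROM data_archive", send=print, batch_size=2)
```

With some MySQL drivers the whole result set is buffered on the client unless you request an unbuffered (server-side) cursor, in which case fetchmany() alone does not bound memory.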

Re: Importing Data - Solr 8.0

Posted by dmitri maziuk <dm...@gmail.com>.
On 2021-03-06 5:38 AM, Anuj Bhargava wrote:

> How can I add
> all the data from the data_archive database and then just add new records
> that get added everyday in the data_archive database.

In a nutshell, use

>        query="SELECT * FROM data_archive"

to import everything and

>        deltaImportQuery="SELECT * FROM data_archive
>          WHERE posting_id = '${dataimporter.delta.posting_id}'"

to import all "deltas".

To find the deltas, the example in the docs is

>        deltaQuery="SELECT posting_id FROM data_archive
>          WHERE last_modified > '${dataimporter.last_index_time}'"

Note the key column, posting_id, in the last two queries. And of course you
must have a last_modified column, populated properly, for the
deltaQuery to return anything useful.
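
For comparison, here is a rough Python equivalent of the two-step delta logic above, as an external script might implement it (a sketch, not how DIH works internally). It assumes data_archive has posting_id and last_modified columns, with last_modified stored as 'YYYY-MM-DD HH:MM:SS' text so string comparison orders it correctly; sqlite3 is only a stand-in for the real database. run_delta is a hypothetical helper: DIH keeps last_index_time in dataimport.properties, whereas an external script would have to store the returned timestamp itself.

```python
import sqlite3
from datetime import datetime
from typing import List, Tuple


def find_delta_ids(conn: sqlite3.Connection, last_index_time: str) -> List[int]:
    """Like deltaQuery: keys of rows modified since the last run."""
    cur = conn.execute(
        "SELECT posting_id FROM data_archive "
        "WHERE last_modified > ? ORDER BY posting_id",
        (last_index_time,),  # bound parameter instead of string substitution
    )
    return [row[0] for row in cur]


def fetch_delta_docs(conn: sqlite3.Connection, ids: List[int]) -> List[dict]:
    """Like deltaImportQuery: fetch the full row for each changed key."""
    docs = []
    for posting_id in ids:
        cur = conn.execute(
            "SELECT * FROM data_archive WHERE posting_id = ?", (posting_id,))
        row = cur.fetchone()
        if row is not None:  # the row may have been deleted in the meantime
            columns = [d[0] for d in cur.description]
            docs.append(dict(zip(columns, row)))
    return docs


def run_delta(conn: sqlite3.Connection,
              last_index_time: str) -> Tuple[List[dict], str]:
    """Return changed docs plus the timestamp to use as last_index_time next run."""
    # Take the timestamp *before* querying, so rows changed during this run
    # are picked up again next time rather than missed.
    started = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    ids = find_delta_ids(conn, last_index_time)
    return fetch_delta_docs(conn, ids), started
```

Re-indexing a row that changed during the run is harmless, since Solr overwrites documents with the same uniqueKey.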

Dima

Re: Importing Data - Solr 8.0

Posted by Alexander Aristov <al...@gmail.com>.
A blind guess: you are running out of memory (OOM) because you don't split
the import into batches and try to commit everything in one shot.
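
If memory is the problem, one way to split the 9-year import into batches, instead of one huge SELECT, is keyset pagination on posting_id: each query returns at most chunk_size rows after the last key seen, so neither the driver nor the indexer holds the whole table. The sketch below assumes posting_id is a unique, sortable key; sqlite3 stands in for the real database, and index and commit are hypothetical callbacks you would wire to Solr's update handler. Committing every few chunks follows the guess above; with Solr's autoCommit configured, explicit commits may not be needed at all.

```python
import sqlite3
from typing import Callable, Iterator, List, Optional, Tuple


def iter_keyset_chunks(conn: sqlite3.Connection,
                       chunk_size: int = 5000) -> Iterator[List[Tuple]]:
    """Yield rows in posting_id order, at most chunk_size rows per query."""
    last_id: Optional[int] = None
    while True:
        if last_id is None:
            cur = conn.execute(
                "SELECT * FROM data_archive ORDER BY posting_id LIMIT ?",
                (chunk_size,))
        else:
            cur = conn.execute(
                "SELECT * FROM data_archive WHERE posting_id > ? "
                "ORDER BY posting_id LIMIT ?", (last_id, chunk_size))
        rows = cur.fetchall()
        if not rows:
            return
        yield rows
        # Remember where this chunk ended; the next query starts after it.
        key = [d[0] for d in cur.description].index("posting_id")
        last_id = rows[-1][key]


def import_in_chunks(conn: sqlite3.Connection,
                     index: Callable[[List[Tuple]], None],
                     commit: Callable[[], None],
                     chunk_size: int = 5000, commit_every: int = 10) -> int:
    """Index every chunk, committing every `commit_every` chunks and at the end."""
    chunks = 0
    for rows in iter_keyset_chunks(conn, chunk_size):
        index(rows)
        chunks += 1
        if chunks % commit_every == 0:
            commit()
    commit()  # commit the remainder (may repeat the last commit; harmless)
    return chunks
```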

(In reply to Shawn Heisey <ap...@elyograg.org>, Sat, 6 Mar 2021, 17:28.)

Re: Importing Data - Solr 8.0

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/6/2021 4:38 AM, Anuj Bhargava wrote:
> I am having some problem importing data via dataimporter. I have a database
> (data_archive) containing records for the last 9 years.

What database software is that connecting to?  I'm assuming you get an 
error in the Solr log.  Can you share the entire text of that error?  It 
will be dozens of lines in length.

Thanks,
Shawn