Posted to users@solr.apache.org by Michał Świątkowski <mi...@stopzone.pl.INVALID> on 2022/07/04 11:25:03 UTC

Data Import Handler problem in Solr 8

Hello All,

I'm using Solr 8.11.1 in cloud mode with OpenJDK 8. Since Solr 8 I have 
noticed that a full import with the 'optimize' option (e.g. 
/dataimport?command=full-import&clean=true&commit=true&wt=json&optimize=true) 
removes all collection data, and the collection stays empty while the 
update runs. Does anyone know whether this is a bug or a new feature?

Thanks,
Michal

Re: Data Import Handler problem in Solr 8

Posted by Shawn Heisey <ap...@elyograg.org>.
On 7/6/22 04:32, Michał Świątkowski wrote:
> I checked, and the collection data is erased only when I use both 
> clean=true and optimize=true (the first query below).
>
> 1. clean=true ; optimize=true
> webapp=/solr path=/dataimport 
> params={core=example_collection&optimize=true&indent=on&commit=true&name=dataimport&clean=true&wt=json&command=full-import&_=1657098443936&verbose=true} 
> status=0 QTime=5
>
> 2. clean=true
> webapp=/solr path=/dataimport 
> params={core=example_collection&indent=on&commit=true&name=dataimport&clean=true&wt=json&command=full-import&_=1657098443936&verbose=true} 
> status=0 QTime=4
>
> 3. clean=false ; optimize=true
> webapp=/solr path=/dataimport 
> params={core=example_collection&optimize=true&indent=on&commit=true&name=dataimport&clean=false&wt=json&command=full-import&_=1657098443936&verbose=true} 
> status=0 QTime=5

If you send clean=true then DIH should wipe the index data before it 
begins importing.  If you set optimize=true, then Solr should optimize 
the index AFTER the import is done.  It is very odd to have it behave 
differently when the combination of parameters is used ... maybe when 
both parameters are true, DIH is doing a commit BEFORE importing begins, 
and without that combo, the commit doesn't happen, and a commit is only 
done after the import.

It might be better to set commit and optimize to false, and manually do 
those operations yourself after importing completes. Just an FYI ... 
optimizing is generally not recommended because of how long it can take 
and the fact that it uses a lot of system resources.
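
A minimal sketch of that sequence, in Python. The base URL is an 
assumption (a default local install); the collection name and the 
/dataimport path come from the logs quoted above. It only builds the 
three request URLs, so sending them and polling the status response is 
left to whatever HTTP client you use. Note that autoCommit settings in 
solrconfig.xml may still commit changes on their own, independent of 
commit=false.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address
COLLECTION = "example_collection"         # collection name from the quoted logs


def dih_url(command, **params):
    """Build a request URL for the DIH handler registered at /dataimport."""
    query = {"command": command, "wt": "json", **params}
    return f"{SOLR_BASE}/{COLLECTION}/dataimport?{urlencode(query)}"


def commit_url():
    """Build an explicit commit request against the standard /update handler."""
    query = {"commit": "true", "wt": "json"}
    return f"{SOLR_BASE}/{COLLECTION}/update?{urlencode(query)}"


# Step 1: start the import, asking DIH not to commit or optimize on its own.
import_url = dih_url("full-import", clean="true", commit="false", optimize="false")
# Step 2: request this repeatedly until the response reports the importer as idle.
status_url = dih_url("status")
# Step 3: once the import has finished, commit manually.
final_commit = commit_url()

print(import_url)
print(status_url)
print(final_commit)
```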

Note that in Solr 9.x DIH is no longer present.  This is because the 
feature has some problems, especially in cloud mode.  You seem to have 
stumbled onto one of the many bugs DIH has.

You may have better luck with the separate version of DIH:

https://github.com/rohitbemax/dataimporthandler

You can also do the import with a new collection and then update an 
alias to point the "true" collection name to the new one after indexing 
is complete.  This is a good paradigm to use in general.
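
As a rough sketch, the alias switch is a single Collections API call 
(CREATEALIAS), which replaces an existing alias of the same name. The 
base URL, the alias name "products", and the dated collection name 
below are all hypothetical and only there for illustration.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address


def create_alias_url(alias, collection):
    """Build a CREATEALIAS request that points `alias` at `collection`."""
    query = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": collection,
        "wt": "json",
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(query)}"


# Hypothetical names: clients query "products", while the data lives in a
# freshly indexed collection that is swapped in after the import completes.
swap_url = create_alias_url("products", "products_20220706")
print(swap_url)
```

Clients keep querying the alias name throughout, so the old data stays 
searchable until the switch.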

Thanks,
Shawn


Re: Data Import Handler problem in Solr 8

Posted by Michał Świątkowski <mi...@stopzone.pl.INVALID>.
Hi,

I checked, and the collection data is erased only when I use both 
clean=true and optimize=true (the first query below).

1. clean=true ; optimize=true
webapp=/solr path=/dataimport 
params={core=example_collection&optimize=true&indent=on&commit=true&name=dataimport&clean=true&wt=json&command=full-import&_=1657098443936&verbose=true} 
status=0 QTime=5

2. clean=true
webapp=/solr path=/dataimport 
params={core=example_collection&indent=on&commit=true&name=dataimport&clean=true&wt=json&command=full-import&_=1657098443936&verbose=true} 
status=0 QTime=4

3. clean=false ; optimize=true
webapp=/solr path=/dataimport 
params={core=example_collection&optimize=true&indent=on&commit=true&name=dataimport&clean=false&wt=json&command=full-import&_=1657098443936&verbose=true} 
status=0 QTime=5

Is it supposed to work like that? This never happened in Solr 7.1.

Best Regards,
Michal


Re: Data Import Handler problem in Solr 8

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello, Michal.
I don't think so. I'd rather suspect clean=true. I suppose you can find a
detailed answer in the log.



-- 
Sincerely yours
Mikhail Khludnev