Posted to dev@lucene.apache.org by "Xavier VILRET (JIRA)" <ji...@apache.org> on 2017/05/19 07:25:04 UTC

[jira] [Comment Edited] (SOLR-7952) Change DeltaImport from HashSet to LinkedHashSet.

    [ https://issues.apache.org/jira/browse/SOLR-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017011#comment-16017011 ] 

Xavier VILRET edited comment on SOLR-7952 at 5/19/17 7:24 AM:
--------------------------------------------------------------

When joining parent and child entities with join="zipper", SOLR must maintain the order of the PKs.
Currently it does not, because HashSet and HashMap are used.
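
To illustrate why the order matters: a zipper-style join is essentially a merge join. It walks the parent and child rows in a single forward pass, so it assumes both sides arrive in the same key order. The following is a minimal sketch of that idea, not Solr's actual implementation; the class ZipperSketch, the Child type and the zip method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ZipperSketch {
    // Hypothetical child row: the key of its parent plus some payload.
    static final class Child {
        final int parentId;
        final String value;

        Child(int parentId, String value) {
            this.parentId = parentId;
            this.value = value;
        }
    }

    // Merge join: 'children' must be sorted by parentId, and the result is
    // only complete if 'parentIds' arrive in the same ascending order.
    static Map<Integer, List<String>> zip(List<Integer> parentIds, List<Child> children) {
        Map<Integer, List<String>> joined = new LinkedHashMap<>();
        int pos = 0; // forward-only cursor into the child rows
        for (int parentId : parentIds) {
            List<String> matches = new ArrayList<>();
            // Skip children that belong to smaller keys; they are never revisited.
            while (pos < children.size() && children.get(pos).parentId < parentId) {
                pos++;
            }
            // Collect the children of the current parent.
            while (pos < children.size() && children.get(pos).parentId == parentId) {
                matches.add(children.get(pos).value);
                pos++;
            }
            joined.put(parentId, matches);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Child> children = Arrays.asList(
            new Child(1, "a"), new Child(2, "b"), new Child(2, "c"), new Child(3, "d"));
        // Parents in key order: every parent finds its children.
        System.out.println(zip(Arrays.asList(1, 2, 3), children)); // {1=[a], 2=[b, c], 3=[d]}
        // Parents out of order: the cursor has already passed the rows for 1 and 2.
        System.out.println(zip(Arrays.asList(3, 1, 2), children)); // {3=[d], 1=[], 2=[]}
    }
}
```

In this sketch, parents that arrive out of order silently lose their children, which is the kind of failure an unordered PK collection can lead to.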

In SOLR 5.1, in the collectDelta method of org.apache.solr.handler.dataimport.DocBuilder, two key collections need to be changed so that they retain insertion order:

line 783: Set<Map<String, Object>> myModifiedPks = new LinkedHashSet<>();
line 794: Map<String, Map<String, Object>> deltaSet = new LinkedHashMap<>();

I checked the code in 6.5.1 and the bug is still there, although the line numbers have changed to 785 and 798 respectively.
I didn't check, but I guess all the SOLR releases in between are affected as well.

This is an easy fix, but definitely not a minor one, as delta-import seems broken when using join="zipper".
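
The effect of swapping the collection type can be sketched in isolation. This is only an illustration under stated assumptions, not the DocBuilder code: the class DeltaOrderSketch, its helper methods and the "id" column are hypothetical. It relies on the documented behaviour that LinkedHashSet iterates in insertion order, while HashSet makes no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DeltaOrderSketch {
    // Adds one primary-key row per id to 'target', in the order given.
    // The "id" column name is an assumption made for this example.
    static Set<Map<String, Object>> collect(Set<Map<String, Object>> target, int... ids) {
        for (int id : ids) {
            Map<String, Object> row = new HashMap<>();
            row.put("id", id);
            target.add(row);
        }
        return target;
    }

    // Returns the ids in the order the set hands them back on iteration.
    static List<Object> idsInIterationOrder(Set<Map<String, Object>> pks) {
        List<Object> ids = new ArrayList<>();
        for (Map<String, Object> pk : pks) {
            ids.add(pk.get("id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        // LinkedHashSet: iteration order equals insertion order.
        System.out.println(idsInIterationOrder(collect(new LinkedHashSet<>(), 42, 7, 19, 3)));
        // HashSet: same elements, but the order is unspecified and may differ.
        System.out.println(idsInIterationOrder(collect(new HashSet<>(), 42, 7, 19, 3)));
    }
}
```

The same reasoning applies to LinkedHashMap versus HashMap for the deltaSet map.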


was (Author: witchking):
When joining parent and child entities with join="zipper", SOLR must maintain the order of the PKs.
Currently it does not, because HashSet and HashMap are used.

In SOLR 5.1, in the collectDelta method of org.apache.solr.handler.dataimport.DocBuilder, two key collections need to be changed so that they retain insertion order:

line 783: Set<Map<String, Object>> myModifiedPks = new LinkedHashSet<>();
line 794: Map<String, Map<String, Object>> deltaSet = new LinkedHashMap<>();

I checked the code in 6.5.1 and the bug is still there, although the line numbers have changed to 785 and 798 respectively.
I didn't check, but I guess all the SOLR releases in between are affected as well.

> Change DeltaImport from HashSet to LinkedHashSet.
> -------------------------------------------------
>
>                 Key: SOLR-7952
>                 URL: https://issues.apache.org/jira/browse/SOLR-7952
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 5.2.1
>            Reporter: Pablo Lozano
>            Priority: Minor
>              Labels: easyfix
>
> This is only a minor modification, which in some cases might be useful for certain custom DataSources or ImportHandlers.
> My imports work by fetching in batches, so I need to store those batches in a disk cache for a certain time, as they are not required in the meantime.
> I also use lazy loading, as my batches are not initialized by my custom iterators until they are iterated for the first time.
> My issue is that the order in which I pass the ids of my documents to the ImportHandler during the "FIND_DELTA" step is not the order in which they are fetched during the DELTA_DUMP step. This causes several of my batches to be initialized at once, when only one needs to be initialized at a time.
> What I would like is simply to change the HashSet used in the "collectDelta" method to a LinkedHashSet. This would help, as we would obtain a predictable order of documents.
> This may be a very specific case, but the change is simple and shouldn't impact anything else.
> The second option would be to create a "deltaImportQuery"-like option that would work like: "select * from table where last_modified > '${dih.last_index_time}'".
> I can submit the patch for this.


