You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Mikhail Khludnev (Created) (JIRA)" <ji...@apache.org> on 2011/12/04 20:38:40 UTC

[jira] [Created] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

DIH caching bug - EntityRunner destroys child entity processor
--------------------------------------------------------------

                 Key: SOLR-2947
                 URL: https://issues.apache.org/jira/browse/SOLR-2947
             Project: Solr
          Issue Type: Sub-task
          Components: contrib - DataImportHandler
    Affects Versions: 4.0
            Reporter: Mikhail Khludnev


[line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: [jira] [Commented] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by Erick Erickson <er...@gmail.com>.
James:

One thing to watch out for (it got me and someone else new recently) is that
the 3.x code line needs to be compiled/tested under Java 1.5, not Java 1.6.

I think my first check-in used a 1.6 feature and...er...broke the
build. Uwe kindly
fixed it up for me, so I'm trying to return the favor by keeping him from having
to do it again <G>....

Best
Erick

On Fri, Feb 10, 2012 at 2:04 PM, James Dyer (Commented) (JIRA)
<ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205644#comment-13205644 ]
>
> James Dyer commented on SOLR-2947:
> ----------------------------------
>
> I not set up yet and not sure how long it'll be.  But I do want to get the DIH bugs taken care of asap.
>
>> DIH caching bug - EntityRunner destroys child entity processor
>> --------------------------------------------------------------
>>            Key: SOLR-2947
>>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>>             Project: Solr
>>          Issue Type: Sub-task
>>          Components: contrib - DataImportHandler
>>    Affects Versions: 4.0
>>            Reporter: Mikhail Khludnev
>>              Labels: noob
>>             Fix For: 4.0
>>
>>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>>
>>
>> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
>> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
>> see attachement comments for more details.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Mikhail Khludnev (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-2947:
-----------------------------------

    Attachment: SOLR-2947.patch

Ok. here is the patch, which fixes issue with destroy() and problem with multiple threads and CachedSqlEntityProcessor.

h3.Code

h4.Context.java, ContextImpl.java 

* removed SCOPE_DOC constant. I can't find any usages. Old impl isn't thread safe. We can implement it thread safe if you want. Let me know if it's necessary.
* Pay attention that ContextImpl.putVal() *ignores the scope provided*. It should be tracked separately let me know if you like me to raise it.

h4.DataImporter.java 

I added DocBuilder.destroy() to stop thread pool after all work is done. I'm bothered by testCase's warns about "thread leaks"

h4.DIHCacheSupport.java

it just introduces a getter. But I generated diff against uncommitted SOLR-2961, so line numbers can be wrong, let me know I re-diff it.

h4.DocBuilder.java 

* EntityRunner stops create EntityProcessors and obtains it from constructor args
* proper destroying EntityProcessors
* EntityRunner.docWrapper is removed as not-thread-safe. it's passed explicitly by method arguments
* EntityRunner.entityEnded was't thread-safe too. moved into ThreadedEntityProcessorWrapper
* object instantiating was drastically amended to be threadsafe 
** single EntityRunner per Entity
** single EntityProcessor per EntityRunner
** N ThreadedEntityProcessorWrapper per EntityRunner uses its' EntityProcessor as delegate
** where N is number of threads specified at root entity (threads attr is prohibited for child entities)
** ThreadedEntityProcessorWrapper are numbered by their positions in EntityRunner's tepw list
** parent entity's ThreadedEntityProcessorWrapper always hits children's tepw with the same number as its' own
* parent entity's ThreadedEntityProcessorWrapper always hits children's tepw by plain Java synchronous call (w/o thread pool)

h4.EntityProcessor.java,EntityProcessorBase.java 
isPaged() property has been introduced

h4.EntityProcessorWrapper.java
protected transformRow() has been extracted from applyTransformer(). I need to reuse transformers logic for the paged flow but applyTransformer() has side-effect on rowcache field.

h4.ThreadedEntityProcessorWrapper.java 
in addition to all refactorings above (instantiating and field move). it contains the core idea of multithred cached entity processor:
* after tepw obtains access to thread-unaware delegate entityProcessor it need to pull whole page - all children records belong to the current parent, 
* whole page is transformed and put into tepw.rowcahce, where they will be pulled later by the parent entity tepw

h3.Tests

h4.TestThreaded.java 
added full space test for CachedSqlEP for no, 1, 2, 10 (keep in mind 1 thread don't equal to no-threads)

h4.TestEphemeralCache.java 
add double destroy() check EntityProcessors

h4.dataimport-cache-ephemeral.xml
specifies 10 threads and add double destroy() EntityProcessors

 


                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Mikhail Khludnev (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-2947:
-----------------------------------

    Description: 
My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.

see attachement comments for more details. 

  was:
My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, pretty basic functionality - parent child joining is broken.
the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.

see attachement comments for more details. 

    
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "James Dyer (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer resolved SOLR-2947.
------------------------------

    Resolution: Fixed

Trunk Only:  r1245014 & r1245018.  Thank you Mikhail (now to the next one :) ).
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>            Assignee: James Dyer
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Mikhail Khludnev (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190650#comment-13190650 ] 

Mikhail Khludnev commented on SOLR-2947:
----------------------------------------

bq. Small point here but I prefer the TestEphemeralCache changes I made in the Dec 11, 2011 patch version.

James, 
* my test reads base config from xml file and patches number of thread for every run, as well as your test does. So, there is no much difference here.
* I agree that the random number of threads can be much reasonable than the constant one. Anyway it should be @Ignore ed until you know.
* The really significant advantage of my test is DestroyCountCache which assert once-and-only-once destroy() call for caches. 

Regards
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "James Dyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187889#comment-13187889 ] 

James Dyer commented on SOLR-2947:
----------------------------------

I think this patch does the right thing here, calling "destroy()" down the hierarchy of EntityProcessors, but waiting until after doc-building is complete.  While I had it this way for the single-threaded code, I punted on the multi-threaded case simply hoping that because the unit tests were passing then everything would be alright :) .  I appreciate the effort to improve the DIH multithreaded code.  We really need to get rid of bugs like this and long-term it would pay if we could try and make the code more maintainable, get better test coverage, etc.  

An example is the new "children()" method...using just the first ThreadedEntityProcessorWrapper from the list I think is valid because the "children" will be same on all the threads.  But then again, looking at how this all gets populated in the ThreadedEntityProcessorWrapper constructor, the answer (to me) isn't obvious.  Best I can say is this is probably correct and certainly a vast improvement than what is currently in Trunk.  

Small point here but I prefer the TestEphemeralCache changes I made in the Dec 11, 2011 patch version.  I switched to building the config file on-the-fly and testMultiThreaded() uses a random number of threads instead of always using 10.  Of course, if we go with this then we'd need to add "@Ignore" for testMultiThreaded() until SOLR-3011 can be commited.
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Mikhail Khludnev (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-2947:
-----------------------------------

    Description: 
My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, pretty basic functionality - parent child joining is broken.
the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.

see attachement comments for more details. 

  was:[line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entity.

    
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, pretty basic functionality - parent child joining is broken.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Mikhail Khludnev (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-2947:
-----------------------------------

    Attachment: SOLR-2947.patch

Clean up comment. Rename patch to conform to HowToContribute
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Mikhail Khludnev (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162466#comment-13162466 ] 

Mikhail Khludnev commented on SOLR-2947:
----------------------------------------

a few thoughts
* formally DataImporter.initEntity() shouldn't consider config as multithread if the only single thread is specified
* I'm concerning for overall test coverage for EntityRunners. You see switching to them breaks the essential feature. Are they tested enough? is it worth to check the actual [coverage|https://builds.apache.org/job/Solr-trunk/1694/clover-report/org/apache/solr/handler/dataimport/DocBuilder.html?line=499#src-499]? You see there is not tests, where child adds anything into parent. and
* btw, can't we remove DocBuilder.buildDocument() and switch onto EntityRunners to avoid "second system" issue?
* in TestEphemeralCache I'd like to declare parent as cacheable entity too. Just for check how it works.

after this, the next fight is true multi-threading.
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "James Dyer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2947:
-----------------------------

    Attachment: SOLR-2947.patch

I will commit this shortly.
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>            Assignee: James Dyer
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Mikhail Khludnev (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186263#comment-13186263 ] 

Mikhail Khludnev commented on SOLR-2947:
----------------------------------------

bq. ...why is it sufficient to get the children from only the first element of that iterator? 

* yes it is. just because DIH code works only with threads=1. assert entityProcessorWrapper.size()==1. but if we set threads to 2 or more, every TEPW will have own collection of children' entity runners (and every ER will have own EP). I address it in SOLR-3011 see my explanation above "DocBuilder.java/object instantiating". 

bq. are there any situations where the iterator may not return any elements?

* EntityRunner.<init> declares size of this list 1 by default. if user configured it to 0 EntityRunner.run() method will fail due to entityProcessorWrapper.get(0).  

bq.  at first i thought this was because all of the ThreadedEntityProcessorWrapper instances were initialized with identical EntityProcessors

done in SOLR-3011 patch. 
{code}
for (int i = 0; i < threads; i++) {
        thepw = new ThreadedEntityProcessorWrapper(       ...
                childrenRunners, i);           <-- identical EntityProcessors
        entityProcessorWrapper.add(thepw);
}
{code}

bq. Is this something that's just a hack that only works with 1 thread until/unless SOLR-3011 is resolved?

in according to subj it's a fix for a bug introduced by SOLR-2384. threads=2 didn't work before, but now threads=1 doesn't work too. I removed wrong EP destroying, and implement destroying which works now for threads=1, and will work for threads=10 after SOLR-3011.

bq. will this break things for people using multithreaded DIH without nested entities?

test passed, you know. it even will destroy root EP properly because the collecting of ERs in doFullDump() starts from placing root ER into result. 

bq.  Either way: a comment clarifying that monstrosity would be a good idea.

I can't distinguish which monstrosity you refer to. 



                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209495#comment-13209495 ] 

Robert Muir commented on SOLR-2947:
-----------------------------------

Sorry James, my mistake, I didn't realize that! 
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>            Assignee: James Dyer
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209477#comment-13209477 ] 

Robert Muir commented on SOLR-2947:
-----------------------------------

Hi James: I think we should add a CHANGES.txt entry for this fix?
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>            Assignee: James Dyer
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "James Dyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209489#comment-13209489 ] 

James Dyer commented on SOLR-2947:
----------------------------------

This bug was caused by SOLR-2382 which is trunk-only.  Do we need a CHANGES.txt entry for that?
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>            Assignee: James Dyer
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Mikhail Khludnev (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-2947:
-----------------------------------

    Attachment: dih-cache-destroy-on-threads-fix.patch
                dih-cache-threads-enabling-bug.patch

To reproduce the issue just insert single line in test config by applying dih-cache-threads-enabling-bug.patch. It curiously breaks TestEphemeralCache. 

DocBuilder.doFullDump() considers request as dataImporter.getConfig().isMultiThreaded even the single thread is specified (DataImporter.initEntity() too straightforward). But it forces DocBuilder to choose EntityRunner branch instead of  buildDocument(). 

EntityRunner destroys entity processor (and cleanups the cache) per every run() call, but for child entity the run() is called many times - once per parent. As result the only first child is indexed, and it fails the test.
You can fix it by commenting entityProcessor destroying in EntityRunner.run() at the finally section. But you lose cache releasing at all.
dih-cache-destroy-on-threads-fix.patch fixes entityProcessors destroying and cover it by slightly amended TestEphemeralCache
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Hoss Man (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186000#comment-13186000 ] 

Hoss Man commented on SOLR-2947:
--------------------------------

bq. I decided that my 32K patch from 28/Dec is to huge. I'm attaching the shorter version.

Mikhail: as someone who wants to help get DIH improvements/bug fixes commited, but probably understands less about DIH internals then you and James, i thank you for splitting out these issues and explaining what's going on, and pressing forward with patches that include tests.

The latest "SOLR-2947.patch" looks fairly good to me, and all tests pass (with the caveat that the @Ignored test depends on SOLR-3011 - although that should really be noted in the @Ignore msg) the one thing that really confuses me is the implementation of children(), it smells extremely fishy...

{noformat}
+    Collection<EntityRunner> children(){
+        return entityProcessorWrapper.iterator().next().children.values();
+    } 
{noformat}

...why is it sufficient to get the children from only the first element of that iterator?  are there any situations where the iterator may not return any elements?

I'm guessing the answer to the later is "no" because that would mean 0 threads, correct? (in which case, let's toss a java assert in there for good measure).

As for the first question: at first i thought this was because all of the  ThreadedEntityProcessorWrapper instances were initialized with identical EntityProcessors so any one would due (in which why not just use the entityProcessor handing directly off the parent EntityRunning) -- but poking around ThreadedEntityProcessorWrapper i see that they seem to always create new EntityRunner instance, so i'm at a loss to understand how that "children()" method makes sense.

Is this something that's just a hack that only works with 1 thread until/unless SOLR-3011 is resolved?  will this break things for people using multithreaded DIH _without_ nested entities? (i don't think so -- because that would contradict the premise of there being any child entities that need their entityProcessor to be desctoryed -- but i wanted to sanity check the question).  Either way: a comment clarifying that monstrosity would be a good idea. 

James: are you comfortable with the latest patch?
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "James Dyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205644#comment-13205644 ] 

James Dyer commented on SOLR-2947:
----------------------------------

I not set up yet and not sure how long it'll be.  But I do want to get the DIH bugs taken care of asap.
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Assigned] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "James Dyer (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer reassigned SOLR-2947:
--------------------------------

    Assignee: James Dyer
    
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>            Assignee: James Dyer
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "James Dyer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2947:
-----------------------------

    Attachment: SOLR-2947.patch

Mikhail,

I appreciate your finding this bug and providing a fix.  Indeed I wasn't sure I was handling "destroy()" correctly for the multi-threaded case in SOLR-2382 but then again all tests were passing.

I think you're absolutely correct we don't have enough threaded tests.  To help with this, I expanded "TestEphemeralCache" (see updated patch) to run it 3 times: first single-threaded, second single-threaded with 'threads=1', and third as truly multi-threaded (2-6 threads randomly chosen).

Obviously the third case still fails but you indicate you plan on fixing that one next?  :)  That would be great if you could.  Let me know if there is any way I can help although I admit I'm a bit unsure of myself in the DIH multi-threaded code.  Is it your impression that DIH caching ever worked threaded, or do you think SOLR-2382 might have broken it?

I also agree that it shouldn't use the "threaded" code if "threads=1"  For now, it provides us with a nice testing scenario to fix the 'destroy()' semantics without having to tackle the multi-threaded cacheing beast.

Although I'm with you that it would be great if we could just combine the single-threaded and multi-threaded cases into the same code.  But I would think before that can happen all of these threaded bugs need to be worked through.

I apologize for taking so long to look at this.  On the other hand, I'm not going to be much help to you in getting this committed.  But if I can help in any other way let me know.
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Mikhail Khludnev (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-2947:
-----------------------------------

    Attachment: SOLR-2947.patch

add comments, assert, and root entity only test. though not really significant update.
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Mikhail Khludnev (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205615#comment-13205615 ] 

Mikhail Khludnev commented on SOLR-2947:
----------------------------------------

James, 
My congratulations for becoming a committer!

Can't you push forward this one and small SOLR-2933 too to unlock brutal SOLR-3011?

Regards
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Mikhail Khludnev (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-2947:
-----------------------------------

    Attachment: SOLR-2947.patch

Hello,

I decided that my 32K patch from 28/Dec is to huge. I'm attaching the shorter version. It's quite close to the James's one, but I put @Ignore for threads="10" to keep build green after apply; and I also assert that EntityProcessors are properly destroyed.
I'm going to create sibling issue for DIH multi-threading and put most of my prev patch there. 
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2947) DIH caching bug - EntityRunner destroys child entity processor

Posted by "Mikhail Khludnev (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167755#comment-13167755 ] 

Mikhail Khludnev commented on SOLR-2947:
----------------------------------------

bq.Is it your impression that DIH caching ever worked threaded, or do you think SOLR-2382 might have broken it? 
No. It didn't. Multiple threads don't work with CachedSqlEntityProcessor at 3.4 too.  
bq. But I would think before that can happen all of these threaded bugs need to be worked through.
ok. I'm going to address it.
                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is fix multithread import with SQL cache. Here is the 2nd stage. If I enable DocBuilder.EntityRunner flow even for single thread, it breaks the pretty basic functionality: parent-child join.
> the reason is [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks children entityProcessor.
> see attachement comments for more details. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org