Posted to dev@lucene.apache.org by "Bernd Fehling (Commented) (JIRA)" <ji...@apache.org> on 2012/04/04 11:22:23 UTC

[jira] [Commented] (SOLR-3314) DIH with multi-threading throws exception

    [ https://issues.apache.org/jira/browse/SOLR-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246132#comment-13246132 ] 

Bernd Fehling commented on SOLR-3314:
-------------------------------------


As a very dirty workaround for now, I have wrapped the log.info call in LogUpdateProcessor.finish() in a try/catch that ignores errors, which prevents the rollback.
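
A minimal sketch of this kind of workaround, not the actual Solr source. The class names SafeFinishLogSketch and PairList and the method buildLogLine are hypothetical stand-ins; only the shape of the log line ("" + toLog + " 0 " + elapsed) is taken from the issue description. It shows how catching the exception while the log line is built keeps the failure from propagating to the caller:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the workaround; not the actual Solr source. */
final class SafeFinishLogSketch {

    /** Mimics a name/value list whose toString() casts every even slot to String. */
    static final class PairList {
        private final List<Object> nvPairs = new ArrayList<>();

        void addRaw(Object o) {
            nvPairs.add(o);
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("{");
            for (int i = 0; i + 1 < nvPairs.size(); i += 2) {
                if (i > 0) {
                    sb.append(',');
                }
                // Throws ClassCastException if a non-String sits in a name slot.
                String name = (String) nvPairs.get(i);
                sb.append(name).append('=').append(nvPairs.get(i + 1));
            }
            return sb.append('}').toString();
        }
    }

    /** Builds the log line; returns a fallback text instead of propagating the failure. */
    static String buildLogLine(PairList toLog, long elapsed) {
        try {
            return "" + toLog + " 0 " + elapsed;
        } catch (RuntimeException e) {
            // Swallowed on purpose: a broken log line should not abort the import.
            return "(log entry unavailable: " + e.getClass().getSimpleName() + ") 0 " + elapsed;
        }
    }

    public static void main(String[] args) {
        PairList corrupted = new PairList();
        corrupted.addRaw(new ArrayList<String>()); // a list where a name is expected
        corrupted.addRaw("value");
        System.out.println(buildLogLine(corrupted, 42));
    }
}
```

This only hides the symptom: the underlying list is still corrupted, so the logged summary cannot be trusted even when no exception is thrown.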


                
> DIH with multi-threading throws exception
> -----------------------------------------
>
>                 Key: SOLR-3314
>                 URL: https://issues.apache.org/jira/browse/SOLR-3314
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 3.6
>            Reporter: Bernd Fehling
>            Assignee: James Dyer
>             Fix For: 3.6
>
>
> While loading with DIH in multi-threading mode, exceptions are sometimes thrown.
> {code}
> Apr 4, 2012 10:19:10 AM org.apache.solr.common.SolrException log
> SEVERE: Full Import failed:java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
> 	at org.apache.solr.common.util.NamedList.getName(NamedList.java:131)
> 	at org.apache.solr.common.util.NamedList.toString(NamedList.java:258)
> 	at java.lang.String.valueOf(String.java:2826)
> 	at java.lang.StringBuilder.append(StringBuilder.java:115)
> 	at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:188)
> 	at org.apache.solr.update.processor.UpdateRequestProcessor.finish(UpdateRequestProcessor.java:78)
> 	at org.apache.solr.handler.dataimport.SolrWriter.close(SolrWriter.java:53)
> 	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:268)
> 	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
> 	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
> 	at org.apache.solr.handler.dataimport.DataImporter$3.run(DataImporter.java:426)
> Apr 4, 2012 10:19:10 AM org.apache.solr.update.DirectUpdateHandler2 rollback
> INFO: start rollback
> Apr 4, 2012 10:19:10 AM org.apache.solr.update.DirectUpdateHandler2 rollback
> INFO: end_rollback
> {code}
> Analysis:
> After loading, the LogUpdateProcessor produces its log entry by writing the content of "toLog" and the elapsed time.
> {code}
>     log.info( "" + toLog + " 0 " + (elapsed) );
> {code}
> "toLog" is an org.apache.solr.common.util.NamedList, which is prepared for printing by the methods "toString", "getName" and "getVal". The NamedList consists of name/value pairs, where the name must always be a String. As the exception points out, the name can somehow end up being an ArrayList.
> To trace this further, I modified the method "getName" of org.apache.solr.common.util.NamedList as follows:
> {code}
>   public String getName(int idx) {
>     if (nvPairs.get(idx << 1).getClass().getName().equals("java.util.ArrayList")) {
>       System.out.println( "<Object>>" + nvPairs.get(idx << 1).toString() + "<" );
>     }
>     return (String)nvPairs.get(idx << 1);
>   }
> {code}
> After several tries I could produce an exception, and the output was:
> {code}
> <Object>>[testdir2_testfile2_record2, testdir2_testfile2_record3, testdir2_testfile2_record2, testdir2_testfile2_record1, testdir2_testfile2_record3, testdir2_testfile2_record1, testdir2_testfile2_record1, testdir2_testfile2_record2, ... (24 adds)]<
> {code}
> What we see here is:
> - we have 2 files in each of 2 directories, each file with 3 records, but it reports "24 adds", while the index afterwards only has the 6 records (self-healing by unique IDs in the index)
> - the record IDs appear multiple times in the ArrayList
> Evidently something is not thread-safe. The "LogUpdateProcessorFactory"???
> I have no idea how to provide a unit test for this one, as it happens only in DIH multi-threading mode and only sometimes.
> Nevertheless, it would be bad to have a rollback after loading some million records :-(
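
One way a non-String could land in a name slot can be sketched as follows. This is a hypothesis for illustration, not a confirmed diagnosis: it assumes names and values are appended to one flat list as two separate steps (consistent with the idx << 1 lookup in getName above), and it replays a possible interleaving of two unsynchronized adds on a single thread so the outcome is deterministic. The class InterleavedPairsSketch and its methods are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of a suspected race; not taken from the Solr source. */
final class InterleavedPairsSketch {

    /** Reads the name of pair idx from flat storage (names at even indexes, values at odd). */
    static String getName(List<Object> nvPairs, int idx) {
        return (String) nvPairs.get(idx << 1);
    }

    /** Replays one interleaving that two unsynchronized add(name, value) calls could produce. */
    static List<Object> interleavedAdds() {
        List<Object> nvPairs = new ArrayList<>();
        List<String> addsA = new ArrayList<>(Arrays.asList("record1"));
        List<String> addsB = new ArrayList<>(Arrays.asList("record2"));
        nvPairs.add("add");  // thread A stores its name
        nvPairs.add("add");  // thread B stores its name before A stores its value
        nvPairs.add(addsA);  // thread A stores its value, which now sits in a name slot
        nvPairs.add(addsB);  // thread B stores its value
        return nvPairs;
    }

    /** Returns true if looking up the name of pair idx fails with a ClassCastException. */
    static boolean nameLookupFails(List<Object> nvPairs, int idx) {
        try {
            getName(nvPairs, idx);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Object> nvPairs = interleavedAdds();
        System.out.println("pair 0 lookup fails: " + nameLookupFails(nvPairs, 0));
        System.out.println("pair 1 lookup fails: " + nameLookupFails(nvPairs, 1));
    }
}
```

The sketch reproduces only the cast failure; it does not account for the duplicated record IDs or the "24 adds" count reported above.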
