Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2008/01/05 02:53:34 UTC

[jira] Resolved: (LUCENE-1117) Intermittent thread safety issue with EnwikiDocMaker

     [ https://issues.apache.org/jira/browse/LUCENE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1117.
----------------------------------------

    Resolution: Fixed

> Intermittent thread safety issue with EnwikiDocMaker
> ----------------------------------------------------
>
>                 Key: LUCENE-1117
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1117
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/benchmark
>    Affects Versions: 2.2, 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1117.patch
>
>
> When I run conf/wikipediaOneRound.alg, it sometimes starts up OK,
> but other times (about a third of the time) I see this:
>      Exception in thread "Thread-0" java.lang.RuntimeException: java.io.IOException: Bad file descriptor
>      	at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:76)
>      	at java.lang.Thread.run(Thread.java:595)
>      Caused by: java.io.IOException: Bad file descriptor
>      	at java.io.FileInputStream.readBytes(Native Method)
>      	at java.io.FileInputStream.read(FileInputStream.java:194)
>      	at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
>      	at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
>      	at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
>      	at org.apache.xerces.impl.XMLEntityScanner.scanQName(Unknown Source)
>      	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
>      	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>      	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>      	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>      	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>      	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>      	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>      	at org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker$Parser.run(EnwikiDocMaker.java:60)
>      	... 1 more
> The problem is that the thread that pulls the XML docs is started as
> soon as the EnwikiDocMaker class is instantiated.  Once started, it
> uses fileIS (a FileInputStream) to feed the XML parser.  But if you
> use any task deriving from ResetInputsTask, openFile is actually
> called twice when the alg starts, and the second call closes the
> original fileIS that the XML parser may still be reading.
>
> I changed the thread to start on demand, the first time next() is
> called.  I also removed a redundant resetInputs() call (which was
> opening the file more often than needed).  Finally, I added logic in
> the thread to detect that the input stream was closed (because
> LineDocMaker.resetInputs() was called, e.g., if we are not running
> the doc maker to exhaustion).
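
The fix described above can be illustrated with a minimal sketch. This is not the code from LUCENE-1117.patch: the class LazyLineSource, its method names, and the use of a queue of text lines in place of an XML parser are all assumptions made for illustration. It shows the two ideas from the description: the reader thread is started only on the first next() call, and a thread whose input was closed by a reset stops quietly instead of surfacing an IOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for a doc maker; not the actual EnwikiDocMaker. */
class LazyLineSource {
    // End-of-input sentinel, compared by identity so it cannot clash with real data.
    private static final String EOF = new String("<eof>");

    private BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private BufferedReader input;
    private Thread producer; // stays null until the first next() call

    LazyLineSource(Reader in) {
        // Note: no thread is started here, so a later reset cannot race with a reader.
        this.input = new BufferedReader(in);
    }

    synchronized boolean producerStarted() {
        return producer != null;
    }

    /** Closes the current input and switches to a fresh one. */
    synchronized void resetInputs(Reader fresh) throws IOException {
        input.close();                       // a running producer will see "Stream closed"
        input = new BufferedReader(fresh);
        queue = new LinkedBlockingQueue<>(); // discard anything the old producer queued
        producer = null;                     // the next next() call starts a new thread
    }

    /** Returns the next line, or null once the input is exhausted. */
    String next() throws InterruptedException {
        BlockingQueue<String> q;
        synchronized (this) {
            if (producer == null) {
                // Start on demand, bound to the input and queue current at this moment.
                final BufferedReader in = input;
                final BlockingQueue<String> out = queue;
                producer = new Thread(() -> pump(in, out));
                producer.setDaemon(true);
                producer.start();
            }
            q = queue;
        }
        String line = q.take();
        if (line == EOF) {
            q.put(EOF); // keep the sentinel so repeated calls keep returning null
            return null;
        }
        return line;
    }

    private static void pump(BufferedReader in, BlockingQueue<String> out) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                out.add(line);
            }
        } catch (IOException e) {
            // Input was closed underneath us (for example by resetInputs): stop quietly.
        } finally {
            out.add(EOF);
        }
    }
}
```

In this sketch, calling resetInputs() before the first next() is harmless because no thread has touched the original stream yet, which mirrors the startup sequence that triggered the "Bad file descriptor" error. The unbounded queue keeps the example short; a real doc maker would likely want to bound how far the producer runs ahead.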
