Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2009/10/19 21:29:59 UTC

[jira] Created: (LUCENE-1996) EnwikiContentSource isn't thread safe

EnwikiContentSource isn't thread safe
-------------------------------------

                 Key: LUCENE-1996
                 URL: https://issues.apache.org/jira/browse/LUCENE-1996
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/benchmark
            Reporter: Michael McCandless
            Priority: Minor
             Fix For: 3.1


When I run this alg:
{code}
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer

content.source=org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
docs.file=/x/lucene/enwiki-20090724-pages-articles.xml.bz2
doc.tokenized = false
ram.flush.mb=32.0


doc.stored = false
doc.term.vector = false
log.step.AddDoc=10000

directory=FSDirectory
autocommit=false
compound=false

work.dir=/lucene/work.wiki.nd0.02M

{ "BuildIndex"
  - CreateIndex
  [ { "AddDocs" AddDoc > : 10000 } : 2
  - CloseIndex
}

RepSumByPrefRound BuildIndex
{code}

I hit exceptions in each thread like this:

{code}
Exception in thread "Thread-2" java.lang.RuntimeException: org.xml.sax.SAXParseException: Open quote is expected for attribute "msxi" associated with an  element type  "mdiiki".
	at org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:189)
	at java.lang.Thread.run(Thread.java:613)
Caused by: org.xml.sax.SAXParseException: Open quote is expected for attribute "msxi" associated with an  element type  "mdiiki".
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
	at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
	at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
	at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1441)
	at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:802)
	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(XMLNSDocumentScannerImpl.java:578)
	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:222)
	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook(XMLNSDocumentScannerImpl.java:779)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1794)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
	at org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:166)
	... 1 more
{code}
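
This thread does not identify the cause of the failure. The scrambled names in the parser error ("msxi", "mdiiki") would be consistent with input being consumed out of order by more than one thread, but that is only a guess. As a minimal sketch of one common way to make a single-parser document feed safe for several indexing threads, the hypothetical `DocFeed` class below (not part of contrib/benchmark, and not the actual fix) lets exactly one producer thread read the source and hands documents to consumers through a `BlockingQueue`. The plain `List<String>` source stands in for the real XML parser.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only; DocFeed is not part of contrib/benchmark.
public class DocFeed {
    // End-of-input marker, compared by identity so no real document can equal it.
    private static final String EOF = new String("EOF");
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);

    // Starts the single producer thread; only this thread reads the source.
    public Thread startProducer(List<String> source) {
        Thread producer = new Thread(() -> {
            try {
                for (String doc : source) {
                    queue.put(doc);
                }
                queue.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        return producer;
    }

    // Safe to call from several threads; returns null once the input is exhausted.
    public String next() throws InterruptedException {
        String doc = queue.take();
        if (doc == EOF) {
            queue.put(EOF); // put the marker back so the other consumers also stop
            return null;
        }
        return doc;
    }

    // Drains the source with nThreads consumers and returns everything they received.
    public static List<String> consumeAll(List<String> source, int nThreads) {
        DocFeed feed = new DocFeed();
        List<String> received = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        threads.add(feed.startProducer(source));
        for (int i = 0; i < nThreads; i++) {
            Thread consumer = new Thread(() -> {
                try {
                    String doc;
                    while ((doc = feed.next()) != null) {
                        received.add(doc);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumer.start();
            threads.add(consumer);
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return received;
    }
}
```

Calling `DocFeed.consumeAll(docs, 2)` mirrors the two-thread `AddDocs` step in the alg above: each document is delivered to exactly one consumer, although the order in which consumers receive documents is not deterministic.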

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-1996) EnwikiContentSource isn't thread safe

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1996.
----------------------------------------

    Resolution: Duplicate

Duh, yes, dup.  Must read email before opening issues ;)




[jira] Commented: (LUCENE-1996) EnwikiContentSource isn't thread safe

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767474#action_12767474 ] 

Mark Miller commented on LUCENE-1996:
-------------------------------------

Dupe of LUCENE-1994?




[jira] Commented: (LUCENE-1996) EnwikiContentSource isn't thread safe

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767487#action_12767487 ] 

Mark Miller commented on LUCENE-1996:
-------------------------------------

The scary part is that it's been around for some time and we both independently hit it today ... quantum mechanics in action, I guess ...




[jira] Commented: (LUCENE-1996) EnwikiContentSource isn't thread safe

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767492#action_12767492 ] 

Michael McCandless commented on LUCENE-1996:
--------------------------------------------

That IS really crazy.

