Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/08/25 01:08:17 UTC
[jira] Created: (SOLR-2088) contrib/extraction fails on a turkish
computer
contrib/extraction fails on a turkish computer
----------------------------------------------
Key: SOLR-2088
URL: https://issues.apache.org/jira/browse/SOLR-2088
Project: Solr
Issue Type: Bug
Components: contrib - Solr Cell (Tika extraction)
Reporter: Robert Muir
Fix For: 3.1, 4.0
reproduce with: ant test -Dtests.locale=tr_TR
{noformat}
test:
[junit] Running org.apache.solr.handler.ExtractingRequestHandlerTest
[junit] xml response was: <?xml version="1.0" encoding="UTF-8"?>
[junit] <response>
[junit] <lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int></lst>
<result name="response" numFound="0" start="0"/>
[junit] </response>
[junit]
[junit] request was: start=0&q=title:Welcome&qt=standard&rows=20&version=2.2)
[junit] Tests run: 8, Failures: 1, Errors: 0, Time elapsed: 3.968 sec
[junit] Test org.apache.solr.handler.ExtractingRequestHandlerTest FAILED
BUILD FAILED
{noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
[jira] Commented: (SOLR-2088) contrib/extraction fails on a turkish
computer
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902455#action_12902455 ]
Robert Muir commented on SOLR-2088:
-----------------------------------
Looks like the same problem (in this case you got the random locale of 'tr').
The bug is likely a toLowerCase() that should be toLowerCase(Locale.ENGLISH).
All tests used to pass with this locale, definitely as of revision 945343. See LUCENE-2466.
Was Tika upgraded since then? Perhaps the problem is in a dependency?
I did a few quick reviews of the Solr code and nothing stood out.
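For anyone unfamiliar with why a bare toLowerCase() would break under 'tr', here is a minimal sketch of the suspected failure mode. It is not code from Solr or Tika; the class name TurkishLowerCaseDemo and the helper lower() are made up for illustration, and "TITLE" is just a stand-in for whatever name is being lowercased.

```java
import java.util.Locale;

// Hypothetical demo class, not part of Solr or Tika.
public class TurkishLowerCaseDemo {

    // Lowercases with an explicit locale instead of the JVM default.
    public static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String name = "TITLE";

        // Turkish casing maps 'I' to dotless 'ı' (U+0131), so the result
        // is no longer the ASCII string "title".
        System.out.println(lower(name, turkish).equals("title"));

        // A fixed locale gives the same result on every machine.
        System.out.println(lower(name, Locale.ENGLISH).equals("title"));
    }
}
```

If a field or tag name goes through the first form on a machine whose default locale is Turkish, any later lookup against the ASCII name will miss, which would be consistent with a query on title: finding nothing.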
[jira] Commented: (SOLR-2088) contrib/extraction fails on a turkish
computer
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902460#action_12902460 ]
Robert Muir commented on SOLR-2088:
-----------------------------------
OK, I'll look at Tika with this locale. Perhaps one of its own tests will be triggered.
[jira] Commented: (SOLR-2088) contrib/extraction fails on a turkish
computer
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902471#action_12902471 ]
Robert Muir commented on SOLR-2088:
-----------------------------------
Well, I found one problem in HTML parsing (TIKA-498) that causes Tika's own tests to fail.
But I haven't tested yet with rebuilt jars to see whether it is also the problem causing this issue.
[jira] Commented: (SOLR-2088) contrib/extraction fails on a turkish
computer
Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902453#action_12902453 ]
Mark Miller commented on SOLR-2088:
-----------------------------------
I'm running into this on my hudson box - more info:
Stacktrace
junit.framework.AssertionFailedError: query failed XPath: //*[@numFound='1']
xml response was: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst><result name="response" numFound="0" start="0"/>
</response>
request was: start=0&q=title:Welcome&qt=standard&rows=20&version=2.2
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:320)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:310)
at org.apache.solr.handler.ExtractingRequestHandlerTest.testExtraction(ExtractingRequestHandlerTest.java:83)
Standard Output
NOTE: random codec of testcase 'testExtraction' was: MockSep
NOTE: random locale of testcase 'testExtraction' was: tr
NOTE: random timezone of testcase 'testExtraction' was: Africa/Dar_es_Salaam
Standard Error
25.Ağu.2010 08:51:38 org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'a'
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:321)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:120)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:125)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:195)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at org.apache.solr.util.TestHarness.queryAndResponse(TestHarness.java:334)
at org.apache.solr.handler.ExtractingRequestHandlerTest.loadLocal(ExtractingRequestHandlerTest.java:361)
at org.apache.solr.handler.ExtractingRequestHandlerTest.testDefaultField(ExtractingRequestHandlerTest.java:149)
[jira] Commented: (SOLR-2088) contrib/extraction fails on a turkish
computer
Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902458#action_12902458 ]
Mark Miller commented on SOLR-2088:
-----------------------------------
Yes, I think Tika was upgraded fairly recently, to a 0.8 snapshot I think.