Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/08/25 01:08:17 UTC

[jira] Created: (SOLR-2088) contrib/extraction fails on a turkish computer

contrib/extraction fails on a turkish computer
----------------------------------------------

                 Key: SOLR-2088
                 URL: https://issues.apache.org/jira/browse/SOLR-2088
             Project: Solr
          Issue Type: Bug
          Components: contrib - Solr Cell (Tika extraction)
            Reporter: Robert Muir
             Fix For: 3.1, 4.0


reproduce with: ant test -Dtests.locale=tr_TR

{noformat}
test:
    [junit] Running org.apache.solr.handler.ExtractingRequestHandlerTest
    [junit]  xml response was: <?xml version="1.0" encoding="UTF-8"?>
    [junit] <response>
    [junit] <lst name="responseHeader"><int name="status">0</int><int name="QTime">5</int></lst>
<result name="response" numFound="0" start="0"/>
    [junit] </response>
    [junit]
    [junit]  request was: start=0&q=title:Welcome&qt=standard&rows=20&version=2.2)
    [junit] Tests run: 8, Failures: 1, Errors: 0, Time elapsed: 3.968 sec
    [junit] Test org.apache.solr.handler.ExtractingRequestHandlerTest FAILED

BUILD FAILED
{noformat}
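The `-Dtests.locale=tr_TR` flag makes the tests run under a Turkish default locale. Below is a minimal sketch of that setup. It assumes a hand-rolled harness: `TestLocaleSetup` and `parseLocale` are hypothetical names, and this is not Lucene's test-framework code. The sketch parses the underscore form into a `java.util.Locale`, installs it as the JVM default, and restores the previous default afterwards.

```java
import java.util.Locale;

public class TestLocaleSetup {
    // Hypothetical helper: turns "language_COUNTRY" (e.g. "tr_TR") into a Locale.
    // Locale.forLanguageTag expects BCP 47 hyphens ("tr-TR"), so split by hand.
    public static Locale parseLocale(String spec) {
        String[] parts = spec.split("_");
        Locale.Builder builder = new Locale.Builder().setLanguage(parts[0]);
        if (parts.length > 1) {
            builder.setRegion(parts[1]);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        // Assumption for this sketch: read a system property with the same
        // name as the ant flag, defaulting to tr_TR when it is absent.
        Locale testLocale = parseLocale(System.getProperty("tests.locale", "tr_TR"));
        Locale previous = Locale.getDefault();
        Locale.setDefault(testLocale);
        try {
            // The no-argument toLowerCase() uses the default locale; under Turkish
            // rules the capital I maps to a dotless i (U+0131), not to "i".
            System.out.println("TITLE".toLowerCase().equals("title")); // false under tr_TR
        } finally {
            Locale.setDefault(previous); // avoid leaking the locale into other code
        }
    }
}
```

Restoring the previous default in `finally` matters. Otherwise one locale-dependent test can change the behavior of every test that runs after it in the same JVM.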


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (SOLR-2088) contrib/extraction fails on a turkish computer

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902455#action_12902455 ] 

Robert Muir commented on SOLR-2088:
-----------------------------------

Looks like the same problem (in this case you got the random locale of 'tr').

The bug is likely a toLowerCase() call that should be toLowerCase(Locale.ENGLISH).

All tests used to pass with this locale, definitely as of revision 945343. See LUCENE-2466.

Has Tika been upgraded since then? Perhaps the problem is in a dependency.
I did a few quick reviews of the Solr code and nothing stood out.
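The suspected `toLowerCase()` bug is the classic Turkish-locale trap. Under Turkish case rules, upper-case `I` lowercases to dotless `ı` (U+0131), so a name such as `TITLE` lowercased with a Turkish locale no longer equals `title`. The small sketch below contrasts the two calls. `FieldNames` and its methods are hypothetical names, not Solr code.

```java
import java.util.Locale;

public class FieldNames {
    // Locale-sensitive: the result depends on the locale passed in (the
    // no-argument toLowerCase() behaves the same way with the JVM default).
    public static String lower(String name, Locale locale) {
        return name.toLowerCase(locale);
    }

    // Locale-independent: always applies English case rules, which is what
    // ASCII identifiers such as field names need.
    public static String lowerEnglish(String name) {
        return name.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(lower("TITLE", turkish).equals("title")); // false
        System.out.println(lowerEnglish("TITLE").equals("title"));   // true
    }
}
```

`Locale.ROOT` is another common choice for locale-neutral conversions of identifiers.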


[jira] Commented: (SOLR-2088) contrib/extraction fails on a turkish computer

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902460#action_12902460 ] 

Robert Muir commented on SOLR-2088:
-----------------------------------

OK, I'll look at Tika with this locale. Perhaps one of its own tests will fail too.


[jira] Commented: (SOLR-2088) contrib/extraction fails on a turkish computer

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902471#action_12902471 ] 

Robert Muir commented on SOLR-2088:
-----------------------------------

Well, I found one problem in HTML parsing (TIKA-498) that causes Tika's own tests to fail.

But I haven't yet tested with rebuilt jars to see whether it is also the cause of this issue.
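TIKA-498 is only referenced here, not described, so the following is a hypothetical illustration rather than Tika's code. A parser that lowercases upper-case tag names before looking them up can silently skip elements such as `<TITLE>` under a Turkish locale. That failure mode would be consistent with the `title:Welcome` query finding nothing. `ElementLookup` and its set of element names are assumptions made for this sketch.

```java
import java.util.Locale;
import java.util.Set;

public class ElementLookup {
    // Hypothetical set of lower-case element names a parser might recognize.
    private static final Set<String> KNOWN =
            Set.of("html", "head", "title", "body", "a", "p", "div", "li");

    // Buggy pattern: normalization follows the supplied locale, so under
    // Turkish rules "TITLE" lowercases with a dotless i and is not found.
    public static boolean isKnownLocaleSensitive(String tag, Locale locale) {
        return KNOWN.contains(tag.toLowerCase(locale));
    }

    // Fixed pattern: HTML element names are ASCII, so use fixed English rules.
    public static boolean isKnown(String tag) {
        return KNOWN.contains(tag.toLowerCase(Locale.ENGLISH));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        for (String tag : new String[] {"TITLE", "DIV", "BODY"}) {
            System.out.println(tag + " sensitive=" + isKnownLocaleSensitive(tag, turkish)
                    + " fixed=" + isKnown(tag));
        }
    }
}
```

In the locale-sensitive variant, only tags containing an upper-case `I` (`TITLE`, `DIV`, `LI`) miss the lookup, while `BODY` still matches. This is why such a bug can drop some content rather than failing outright.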


[jira] Commented: (SOLR-2088) contrib/extraction fails on a turkish computer

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902453#action_12902453 ] 

Mark Miller commented on SOLR-2088:
-----------------------------------

I'm running into this on my Hudson box. More info:

Stacktrace

junit.framework.AssertionFailedError: query failed XPath: //*[@numFound='1']
 xml response was: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst><result name="response" numFound="0" start="0"/>
</response>

 request was: start=0&q=title:Welcome&qt=standard&rows=20&version=2.2
	at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:320)
	at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:310)
	at org.apache.solr.handler.ExtractingRequestHandlerTest.testExtraction(ExtractingRequestHandlerTest.java:83)
Standard Output

NOTE: random codec of testcase 'testExtraction' was: MockSep
NOTE: random locale of testcase 'testExtraction' was: tr
NOTE: random timezone of testcase 'testExtraction' was: Africa/Dar_es_Salaam
Standard Error

25.Ağu.2010 08:51:38 org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'a'
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:321)
	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:120)
	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:125)
	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:195)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
	at org.apache.solr.util.TestHarness.queryAndResponse(TestHarness.java:334)
	at org.apache.solr.handler.ExtractingRequestHandlerTest.loadLocal(ExtractingRequestHandlerTest.java:361)
	at org.apache.solr.handler.ExtractingRequestHandlerTest.testDefaultField(ExtractingRequestHandlerTest.java:149)
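The assertion failure above is an XPath check against the XML response. The test expected `//*[@numFound='1']` to match, but the `title:Welcome` query returned `numFound="0"`. Below is a minimal sketch of that kind of check using only the JDK's standard `javax.xml` APIs. It is an illustration, not `SolrTestCaseJ4.assertQ` itself, and `XPathCheck` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathCheck {
    // Returns true if the expression selects at least one node
    // (a node-set converted to boolean is true when it is non-empty).
    public static boolean matches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Boolean) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Response body copied from the failure above.
        String response = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<response><lst name=\"responseHeader\"><int name=\"status\">0</int>"
                + "<int name=\"QTime\">3</int></lst>"
                + "<result name=\"response\" numFound=\"0\" start=\"0\"/></response>";
        System.out.println(matches(response, "//*[@numFound='1']")); // false
        System.out.println(matches(response, "//*[@numFound='0']")); // true
    }
}
```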


[jira] Commented: (SOLR-2088) contrib/extraction fails on a turkish computer

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902458#action_12902458 ] 

Mark Miller commented on SOLR-2088:
-----------------------------------

Yes, I think Tika was upgraded fairly recently, to a 0.8 snapshot.
