Posted to dev@tika.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/07/20 02:09:05 UTC

[jira] [Commented] (TIKA-1690) Inconsistent (buggy) behavior when using tika-server

    [ https://issues.apache.org/jira/browse/TIKA-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632985#comment-14632985 ] 

Chris A. Mattmann commented on TIKA-1690:
-----------------------------------------

So I found something pretty annoying with [r1678515|http://svn.apache.org/viewvc?view=revision&revision=1678515]. The part of that change that adds the temp file (tmpFile) handling makes Tika Server 1.9 *extremely* buggy on Windows and renders [Tika-Python|http://github.com/chrismattmann/tika-python] pretty much useless without the workarounds and patches discussed in [#54|https://github.com/chrismattmann/tika-python/issues/54#issuecomment-122710752] and in [#44|https://github.com/chrismattmann/tika-python/issues/44]. I am thinking of reverting the tmpFile part. Thoughts, [~tallison@apache.org]?

> Inconsistent (buggy) behavior when using tika-server 
> -----------------------------------------------------
>
>                 Key: TIKA-1690
>                 URL: https://issues.apache.org/jira/browse/TIKA-1690
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Namrata Malarout
>            Assignee: Tim Allison
>
> I am using Tika trunk (1.10-SNAPSHOT) and posting documents there. An example would be the following:
> curl -T MOD09GA.A2014010.h30v12.005.2014012183944.vegetation_fraction.tif http://localhost:9998/meta --header "Accept: application/json"
> …
> curl -T MOD09GA.A2014010.h30v12.005.2014012183944.vegetation_fraction.tif http://localhost:9998/meta --header "Accept: application/rdf+xml"
> …
> curl -T MOD09GA.A2014010.h30v12.005.2014012183944.vegetation_fraction.tif http://localhost:9998/meta --header "Accept: text/csv"
> I am using a Python script to iterate through all the files in a folder. It works for about 50% to 80% of the files; for the rest it returns a 500 error. When I individually post a file that previously failed under the Python script, it sometimes works. When done in an ad hoc manner, it works most of the time but sometimes fails. At times a request succeeds for the application/rdf+xml format but fails for the application/json format. The behavior is inconsistent.
> Here is an example trace of when it does not work as expected [0].
> A sample of the data being used can be found here [1].
> Any help would be appreciated. 
> [0] https://paste.apache.org/lbAm
> [1] https://drive.google.com/file/d/0B6wmo4_-H0P2eWJjdTdtYS1HRGs/view?usp=sharing
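
The failure pattern in the report (the same file succeeding for one Accept type and failing for another, or failing only some of the time) could be checked with a small script along the following lines. This is only a minimal sketch, not the reporter's actual script: the function names (put_file, survey, inconsistent) and the repeats parameter are hypothetical, the explicit application/octet-stream Content-Type is an assumption (curl -T sends none), and the endpoint and Accept types are taken from the curl commands above.

```python
import urllib.error
import urllib.request
from pathlib import Path

# Accept types taken from the curl commands in the report.
ACCEPT_TYPES = ["application/json", "application/rdf+xml", "text/csv"]


def put_file(path, accept, base_url="http://localhost:9998"):
    """PUT one file to /meta (what `curl -T` does) and return the HTTP status."""
    req = urllib.request.Request(
        base_url + "/meta",
        data=Path(path).read_bytes(),
        method="PUT",
        headers={
            "Accept": accept,
            # Assumption: sent explicitly so urllib does not add its
            # form-encoded default Content-Type.
            "Content-Type": "application/octet-stream",
        },
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 500 when the server fails


def survey(folder, base_url="http://localhost:9998", repeats=1):
    """Return {(file name, accept type): [status, ...]} for every file in folder."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        for accept in ACCEPT_TYPES:
            results[(path.name, accept)] = [
                put_file(path, accept, base_url) for _ in range(repeats)
            ]
    return results


def inconsistent(results):
    """Keep only entries whose repeated requests did not all return the same status."""
    return {key: codes for key, codes in results.items() if len(set(codes)) > 1}
```

Calling survey() on the folder with repeats set to 2 or more and passing the result to inconsistent() would list the file and Accept type combinations whose status codes differ between identical requests, which is the intermittent behavior described above.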


