Posted to dev@tika.apache.org by "George Kappel (Updated) (JIRA)" <ji...@apache.org> on 2011/12/29 04:26:32 UTC

[jira] [Updated] (TIKA-834) server problem only 1st result is correct additional runs include data from 1st run

     [ https://issues.apache.org/jira/browse/TIKA-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

George Kappel updated TIKA-834:
-------------------------------

    Description: 
With -j (JSON output) the server shows the behavior below. Plain text -m output is also affected: it keeps returning the 1st result, even when different PDF files are sent afterwards.

# Running server to get meta data
java -jar tika-app-1.0.jar -m -j --server 9000
# send pdf document
nc localhost 9000 < test.pdf
# get good result
{"Author":"unknown", 
"Content-Type":"application/pdf", 
"Creation-Date":"2011-12-27T18:21:59Z", 
"Last-Modified":"2011-12-27T18:21:59Z", 
"created":"Tue Dec 27 12:21:59 CST 2011", 
"creator":"PScript5.dll Version 5.2.2", 
"producer":"GPL Ghostscript 9.04", 
"title":"Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", 
"xmpTPg:NPages":3 }
# send pdf document again
nc localhost 9000 < test.pdf
# get bad result with extra values from last run
{ "Author":["unknown", "unknown"], 
"Content-Type":"application/pdf", 
"Creation-Date":"2011-12-27T18:21:59Z", 
"Last-Modified":"2011-12-27T18:21:59Z", 
"created":"Tue Dec 27 12:21:59 CST 2011", 
"creator":["PScript5.dll Version 5.2.2", "PScript5.dll Version 5.2.2"], 
"producer":["GPL Ghostscript 9.04", "GPL Ghostscript 9.04"], 
"title":["Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", "Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com"], 
"xmpTPg:NPages":3 }
# send another pdf document
nc localhost 9000 < ctypes.pdf
# get bad result with extra values from last 2 runs
{ "Author":["unknown", "unknown", "unknown"], 
"Content-Type":"application/pdf", 
"Creation-Date":"2011-12-27T18:03:18Z", 
"Last-Modified":"2011-12-27T18:03:18Z", 
"created":"Tue Dec 27 12:03:18 CST 2011", 
"creator":["PScript5.dll Version 5.2.2", "PScript5.dll Version 5.2.2", "PScript5.dll Version 5.2.2"], 
"producer":["GPL Ghostscript 9.04", "GPL Ghostscript 9.04", "GPL Ghostscript 9.04"], 
"title":["Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", "Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", "(15.17. ctypes \\227 A foreign function library for Python \\227 Python v2.7.2 documentation)"], 



  was:
The problem seems to be specific to -j (JSON output); if that option is left off, plain text metadata appears to work.

# Running server to get meta data
java -jar tika-app-1.0.jar -m -j --server 9000

# send pdf document
nc localhost 9000 < test.pdf

# get good result
{"Author":"unknown", 
"Content-Type":"application/pdf", 
"Creation-Date":"2011-12-27T18:21:59Z", 
"Last-Modified":"2011-12-27T18:21:59Z", 
"created":"Tue Dec 27 12:21:59 CST 2011", 
"creator":"PScript5.dll Version 5.2.2", 
"producer":"GPL Ghostscript 9.04", 
"title":"Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", 
"xmpTPg:NPages":3 }

# send pdf document again
nc localhost 9000 < test.pdf

# get bad result with extra values from last run
{ "Author":["unknown", "unknown"], 
"Content-Type":"application/pdf", 
"Creation-Date":"2011-12-27T18:21:59Z", 
"Last-Modified":"2011-12-27T18:21:59Z", 
"created":"Tue Dec 27 12:21:59 CST 2011", 
"creator":["PScript5.dll Version 5.2.2", "PScript5.dll Version 5.2.2"], 
"producer":["GPL Ghostscript 9.04", "GPL Ghostscript 9.04"], 
"title":["Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", "Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com"], 
"xmpTPg:NPages":3 }

# send another pdf document

nc localhost 9000 < ctypes.pdf

# get bad result with extra values from last 2 runs
{ "Author":["unknown", "unknown", "unknown"], 
"Content-Type":"application/pdf", 
"Creation-Date":"2011-12-27T18:03:18Z", 
"Last-Modified":"2011-12-27T18:03:18Z", 
"created":"Tue Dec 27 12:03:18 CST 2011", 
"creator":["PScript5.dll Version 5.2.2", "PScript5.dll Version 5.2.2", "PScript5.dll Version 5.2.2"], 
"producer":["GPL Ghostscript 9.04", "GPL Ghostscript 9.04", "GPL Ghostscript 9.04"], 
"title":["Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", "Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", "(15.17. ctypes \\227 A foreign function library for Python \\227 Python v2.7.2 documentation)"], 



        Summary: server problem only 1st result is correct additional runs include data from 1st run  (was: server problem only 1st (-m -j) result is correct additional runs include data from previous runs)
    
> server problem only 1st result is correct additional runs include data from 1st run
> -----------------------------------------------------------------------------------
>
>                 Key: TIKA-834
>                 URL: https://issues.apache.org/jira/browse/TIKA-834
>             Project: Tika
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 1.0
>         Environment: java version "1.6.0_23"
> OpenJDK Runtime Environment (IcedTea6 1.11pre) (6b23~pre11-0ubuntu1.11.10)
> OpenJDK Server VM (build 20.0-b11, mixed mode)
>            Reporter: George Kappel
>

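A note on the pattern in the output above: across runs, "Content-Type", "Creation-Date", "Last-Modified" and "created" always show a single, current value, while "Author", "creator", "producer" and "title" grow by one entry per request. That would be consistent with a single metadata container being reused for every connection, where some keys are overwritten and others are appended to. This is only a guess at the cause and is not confirmed anywhere in this report. The sketch below models that guess with a hypothetical MetadataSketch class; it is a stand-in written for illustration and is not Tika's actual Metadata class or server code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical multi-valued metadata container (illustration only, not Tika code). */
final class MetadataSketch {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** Replaces any values already stored under the key. */
    void set(String key, String value) {
        List<String> single = new ArrayList<>();
        single.add(value);
        values.put(key, single);
    }

    /** Appends a value and keeps whatever was stored before. */
    void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Returns the values stored under the key, or an empty list. */
    List<String> get(String key) {
        return values.getOrDefault(key, new ArrayList<>());
    }

    /** Simulates handling one document: one key is overwritten, one is appended. */
    static void parse(MetadataSketch metadata, String creationDate, String title) {
        metadata.set("Creation-Date", creationDate);
        metadata.add("title", title);
    }

    public static void main(String[] args) {
        // Assumed buggy pattern: one container shared by every request.
        MetadataSketch shared = new MetadataSketch();
        parse(shared, "2011-12-27T18:21:59Z", "title of test.pdf");
        parse(shared, "2011-12-27T18:03:18Z", "title of ctypes.pdf");
        // Creation-Date holds only the latest value; title holds both.
        System.out.println(shared.get("Creation-Date"));
        System.out.println(shared.get("title"));

        // Assumed remedy: a fresh container for each request.
        MetadataSketch fresh = new MetadataSketch();
        parse(fresh, "2011-12-27T18:03:18Z", "title of ctypes.pdf");
        System.out.println(fresh.get("title"));
    }
}
```

If the real cause is along these lines, creating the container per connection rather than once at server startup would keep earlier documents from leaking into later responses.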