Posted to dev@tika.apache.org by "George Kappel (Created) (JIRA)" <ji...@apache.org> on 2011/12/28 23:51:30 UTC
[jira] [Created] (TIKA-834) server problem only 1st (-m -j) result is correct additional runs include data from previous runs
server problem only 1st (-m -j) result is correct additional runs include data from previous runs
-------------------------------------------------------------------------------------------------
Key: TIKA-834
URL: https://issues.apache.org/jira/browse/TIKA-834
Project: Tika
Issue Type: Bug
Components: cli
Affects Versions: 1.0
Environment: java version "1.6.0_23"
OpenJDK Runtime Environment (IcedTea6 1.11pre) (6b23~pre11-0ubuntu1.11.10)
OpenJDK Server VM (build 20.0-b11, mixed mode)
Reporter: George Kappel
The problem seems to be specific to -j (JSON) output; if -j is left off, the plain-text metadata appears to work.
# Running server to get meta data
java -jar tika-app-1.0.jar -m -j --server 9000
# send pdf document
nc localhost 9000 < test.pdf
# get good result
{"Author":"unknown",
"Content-Type":"application/pdf",
"Creation-Date":"2011-12-27T18:21:59Z",
"Last-Modified":"2011-12-27T18:21:59Z",
"created":"Tue Dec 27 12:21:59 CST 2011",
"creator":"PScript5.dll Version 5.2.2",
"producer":"GPL Ghostscript 9.04",
"title":"Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com",
"xmpTPg:NPages":3 }
# send pdf document again
nc localhost 9000 < test.pdf
# get bad result with extra values from last run
{ "Author":["unknown", "unknown"],
"Content-Type":"application/pdf",
"Creation-Date":"2011-12-27T18:21:59Z",
"Last-Modified":"2011-12-27T18:21:59Z",
"created":"Tue Dec 27 12:21:59 CST 2011",
"creator":["PScript5.dll Version 5.2.2", "PScript5.dll Version 5.2.2"],
"producer":["GPL Ghostscript 9.04", "GPL Ghostscript 9.04"],
"title":["Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", "Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com"],
"xmpTPg:NPages":3 }
# send another pdf document
nc localhost 9000 < ctypes.pdf
# get bad result with extra values from last 2 runs
{ "Author":["unknown", "unknown", "unknown"],
"Content-Type":"application/pdf",
"Creation-Date":"2011-12-27T18:03:18Z",
"Last-Modified":"2011-12-27T18:03:18Z",
"created":"Tue Dec 27 12:03:18 CST 2011",
"creator":["PScript5.dll Version 5.2.2", "PScript5.dll Version 5.2.2", "PScript5.dll Version 5.2.2"],
"producer":["GPL Ghostscript 9.04", "GPL Ghostscript 9.04", "GPL Ghostscript 9.04"],
"title":["Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", "Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", "(15.17. ctypes \\227 A foreign function library for Python \\227 Python v2.7.2 documentation)"],
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
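The reproduction steps in the report above can be scripted. What follows is a minimal sketch, not part of the original report: it assumes the server started with --server 9000 reads the document until the client half-closes the connection and then replies with one JSON object encoded as UTF-8. The function names fetch_metadata and changed_fields are hypothetical helpers introduced here for illustration.

```python
import json
import socket


def fetch_metadata(path, host="localhost", port=9000):
    """Send one file to the server and return the parsed JSON reply.

    Assumption: the server answers with a single JSON object once it
    has seen the end of the input stream.
    """
    with socket.create_connection((host, port)) as conn, open(path, "rb") as f:
        conn.sendall(f.read())
        # Half-close the connection so the server sees end of input.
        conn.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return json.loads(b"".join(chunks).decode("utf-8"))


def changed_fields(first, second):
    """Return the sorted names of fields whose values differ.

    Sending the same file twice should give identical metadata, so any
    name returned here points at data carried over between requests.
    """
    keys = set(first) | set(second)
    return sorted(k for k in keys if first.get(k) != second.get(k))
```

Calling fetch_metadata("test.pdf") twice against a running server and passing both results to changed_fields would list the affected fields; an empty list means the second response matched the first.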
[jira] [Updated] (TIKA-834) server problem only 1st result is correct additional runs include data from 1st run
Posted by "George Kappel (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
George Kappel updated TIKA-834:
-------------------------------
Description:
The behavior below is shown with -j (JSON) output, but plain-text -m output is also affected: it keeps returning the 1st result even when different PDF files are sent afterwards.
# Running server to get meta data
java -jar tika-app-1.0.jar -m -j --server 9000
# send pdf document
nc localhost 9000 < test.pdf
# get good result
{"Author":"unknown",
"Content-Type":"application/pdf",
"Creation-Date":"2011-12-27T18:21:59Z",
"Last-Modified":"2011-12-27T18:21:59Z",
"created":"Tue Dec 27 12:21:59 CST 2011",
"creator":"PScript5.dll Version 5.2.2",
"producer":"GPL Ghostscript 9.04",
"title":"Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com",
"xmpTPg:NPages":3 }
# send pdf document again
nc localhost 9000 < test.pdf
# get bad result with extra values from last run
{ "Author":["unknown", "unknown"],
"Content-Type":"application/pdf",
"Creation-Date":"2011-12-27T18:21:59Z",
"Last-Modified":"2011-12-27T18:21:59Z",
"created":"Tue Dec 27 12:21:59 CST 2011",
"creator":["PScript5.dll Version 5.2.2", "PScript5.dll Version 5.2.2"],
"producer":["GPL Ghostscript 9.04", "GPL Ghostscript 9.04"],
"title":["Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", "Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com"],
"xmpTPg:NPages":3 }
# send another pdf document
nc localhost 9000 < ctypes.pdf
# get bad result with extra values from last 2 runs
{ "Author":["unknown", "unknown", "unknown"],
"Content-Type":"application/pdf",
"Creation-Date":"2011-12-27T18:03:18Z",
"Last-Modified":"2011-12-27T18:03:18Z",
"created":"Tue Dec 27 12:03:18 CST 2011",
"creator":["PScript5.dll Version 5.2.2", "PScript5.dll Version 5.2.2", "PScript5.dll Version 5.2.2"],
"producer":["GPL Ghostscript 9.04", "GPL Ghostscript 9.04", "GPL Ghostscript 9.04"],
"title":["Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", "Aaron Rodgers the clear-cut MVP after dismantling Chicago Bears - Peter King - SI.com", "(15.17. ctypes \\227 A foreign function library for Python \\227 Python v2.7.2 documentation)"],
was:
The problem seems to be specific to -j (JSON) output; if -j is left off, the plain-text metadata appears to work.
Summary: server problem only 1st result is correct additional runs include data from 1st run (was: server problem only 1st (-m -j) result is correct additional runs include data from previous runs)
[jira] [Resolved] (TIKA-834) server problem only 1st result is correct additional runs include data from 1st run
Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-834.
--------------------------------
Resolution: Fixed
Fix Version/s: 1.2
Assignee: Jukka Zitting
This got fixed as a side-effect of TIKA-934.
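The thread does not say what caused the carry-over. One pattern that would produce output of the reported shape, offered purely as an assumption and not as a description of Tika's code, is a multi-valued metadata container that is reused across requests: fields written with a replacing "set" stay single-valued, while fields written with an appending "add" grow with every document. The class MultiValueMetadata and the function parse_into below are hypothetical stand-ins written for this sketch.

```python
from collections import defaultdict


class MultiValueMetadata:
    """Hypothetical stand-in for a multi-valued metadata container."""

    def __init__(self):
        self._values = defaultdict(list)

    def set(self, name, value):
        # Replaces whatever was stored under this name.
        self._values[name] = [value]

    def add(self, name, value):
        # Appends, keeping any values already stored under this name.
        self._values[name].append(value)

    def as_json_like(self):
        # Single values are emitted bare and multiple values as a list,
        # mirroring the shape of the output in the report.
        return {k: v[0] if len(v) == 1 else list(v)
                for k, v in self._values.items()}


def parse_into(metadata, document):
    """Hypothetical parser: some fields are set, others are added."""
    metadata.set("Content-Type", document["type"])
    metadata.add("Author", document["author"])
    metadata.add("title", document["title"])
    return metadata.as_json_like()


docs = [
    {"type": "application/pdf", "author": "unknown", "title": "first"},
    {"type": "application/pdf", "author": "unknown", "title": "second"},
]

# One container reused for every request: added values pile up.
shared = MultiValueMetadata()
shared_results = [parse_into(shared, d) for d in docs]

# A fresh container per request: each result stands alone.
fresh_results = [parse_into(MultiValueMetadata(), d) for d in docs]
```

In this model the second entry of shared_results carries both authors and both titles while Content-Type stays single-valued, which resembles the reported output; the entries of fresh_results each describe only their own document.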