Posted to dev@manifoldcf.apache.org by "Mr.Keuz (JIRA)" <ji...@apache.org> on 2016/05/21 01:31:12 UTC
[jira] [Created] (CONNECTORS-1317) Hang parsing on some ZIP document
Mr.Keuz created CONNECTORS-1317:
-----------------------------------
Summary: Hang parsing on some ZIP document
Key: CONNECTORS-1317
URL: https://issues.apache.org/jira/browse/CONNECTORS-1317
Project: ManifoldCF
Issue Type: Bug
Components: File system connector
Affects Versions: ManifoldCF 2.3
Environment:
Ubuntu 14.04 Linux 3.13.0-86-generic i686 i686
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
DB: Postgres 9.5.1
Reporter: Mr.Keuz
I use ManifoldCF as a file crawler. I found that the crawling process hangs on some zip files, although other files are parsed normally.
Steps:
1. Run ManifoldCF with "example/start.sh" and Postgres as the DB
2. Create a ManifoldCF pipeline: File -> Tika -> Solr
3. Put the zip file (linked below) into the crawled folder
4. Run the job
Here is a zip file that should reproduce the bug:
"ManifoldCF_ISSUE_Dive.Into.Python.3.Mark.Pilgrim.2009.zip"
https://yadi.sk/d/0uSdrR5GrsgmG
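Before looking inside ManifoldCF, it may help to check whether the archive itself reads cleanly on its own. The following is a minimal sketch using only Python's standard zipfile module. It does not involve Tika or ManifoldCF, so a clean result does not rule out a parser-side problem. The helper name summarize_zip and the choice to flag ".zip" and ".jar" entries as nested archives are assumptions made for illustration.

```python
import zipfile


def summarize_zip(path):
    """Read a zip file's directory and run a CRC check on every member."""
    with zipfile.ZipFile(path) as zf:
        # testzip() reads every member and returns the name of the first
        # one with a bad CRC or header, or None if all members are fine.
        first_corrupt = zf.testzip()
        infos = zf.infolist()
        return {
            "entries": len(infos),
            "uncompressed_bytes": sum(i.file_size for i in infos),
            # Nested archives are worth knowing about, since a recursive
            # parser would descend into them.
            "nested_archives": [
                i.filename
                for i in infos
                if i.filename.lower().endswith((".zip", ".jar"))
            ],
            "first_corrupt_member": first_corrupt,
        }
```

Calling summarize_zip on the attached file and on a zip that crawls normally gives two summaries to compare, for example in entry count, total uncompressed size, or the presence of nested archives.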
Note:
From what I could see with strace, the crawler process tries to open and parse the same zip file again and again (apparently from different worker threads). It seems that the document is never removed from the queue.
I am new to ManifoldCF, so it is hard for me to find the problem in the source code.
I can send additional information if needed.
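The strace observation in the note above can be quantified by counting how often each zip path is opened and by how many distinct pids. The following is a rough sketch, assuming the log was captured with strace's follow-forks option so that each line may carry a leading pid (either "[pid N]" or a bare number), and assuming no timestamp columns. The names count_opens and report are hypothetical.

```python
import re
from collections import Counter

# Matches open()/openat() calls, with an optional leading pid column.
# Lines for other syscalls or resumed calls are simply skipped.
OPEN_CALL = re.compile(
    r'^(?:\[pid\s+(?P<bpid>\d+)\]\s+|(?P<pid>\d+)\s+)?'
    r'open(?:at)?\((?:[^,"]+,\s*)?"(?P<path>[^"]*)"'
)


def count_opens(lines, suffix=".zip"):
    """Return (open count per path, set of pids per path) for matching paths."""
    opens = Counter()
    pids = {}
    for line in lines:
        m = OPEN_CALL.match(line)
        if not m:
            continue
        path = m.group("path")
        if not path.lower().endswith(suffix):
            continue
        opens[path] += 1
        pid = m.group("bpid") or m.group("pid")
        if pid:
            pids.setdefault(path, set()).add(pid)
    return opens, pids


def report(log_path):
    """Print open count, distinct pid count, and path, most-opened first."""
    with open(log_path, errors="replace") as fh:
        opens, pids = count_opens(fh)
    for path, n in opens.most_common():
        print(n, len(pids.get(path, ())), path)
```

A path with a high open count spread across several pids would be consistent with multiple worker threads retrying the same document, though it does not by itself show why the document stays in the queue.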
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)