Posted to dev@lucene.apache.org by "Lance Norskog (JIRA)" <ji...@apache.org> on 2010/09/09 02:23:33 UTC
[jira] Updated: (SOLR-2116) TikaEntityProcessor does not find parser by default
[ https://issues.apache.org/jira/browse/SOLR-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lance Norskog updated SOLR-2116:
--------------------------------
Attachment: pdflist.xml, pdflist-data-config.xml
> TikaEntityProcessor does not find parser by default
> ---------------------------------------------------
>
> Key: SOLR-2116
> URL: https://issues.apache.org/jira/browse/SOLR-2116
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
> Affects Versions: 3.1, 4.0
> Reporter: Lance Norskog
> Attachments: pdflist-data-config.xml, pdflist.xml
>
>
> The TikaEntityProcessor does not find the correct document parser by default.
> This occurs in a two-level DIH config file. I have attached pdflist-data-config.xml and pdflist.xml, the XML file that supplies the list of files. To test this, you will need the current 3.x branch or 4.0 trunk.
> # Set up a Tika-enabled Solr.
> # Copy any PDF file to /tmp/testfile.pdf.
> # Copy pdflist-data-config.xml to your solr/conf directory.
> # Add this snippet to your solrconfig.xml:
> {code:xml}
> <requestHandler name="/pdflist"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">pdflist-data-config.xml</str>
>   </lst>
> </requestHandler>
> {code}
> Requesting [http://localhost:8983/solr/pdflist?command=full-import] creates one document with the id and text fields populated. If you remove this line:
> {code}
> parser="org.apache.tika.parser.pdf.PDFParser"
> {code}
> from the TikaEntityProcessor entity, the parser will not be found and you will get a document with the "id" field and nothing else.
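
The attached pdflist-data-config.xml is not reproduced in this message. As a rough illustration only, a two-level DIH config of the kind described might look like the sketch below: an outer XPathEntityProcessor reads the file list and an inner TikaEntityProcessor extracts text from each listed file. The entity and data source names, the /tmp/pdflist.xml location, and the <files>/<file id="..." path="..."/> layout of the list are assumptions made for this example, not the contents of the actual attachments.
{code:xml}
<dataConfig>
  <!-- Character stream for the XML file list; binary stream for the PDFs. -->
  <dataSource name="list" type="FileDataSource" encoding="UTF-8"/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Outer level: one row per <file> element (assumed list layout). -->
    <entity name="pdflist" processor="XPathEntityProcessor" dataSource="list"
            url="/tmp/pdflist.xml" forEach="/files/file">
      <field column="id" xpath="/files/file/@id"/>
      <field column="path" xpath="/files/file/@path"/>
      <!-- Inner level: Tika extracts the body text of each listed file. -->
      <entity name="tika" processor="TikaEntityProcessor" dataSource="bin"
              url="${pdflist.path}" format="text"
              parser="org.apache.tika.parser.pdf.PDFParser">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
{code}
In a config shaped like this, the parser attribute on the inner entity is the line whose removal triggers the behavior reported above.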