Posted to solr-dev@lucene.apache.org by "Grant Ingersoll (JIRA)" <ji...@apache.org> on 2009/08/08 03:41:14 UTC
[jira] Resolved: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler
[ https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Ingersoll resolved SOLR-1274.
-----------------------------------
Resolution: Fixed
I committed this patch, plus a test for it.
> Provide multiple output formats in extract-only mode for tika handler
> ---------------------------------------------------------------------
>
> Key: SOLR-1274
> URL: https://issues.apache.org/jira/browse/SOLR-1274
> Project: Solr
> Issue Type: New Feature
> Affects Versions: 1.4
> Reporter: Peter Wolanin
> Assignee: Grant Ingersoll
> Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-1274.patch, SOLR-1274.patch
>
>
> The proposed feature is to accept a URL parameter in extract-only mode that specifies an output format. This parameter could simply overload the existing "ext.extract.only" so that one can optionally specify a format, e.g. false|true|xml|text, where true and xml give the same response (i.e. xml remains the default).
> I had been assuming that I could choose among possible tika output
> formats when using the extracting request handler in extract-only mode
> as if from the CLI with the tika jar:
> -x or --xml Output XHTML content (default)
> -h or --html Output HTML content
> -t or --text Output plain text content
> -m or --metadata Output only metadata
> However, looking at the docs and source, it seems that only the xml
> option is available (hard-coded) in ExtractingDocumentLoader.java
> {code}
> serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
> {code}
> Providing at least a plain-text response seems to work if you change the serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).
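
The parameter overloading proposed in the description (false|true|xml|text, with true and xml equivalent) could be interpreted roughly as follows. This is only a sketch: the class name ExtractOnlyParam, the Format enum, and the parse method are hypothetical and are not taken from the attached patch, and the choice to reject unknown values with an exception is an assumption.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper that interprets an overloaded "ext.extract.only" value. */
public final class ExtractOnlyParam {

    /** Output formats named in the proposal (assumed set: xml and text only). */
    public enum Format { XML, TEXT }

    private ExtractOnlyParam() {}

    /**
     * Returns an empty Optional when extract-only mode is off (null, blank, or "false"),
     * otherwise the requested format. "true" maps to XML so XML remains the default.
     */
    public static Optional<Format> parse(String value) {
        if (value == null) {
            return Optional.empty();
        }
        String v = value.trim().toLowerCase(Locale.ROOT);
        switch (v) {
            case "":
            case "false":
                return Optional.empty();
            case "true":
            case "xml":
                return Optional.of(Format.XML);
            case "text":
                return Optional.of(Format.TEXT);
            default:
                // Assumption: unknown formats are rejected rather than silently ignored.
                throw new IllegalArgumentException("Unsupported extract-only format: " + value);
        }
    }
}
```

A loader could then branch on the result, for example constructing an XMLSerializer for Format.XML and a TextSerializer for Format.TEXT, as the description suggests.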