Posted to solr-dev@lucene.apache.org by "Grant Ingersoll (JIRA)" <ji...@apache.org> on 2009/08/08 03:41:14 UTC

[jira] Resolved: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler

     [ https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved SOLR-1274.
-----------------------------------

    Resolution: Fixed

I committed this patch, plus a test for it.

> Provide multiple output formats in extract-only mode for tika handler
> ---------------------------------------------------------------------
>
>                 Key: SOLR-1274
>                 URL: https://issues.apache.org/jira/browse/SOLR-1274
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.4
>            Reporter: Peter Wolanin
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-1274.patch, SOLR-1274.patch
>
>
> The proposed feature is to accept a URL parameter in extract-only mode that specifies an output format. This parameter could simply overload the existing "ext.extract.only", so that one can optionally specify a format, e.g. false|true|xml|text, where true and xml give the same response (i.e. xml remains the default).
> I had been assuming that I could choose among the possible Tika output
> formats when using the extracting request handler in extract-only mode,
> just as one can from the CLI with the Tika jar:
>    -x or --xml        Output XHTML content (default)
>    -h or --html       Output HTML content
>    -t or --text       Output plain text content
>    -m or --metadata   Output only metadata
> However, looking at the docs and source, it seems that only the xml
> option is available (hard-coded) in ExtractingDocumentLoader.java
> {code}
> serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
> {code}
> Providing at least a plain-text response seems to work if you change the serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).
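
A minimal sketch of the idea in the description above, offered only as an illustration and not as the committed patch: a format value selects which SAX content handler receives the extracted events. To stay self-contained it uses JDK classes only instead of the Xerces XMLSerializer/TextSerializer named in the issue. The class name ExtractFormatSketch, the helpers handlerFor and render, and the sample <p> element are all hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class ExtractFormatSketch {

    /** Hypothetical helper: picks a SAX handler for the requested format. */
    static ContentHandler handlerFor(String format, final StringWriter out) throws Exception {
        if ("text".equalsIgnoreCase(format)) {
            // Plain text: keep character data, drop all markup.
            return new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    out.write(ch, start, length);
                }
            };
        }
        // "xml", "true", or anything else: serialize the SAX events as XML (the default).
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler xml = factory.newTransformerHandler();
        xml.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        xml.setResult(new StreamResult(out));
        return xml;
    }

    /** Feeds a tiny stand-in document (one <p> element) through the chosen handler. */
    static String render(String format) {
        try {
            StringWriter out = new StringWriter();
            ContentHandler handler = handlerFor(format, out);
            char[] body = "hello".toCharArray();
            handler.startDocument();
            handler.startElement("", "p", "p", new AttributesImpl());
            handler.characters(body, 0, body.length);
            handler.endElement("", "p", "p");
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("rendering failed", e);
        }
    }
}
```

Calling ExtractFormatSketch.render("text") should yield only the character data, while render("xml") or render("true") should yield the markup as well, mirroring the false|true|xml|text values suggested in the description.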
