Posted to dev@tika.apache.org by Rida Benjelloun <ri...@doculibre.com> on 2007/10/10 18:28:36 UTC

Tika Xml Outputter

Hi,
Do you think we should have an XmlOutputter that saves the extracted
content and metadata to an XML file? This would simplify integration with
other technologies, such as Solr.
The XmlOutputter would process a File (a single file, or a directory
recursively) or a URL. It would take an XSLT stylesheet as a filter to mask
or display the elements needed, and an output encoding.
Example:
TikaXmlOutputter txo = new TikaXmlOutputter();
txo.output(File|URL input, File xmlOutput, File xsltFilter, String encoding);

Regards.
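
A minimal sketch of the output step proposed above, using only the JDK's XML APIs. It assumes the content and metadata have already been extracted, so no Tika call is shown. The class name XmlOutputterSketch, the element names (document, metadata, meta, content) and the method signature are hypothetical illustrations, not an existing Tika API. The optional XSLT stylesheet acts as the filter, and the encoding is passed through as an output property.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: not part of Tika. Extraction is assumed to have
// happened already; this class only covers XML output, XSLT filtering
// and the output encoding.
class XmlOutputterSketch {

    // Builds a DOM tree with an assumed layout:
    // <document><metadata><meta name="...">...</meta></metadata><content>...</content></document>
    static Document toDom(Map<String, String> metadata, String content) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("document");
        doc.appendChild(root);

        Element metaRoot = doc.createElement("metadata");
        root.appendChild(metaRoot);
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            Element meta = doc.createElement("meta");
            meta.setAttribute("name", entry.getKey());
            meta.setTextContent(entry.getValue());
            metaRoot.appendChild(meta);
        }

        Element body = doc.createElement("content");
        body.setTextContent(content);
        root.appendChild(body);
        return doc;
    }

    // Serializes the extracted data. If xslt is null, an identity transform
    // is used, so every element is written unfiltered.
    static byte[] output(Map<String, String> metadata, String content,
                         String xslt, String encoding) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = (xslt == null)
                ? factory.newTransformer()
                : factory.newTransformer(new StreamSource(new StringReader(xslt)));
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(toDom(metadata, content)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Example");
        metadata.put("author", "Someone");

        // Filter stylesheet: copy everything except the "author" metadata entry.
        String filter =
              "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='@*|node()'>"
            + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
            + "</xsl:template>"
            + "<xsl:template match=\"meta[@name='author']\"/>"
            + "</xsl:stylesheet>";

        byte[] xml = output(metadata, "Extracted text", filter, "UTF-8");
        System.out.println(new String(xml, "UTF-8"));
    }
}
```

Keeping the filter as a plain XSLT stylesheet means the set of visible elements can be changed without recompiling. A target such as Solr would still need the output mapped to its own update format, which this sketch does not attempt.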

Re: Tika Xml Outputter

Posted by Rida Benjelloun <ri...@doculibre.com>.
Hi Chris,
Thanks for this information. I will take a look at it and get back to
you.
Regards.

Re: Tika Xml Outputter

Posted by Chris Mattmann <ch...@jpl.nasa.gov>.
Hi Rida,

I totally agree! You should take a look at the MarkupLanguageParserProposal
(within Nutch: http://wiki.apache.org/nutch/MarkupLanguageParserProposal) and
the work done in Frutch
(http://www.krugle.com/kse/files?query=frutch%20parse%20out) on the ParseXml
plugin.

I'd love to chat with you more about this. Let me know what you think.

Thanks,
Chris



______________________________________________
Chris Mattmann, Ph.D.
Chris.Mattmann@jpl.nasa.gov
Cognizant Development Engineer
Early Detection Research Network Project

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.