Posted to dev@tika.apache.org by kbennett <kb...@bbsinc.biz> on 2007/10/05 22:22:30 UTC

Which parsers support title properties?

All -

Can anyone tell me which parsers support extracting titles and which do not? 
Here is the list of parsers:

HTML
Excel
Powerpoint
Word
OpenOffice
PDF
RTF
TXT (obviously not)
XML

...and do they all fit with Content's text/xpath/regex select strings?

Wouldn't we need to modify this so that we have a strategy per parser,
rather than one per string lookup type (text, xpath, regex)?
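
A minimal sketch of what "a strategy per parser" could look like, in Java.
Every name here (TitleStrategy, TitleStrategies, the "title" property key)
is hypothetical and not part of Tika's API; the split between parsers that
yield a title and those that do not follows the reply further down this
thread, which reports that RTF and TXT do not support title extraction.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical interface: one title-extraction strategy per parser type,
// instead of one per lookup type (text, xpath, regex).
interface TitleStrategy {
    // Returns the title if this parser type can supply one.
    Optional<String> extractTitle(Map<String, String> parsedProperties);
}

// Hypothetical registry keyed by the parser names listed in the message.
final class TitleStrategies {
    private static final Map<String, TitleStrategy> BY_PARSER = new LinkedHashMap<>();

    static {
        // Assumption: parsers that support titles expose them under a "title" key.
        TitleStrategy fromProperties = props -> Optional.ofNullable(props.get("title"));
        // Parsers with no title support never return one.
        TitleStrategy none = props -> Optional.empty();

        String[] withTitle = {"HTML", "Excel", "Powerpoint", "Word", "OpenOffice", "PDF", "XML"};
        for (String parser : withTitle) {
            BY_PARSER.put(parser, fromProperties);
        }
        BY_PARSER.put("RTF", none);
        BY_PARSER.put("TXT", none);
    }

    private TitleStrategies() {
    }

    // Looks up the strategy for the given parser name; unknown parsers yield no title.
    static Optional<String> titleFor(String parserName, Map<String, String> parsedProperties) {
        TitleStrategy strategy = BY_PARSER.get(parserName);
        return strategy == null ? Optional.empty() : strategy.extractTitle(parsedProperties);
    }
}
```

With this shape, a caller asks TitleStrategies.titleFor("PDF", props) and
gets an empty Optional for parsers that cannot supply a title, so the
text/xpath/regex choice becomes an internal detail of each strategy.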

- Keith

-- 
View this message in context: http://www.nabble.com/Which-parsers-support-title-properties--tf4577427.html#a13066766
Sent from the Apache Tika - Development mailing list archive at Nabble.com.


Re: Which parsers support title properties?

Posted by ri...@doculibre.com.
Hi Keith,
All parsers support title extraction except RTF and TXT.
Regards,


-- 
---------------------------------------------------------
Rida Benjelloun
Doculibre inc.
ridabenjelloun@apache.org
rida.benjelloun@doculibre.com
Cel: 418-262-3222
Tel: 418-353-3390
Site Web : http://www.doculibre.com
---------------------------------------------------------