Posted to user@poi.apache.org by mi...@TaosNet.com on 2011/05/18 21:46:09 UTC

parse methods with tika

I am looking at Tika to parse a large Excel file. I have code as below.
Drawing an equivalent to SAX XML parsing, I'm looking for methods that are
called by events and where to override those methods. E.g. in XML parsing
I would have:
@Override
public void startElement(String uri, String localName, String rawName,
    Attributes attributes) throws SAXException {
  .........................
}

but I am not finding the equivalent for Tika.

final AutoDetectParser myLibraryParser = new AutoDetectParser();
ContentHandler contentHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext parseContext = new ParseContext();
try {
	myLibraryParser.parse(inputStream, contentHandler, metadata, parseContext);
}
catch (IOException e) {
	e.printStackTrace();
}
catch (SAXException e) {
	e.printStackTrace();
}
catch (TikaException e) {
	e.printStackTrace();
}

Can someone point me to an example?


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Re: parse methods with tika

Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 18 May 2011, mickeydog@TaosNet.com wrote:
> I am looking at Tika to parse a large Excel file. I have code as below.
> Drawing an equivalent to SAX XML parsing, I'm looking for methods that are
> called by events and where to override those methods. E.g. in XML parsing
> I would have:
> @Override
> public void startElement(String uri, String localName, String rawName,
> Attributes attributes)	throws SAXException {
>  .........................
> }

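These callbacks are all on the ContentHandler. Tika parsers report the
document as SAX events (XHTML) to whatever handler you give them, so if you
want to customise this, pass in your own ContentHandler implementation
rather than BodyContentHandler as you do now.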

> final AutoDetectParser myLibraryParser = new AutoDetectParser();
> ContentHandler contentHandler = new BodyContentHandler();
> Metadata metadata = new Metadata();
> ParseContext parseContext = new ParseContext();
> myLibraryParser.parse(inputStream, contentHandler, metadata, parseContext);

Change this to use your custom content handler and you should be set.
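
A minimal sketch of such a handler, for illustration only. The class name
CellCollectingHandler is hypothetical, and it assumes that spreadsheet cells
arrive as <td> elements in the XHTML events, which you should confirm
against your own files. To keep the sketch runnable with only the JDK, main()
feeds it a small hand-written XML string through the JDK's SAX parser instead
of Tika.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects the text of every td element it sees. */
public class CellCollectingHandler extends DefaultHandler {
    private final List<String> cells = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a td

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if ("td".equals(name(localName, qName))) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null && "td".equals(name(localName, qName))) {
            cells.add(current.toString());
            current = null;
        }
    }

    // localName is empty when the parser is not namespace-aware; fall back to qName.
    private static String name(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    public List<String> getCells() {
        return cells;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in input; with Tika the events would come from the parser instead.
        String xml = "<table><tr><td>a</td><td>b</td></tr></table>";
        CellCollectingHandler handler = new CellCollectingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        System.out.println(handler.getCells()); // prints [a, b]
    }
}
```

With Tika, the same instance would take the place of BodyContentHandler in
the call above, i.e. myLibraryParser.parse(inputStream, handler, metadata,
parseContext), and getCells() would be read after parse() returns. For a very
large file you would probably process each cell in endElement rather than
keep them all in a list.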

Nick
