Posted to user@poi.apache.org by mi...@TaosNet.com on 2011/05/18 21:46:09 UTC
parse methods with tika
I am looking at Tika to parse a large Excel file. I have code as below.
Drawing an analogy with SAX XML parsing, I'm looking for the methods that are
called by events, and where to override those methods. E.g. in XML parsing
I would have:
@Override
public void startElement(String uri, String localName, String rawName,
        Attributes attributes) throws SAXException {
    .........................
}
but I am not finding the equivalent for Tika.
final AutoDetectParser myLibraryParser = new AutoDetectParser();
ContentHandler contentHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext parseContext = new ParseContext();
try {
    myLibraryParser.parse(inputStream, contentHandler, metadata, parseContext);
}
catch (IOException e) {
    e.printStackTrace();
}
catch (SAXException e) {
    e.printStackTrace();
}
catch (TikaException e) {
    e.printStackTrace();
}
Can someone point me to an example?
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org
Re: parse methods with tika
Posted by Nick Burch <ni...@alfresco.com>.
On Wed, 18 May 2011, mickeydog@TaosNet.com wrote:
> I am looking at Tika to parse a large Excel file. I have code as below.
> Drawing an analogy with SAX XML parsing, I'm looking for the methods that are
> called by events, and where to override those methods. E.g. in XML parsing
> I would have:
> @Override
> public void startElement(String uri, String localName, String rawName,
> Attributes attributes) throws SAXException {
> .........................
> }
This is all on the ContentHandler. If you want to customise this, pass in
your own ContentHandler class, rather than BodyContentHandler as you do now.
> final AutoDetectParser myLibraryParser = new AutoDetectParser();
> ContentHandler contentHandler = new BodyContentHandler();
> Metadata metadata = new Metadata();
> ParseContext parseContext = new ParseContext();
> myLibraryParser.parse(inputStream, contentHandler, metadata, parseContext);
Change this to pass your custom content handler and you should be set.
Nick
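
For illustration, here is a minimal sketch of what such a custom handler might
look like. It is not from the thread: the class name CellCollectingHandler and
the helper rowsFromXhtml are hypothetical, and it assumes that the spreadsheet
content arrives as XHTML-style SAX events with cells reported as td elements
inside tr rows. To keep the sketch runnable with the JDK alone, the demo drives
the handler from a small hand-written XHTML string using the JDK SAX parser.
With Tika, the same handler instance would be passed to
myLibraryParser.parse(inputStream, handler, metadata, parseContext) in place of
the BodyContentHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of each <td> cell, grouped by <tr> row.
// DefaultHandler implements ContentHandler, so only the callbacks of interest
// need to be overridden.
class CellCollectingHandler extends DefaultHandler {
    private final List<List<String>> rows = new ArrayList<>();
    private List<String> currentRow;   // non-null while inside a <tr>
    private StringBuilder cellText;    // non-null while inside a <td>

    // Prefer the namespace-aware local name; fall back to the qualified name.
    private static String name(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        String n = name(localName, qName);
        if ("tr".equals(n)) {
            currentRow = new ArrayList<>();
        } else if ("td".equals(n)) {
            cellText = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per cell, so accumulate the pieces.
        if (cellText != null) {
            cellText.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String n = name(localName, qName);
        if ("td".equals(n) && cellText != null) {
            if (currentRow != null) {
                currentRow.add(cellText.toString().trim());
            }
            cellText = null;
        } else if ("tr".equals(n) && currentRow != null) {
            rows.add(currentRow);
            currentRow = null;
        }
    }

    List<List<String>> getRows() {
        return rows;
    }

    // Demo helper (an assumption for this sketch): feeds an XHTML string
    // through the JDK SAX parser as a stand-in for the events a parser emits.
    static List<List<String>> rowsFromXhtml(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            CellCollectingHandler handler = new CellCollectingHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.getRows();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><table>"
                + "<tr><td>a</td><td>1</td></tr>"
                + "<tr><td>b</td><td>2</td></tr>"
                + "</table></body></html>";
        System.out.println(rowsFromXhtml(xhtml)); // prints [[a, 1], [b, 2]]
    }
}
```

Because the handler acts on each event as it arrives, a real version could
process each row in endElement instead of storing every row in memory, which
may matter for a large file.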