Posted to commits@tika.apache.org by ni...@apache.org on 2010/09/10 19:19:03 UTC
svn commit: r995880 - /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
Author: nick
Date: Fri Sep 10 17:19:03 2010
New Revision: 995880
URL: http://svn.apache.org/viewvc?rev=995880&view=rev
Log:
We don't need to wrap our stream in a BufferedInputStream for mark/reset to work if it is already one (identified in TIKA-509 work)
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
Modified: tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
URL: http://svn.apache.org/viewvc/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java?rev=995880&r1=995879&r2=995880&view=diff
==============================================================================
--- tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java (original)
+++ tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java Fri Sep 10 17:19:03 2010
@@ -24,6 +24,7 @@ import org.apache.tika.config.TikaConfig
import org.apache.tika.detect.Detector;
import org.apache.tika.exception.TikaException;
import org.apache.tika.io.CountingInputStream;
+import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.sax.SecureContentHandler;
@@ -94,10 +95,14 @@ public class AutoDetectParser extends Co
InputStream stream, ContentHandler handler,
Metadata metadata, ParseContext context)
throws IOException, SAXException, TikaException {
- // We need (reliable!) mark support for type detection before parsing
- stream = new BufferedInputStream(stream);
+ if(stream instanceof TikaInputStream || stream instanceof BufferedInputStream) {
+ // Input stream can be trusted for type detection
+ } else {
+ // We need (reliable!) mark support for type detection before parsing
+ stream = new BufferedInputStream(stream);
+ }
- // Automatically detect the MIME type of the document
+ // Automatically detect the MIME type of the document
MediaType type = detector.detect(stream, metadata);
metadata.set(Metadata.CONTENT_TYPE, type.toString());
Re: svn commit: r995880 - /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 10 Sep 2010, Jukka Zitting wrote:
> Nice, good point! Even better, I'd simplify this to:
>
> // Ensure reliable mark support for type detection before parsing
> stream = TikaInputStream.get(stream);
That would mean that BufferedInputStreams will end up double wrapped
though? I'm tempted to say that for the non-buffered input stream case, we
wrap in a TikaInputStream instead of the current BufferedInputStream.
Nick
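To make the double-wrapping concern concrete, here is a minimal sketch using only java.io. The class MarkSupportSketch and the helper ensureMarkSupport are hypothetical names for illustration and are not part of Tika; the sketch mirrors only the BufferedInputStream half of the instanceof check in the commit and makes no claim about what TikaInputStream.get does internally.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class MarkSupportSketch {

    // Hypothetical helper (not a Tika API): wrap the stream only when it
    // is not already a BufferedInputStream, so an already buffered stream
    // is returned unchanged instead of being wrapped a second time.
    static InputStream ensureMarkSupport(InputStream stream) {
        if (stream instanceof BufferedInputStream) {
            return stream;
        }
        return new BufferedInputStream(stream);
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[] {1, 2, 3});

        // First call wraps the raw stream in a new buffer.
        InputStream once = ensureMarkSupport(raw);

        // Second call sees a BufferedInputStream and returns it as-is.
        InputStream twice = ensureMarkSupport(once);

        System.out.println(once != raw);   // true: the raw stream was wrapped
        System.out.println(twice == once); // true: no second wrapper was added
    }
}
```

An unconditional wrapper call on every parse would need an equivalent "already suitable" check inside it to avoid the extra layer.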
Re: svn commit: r995880 - /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Fri, Sep 10, 2010 at 7:19 PM, <ni...@apache.org> wrote:
> - // We need (reliable!) mark support for type detection before parsing
> - stream = new BufferedInputStream(stream);
> + if(stream instanceof TikaInputStream || stream instanceof BufferedInputStream) {
> + // Input stream can be trusted for type detection
> + } else {
> + // We need (reliable!) mark support for type detection before parsing
> + stream = new BufferedInputStream(stream);
> + }
Nice, good point! Even better, I'd simplify this to:
// Ensure reliable mark support for type detection before parsing
stream = TikaInputStream.get(stream);
BR,
Jukka Zitting
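For readers wondering why "reliable mark support" matters here: type detection has to look at the first bytes of the stream and then hand the same stream, rewound, to the parser. The following is a rough sketch of that peek-and-reset pattern using only java.io. PeekSketch and peek are hypothetical names and this is not how Tika's Detector is implemented; it only illustrates the mark/reset contract the code comments refer to.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PeekSketch {

    // Hypothetical helper (not a Tika API): read up to 'length' bytes from
    // the start of the stream, then rewind so later readers see them again.
    static byte[] peek(InputStream stream, int length) throws IOException {
        if (!stream.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        stream.mark(length);
        try {
            byte[] buffer = new byte[length];
            int total = 0;
            // read() may return fewer bytes than requested, so loop.
            while (total < length) {
                int n = stream.read(buffer, total, length - total);
                if (n == -1) {
                    break; // stream is shorter than 'length'
                }
                total += n;
            }
            return Arrays.copyOf(buffer, total);
        } finally {
            // Rewind to the marked position regardless of what was read.
            stream.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 rest of document".getBytes(StandardCharsets.US_ASCII);
        InputStream stream = new BufferedInputStream(new ByteArrayInputStream(data));

        byte[] header = peek(stream, 4);
        System.out.println(new String(header, StandardCharsets.US_ASCII)); // %PDF

        // After the peek, the next read still starts at the first byte.
        System.out.println((char) stream.read()); // %
    }
}
```

Without the reset, the parser would start reading after the bytes the detector had consumed, which is why the stream passed to detection must support mark/reset.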