Posted to commits@tika.apache.org by ni...@apache.org on 2010/09/10 19:19:03 UTC

svn commit: r995880 - /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java

Author: nick
Date: Fri Sep 10 17:19:03 2010
New Revision: 995880

URL: http://svn.apache.org/viewvc?rev=995880&view=rev
Log:
We don't need to wrap our stream in a BufferedInputStream for mark/reset to work if it is already one (identified in TIKA-509 work)

Modified:
    tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java

Modified: tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
URL: http://svn.apache.org/viewvc/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java?rev=995880&r1=995879&r2=995880&view=diff
==============================================================================
--- tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java (original)
+++ tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java Fri Sep 10 17:19:03 2010
@@ -24,6 +24,7 @@ import org.apache.tika.config.TikaConfig
 import org.apache.tika.detect.Detector;
 import org.apache.tika.exception.TikaException;
 import org.apache.tika.io.CountingInputStream;
+import org.apache.tika.io.TikaInputStream;
 import org.apache.tika.metadata.Metadata;
 import org.apache.tika.mime.MediaType;
 import org.apache.tika.sax.SecureContentHandler;
@@ -94,10 +95,14 @@ public class AutoDetectParser extends Co
             InputStream stream, ContentHandler handler,
             Metadata metadata, ParseContext context)
             throws IOException, SAXException, TikaException {
-        // We need (reliable!) mark support for type detection before parsing
-        stream = new BufferedInputStream(stream);
+        if(stream instanceof TikaInputStream || stream instanceof BufferedInputStream) {
+           // Input stream can be trusted for type detection
+        } else {
+           // We need (reliable!) mark support for type detection before parsing
+           stream = new BufferedInputStream(stream);
+        }
 
-        // Automatically detect the MIME type of the document 
+        // Automatically detect the MIME type of the document
         MediaType type = detector.detect(stream, metadata);
         metadata.set(Metadata.CONTENT_TYPE, type.toString());
 

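The committed change leaves the stream alone when it is already a TikaInputStream or BufferedInputStream, and otherwise adds a BufferedInputStream layer. Type detection needs this because it uses mark/reset to peek at the leading bytes without consuming them. The sketch below shows that pattern using only the JDK, and assumes Java 9+ for `readAllBytes`. It has some limits worth stating plainly. TikaInputStream is left out because it is not part of the JDK. `ensureMarkSupport` and `peek` are hypothetical names, not Tika API. `peek` stands in for what a Detector does and is not Tika's detection code.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkSupportSketch {

    // Hypothetical helper mirroring the committed check (minus TikaInputStream):
    // trust a stream that is already buffered, otherwise add one buffering layer.
    static InputStream ensureMarkSupport(InputStream stream) {
        if (stream instanceof BufferedInputStream) {
            return stream; // already buffered: no second wrapper
        }
        return new BufferedInputStream(stream);
    }

    // Hypothetical stand-in for a detector: read up to n leading bytes,
    // then reset so the parser still sees the whole document.
    static byte[] peek(InputStream stream, int n) throws IOException {
        stream.mark(n);
        try {
            byte[] head = new byte[n];
            int read = 0;
            while (read < n) {
                int r = stream.read(head, read, n - read);
                if (r == -1) {
                    break; // document shorter than n bytes
                }
                read += r;
            }
            return Arrays.copyOf(head, read);
        } finally {
            stream.reset(); // rewind to the marked position
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "%PDF-1.4 rest of document".getBytes(StandardCharsets.US_ASCII);
        InputStream stream = ensureMarkSupport(new ByteArrayInputStream(doc));

        byte[] head = peek(stream, 5);
        System.out.println(new String(head, StandardCharsets.US_ASCII)); // prints "%PDF-"

        // After reset, the full document is still available to the parser.
        System.out.println(stream.readAllBytes().length == doc.length); // prints "true"
    }
}
```

The check uses concrete types rather than `markSupported()`, which fits the original comment's emphasis on *reliable* mark support.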


Re: svn commit: r995880 - /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 10 Sep 2010, Jukka Zitting wrote:
> Nice, good point! Even better, I'd simplify this to:
>
>    // Ensure reliable mark support for type detection before parsing
>    stream = TikaInputStream.get(stream);

Wouldn't that mean BufferedInputStreams end up double-wrapped, though?
I'm tempted to say that for the non-buffered input stream case, we should
wrap in a TikaInputStream instead of the current BufferedInputStream.

Nick

Re: svn commit: r995880 - /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Fri, Sep 10, 2010 at 7:19 PM,  <ni...@apache.org> wrote:
> -        // We need (reliable!) mark support for type detection before parsing
> -        stream = new BufferedInputStream(stream);
> +        if(stream instanceof TikaInputStream || stream instanceof BufferedInputStream) {
> +           // Input stream can be trusted for type detection
> +        } else {
> +           // We need (reliable!) mark support for type detection before parsing
> +           stream = new BufferedInputStream(stream);
> +        }

Nice, good point! Even better, I'd simplify this to:

    // Ensure reliable mark support for type detection before parsing
    stream = TikaInputStream.get(stream);

BR,

Jukka Zitting
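Jukka's one-liner depends on `TikaInputStream.get(stream)` not re-wrapping a stream that is already a TikaInputStream. Nick's objection is that a plain BufferedInputStream would still gain a second buffering layer. The sketch below illustrates that trade-off with a hypothetical `MarkableStream` class and its static `get` factory. This is a stand-in, not Tika's implementation. Whether TikaInputStream behaves exactly this way in a given Tika release is an assumption to check against its Javadoc.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class IdempotentWrapSketch {

    // Hypothetical TikaInputStream-like wrapper: get() reuses an existing
    // instance of this class but wraps every other stream type.
    static final class MarkableStream extends BufferedInputStream {
        private MarkableStream(InputStream in) {
            super(in);
        }

        static MarkableStream get(InputStream stream) {
            if (stream instanceof MarkableStream) {
                return (MarkableStream) stream; // already wrapped: reuse as-is
            }
            // Any other stream, including a BufferedInputStream, gets a new layer.
            return new MarkableStream(stream);
        }
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[] {1, 2, 3});

        MarkableStream once = MarkableStream.get(raw);
        MarkableStream twice = MarkableStream.get(once);
        System.out.println(once == twice); // prints "true": second get() is a no-op

        InputStream buffered = new BufferedInputStream(raw);
        System.out.println(MarkableStream.get(buffered) == buffered); // prints "false": buffered twice
    }
}
```

Nick's proposal, which keeps BufferedInputStream and TikaInputStream as they are and wraps only other streams in a TikaInputStream, would avoid the second case at the cost of keeping the explicit `instanceof` check.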