Posted to commits@tika.apache.org by ni...@apache.org on 2011/04/21 17:59:42 UTC

svn commit: r1095760 - in /tika/trunk/tika-core/src/main/java/org/apache/tika: io/TaggedInputStream.java io/TikaInputStream.java parser/CompositeParser.java parser/NetworkParser.java

Author: nick
Date: Thu Apr 21 15:59:42 2011
New Revision: 1095760

URL: http://svn.apache.org/viewvc?rev=1095760&view=rev
Log:
TIKA-643 - Change TaggedInputStream to work like TikaInputStream for creation, with a static get, to avoid double wrapping. Also adds toString methods to both streams.

Modified:
    tika/trunk/tika-core/src/main/java/org/apache/tika/io/TaggedInputStream.java
    tika/trunk/tika-core/src/main/java/org/apache/tika/io/TikaInputStream.java
    tika/trunk/tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java
    tika/trunk/tika-core/src/main/java/org/apache/tika/parser/NetworkParser.java

Modified: tika/trunk/tika-core/src/main/java/org/apache/tika/io/TaggedInputStream.java
URL: http://svn.apache.org/viewvc/tika/trunk/tika-core/src/main/java/org/apache/tika/io/TaggedInputStream.java?rev=1095760&r1=1095759&r2=1095760&view=diff
==============================================================================
--- tika/trunk/tika-core/src/main/java/org/apache/tika/io/TaggedInputStream.java (original)
+++ tika/trunk/tika-core/src/main/java/org/apache/tika/io/TaggedInputStream.java Thu Apr 21 15:59:42 2011
@@ -63,9 +63,22 @@ public class TaggedInputStream extends P
      *
      * @param proxy input stream to be decorated
      */
-    public TaggedInputStream(InputStream proxy) {
+    private TaggedInputStream(InputStream proxy) {
         super(proxy);
     }
+    
+    /**
+     * Casts or wraps the given stream to a TaggedInputStream instance.
+     *
+     * @param proxy normal input stream
+     * @return a TaggedInputStream instance
+     */
+    public static TaggedInputStream get(InputStream proxy) {
+       if(proxy instanceof TaggedInputStream) {
+          return (TaggedInputStream)proxy;
+       }
+       return new TaggedInputStream(proxy);
+    }
 
     /**
      * Tests if the given exception was caused by this stream.
@@ -113,4 +126,7 @@ public class TaggedInputStream extends P
         throw new TaggedIOException(e, this);
     }
 
+    public String toString() {
+        return "Tika Tagged InputStream wrapping " + in.toString(); 
+    }
 }
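
The "cast or wrap" factory in this hunk can be sketched on its own. This is a minimal, self-contained example, not Tika code. The class name WrapOnce is hypothetical, and it extends java.io.FilterInputStream instead of Tika's ProxyInputStream so that it runs without Tika on the classpath. It shows that calling get() on a stream that is already wrapped returns the same instance rather than adding a second layer.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

/**
 * Hypothetical stand-in for TaggedInputStream, illustrating the
 * "cast or wrap" static factory pattern. Not part of Tika.
 */
public class WrapOnce extends FilterInputStream {

    // Private constructor: callers must go through get(), so an
    // already-wrapped stream cannot accidentally be wrapped again.
    private WrapOnce(InputStream proxy) {
        super(proxy);
    }

    /** Returns the stream itself if it is already wrapped, else a new wrapper. */
    public static WrapOnce get(InputStream proxy) {
        if (proxy instanceof WrapOnce) {
            return (WrapOnce) proxy;
        }
        return new WrapOnce(proxy);
    }

    @Override
    public String toString() {
        // 'in' is the protected field of FilterInputStream holding the wrapped stream.
        return "WrapOnce wrapping " + in;
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[] {1, 2, 3});
        WrapOnce outer = WrapOnce.get(raw);
        WrapOnce again = WrapOnce.get(outer); // no second layer is created
        System.out.println(outer == again);   // prints "true"
    }
}
```

Making the constructor private is what forces every caller through get(); as raised later in this thread, the trade-off is that the class can no longer be subclassed.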

Modified: tika/trunk/tika-core/src/main/java/org/apache/tika/io/TikaInputStream.java
URL: http://svn.apache.org/viewvc/tika/trunk/tika-core/src/main/java/org/apache/tika/io/TikaInputStream.java?rev=1095760&r1=1095759&r2=1095760&view=diff
==============================================================================
--- tika/trunk/tika-core/src/main/java/org/apache/tika/io/TikaInputStream.java (original)
+++ tika/trunk/tika-core/src/main/java/org/apache/tika/io/TikaInputStream.java Thu Apr 21 15:59:42 2011
@@ -625,4 +625,13 @@ public class TikaInputStream extends Pro
         }
     }
 
+    public String toString() {
+       String str = "TikaInputStream of ";
+       if(hasFile()) {
+          str += file.toString();
+       } else {
+          str += in.toString();
+       }
+       return str;
+    }
 }
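
The new toString() methods mainly help when a stream shows up in a log message or a debugger. The following is a minimal sketch of the TikaInputStream logic above, assuming a hypothetical FileAwareStream class that holds an optional backing java.io.File; it is not the Tika class itself.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FilterInputStream;
import java.io.InputStream;

/** Hypothetical stand-in mirroring TikaInputStream's toString logic. Not part of Tika. */
public class FileAwareStream extends FilterInputStream {
    // Backing file, if the stream is known to come from one (may be null).
    private final File file;

    public FileAwareStream(InputStream in, File file) {
        super(in);
        this.file = file;
    }

    public boolean hasFile() {
        return file != null;
    }

    @Override
    public String toString() {
        // Prefer the file path, which is more informative than the
        // wrapped stream's default Object.toString().
        String str = "FileAwareStream of ";
        if (hasFile()) {
            str += file.toString();
        } else {
            str += in.toString();
        }
        return str;
    }

    public static void main(String[] args) {
        InputStream data = new ByteArrayInputStream(new byte[0]);
        // prints "FileAwareStream of report.pdf"
        System.out.println(new FileAwareStream(data, new File("report.pdf")));
        // prints the wrapped stream's class name and hash code
        System.out.println(new FileAwareStream(data, null));
    }
}
```

Without a file, the output falls back to the wrapped stream's default Object.toString(), that is, its class name followed by a hash code.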

Modified: tika/trunk/tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java
URL: http://svn.apache.org/viewvc/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java?rev=1095760&r1=1095759&r2=1095760&view=diff
==============================================================================
--- tika/trunk/tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java (original)
+++ tika/trunk/tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java Thu Apr 21 15:59:42 2011
@@ -210,7 +210,7 @@ public class CompositeParser extends Abs
             Metadata metadata, ParseContext context)
             throws IOException, SAXException, TikaException {
         Parser parser = getParser(metadata);
-        TaggedInputStream taggedStream = new TaggedInputStream(stream);
+        TaggedInputStream taggedStream = TaggedInputStream.get(stream);
         TaggedContentHandler taggedHandler = new TaggedContentHandler(handler);
         try {
             parser.parse(taggedStream, taggedHandler, metadata, context);
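
CompositeParser wraps the stream so that it can later tell whether an IOException came from the input stream or from the parser itself. Per the TaggedInputStream hunk above, the stream rethrows I/O failures as a TaggedIOException carrying a reference to itself, and it can test whether a given exception was caused by it. The following is a simplified sketch of that idea under stated assumptions: TaggedIOException, Tagged and Parser here are hypothetical nested types, not Tika's classes, and only the single-byte read() is tagged for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class TagDemo {

    /** Hypothetical exception that remembers which stream threw it. */
    static class TaggedIOException extends IOException {
        final Object tag;

        TaggedIOException(IOException cause, Object tag) {
            super(cause.getMessage(), cause);
            this.tag = tag;
        }
    }

    /** Hypothetical tagging wrapper, loosely modelled on TaggedInputStream. */
    static class Tagged extends FilterInputStream {
        Tagged(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            try {
                return super.read();
            } catch (IOException e) {
                // Tag the failure with this stream before rethrowing it.
                throw new TaggedIOException(e, this);
            }
        }

        /** True if e was thrown by the stream underneath this wrapper. */
        boolean isCauseOf(IOException e) {
            return e instanceof TaggedIOException
                    && ((TaggedIOException) e).tag == this;
        }
    }

    /** Hypothetical parser callback. */
    interface Parser {
        void parse(InputStream in) throws IOException;
    }

    /** A source stream that always fails, simulating broken input. */
    static class FailingStream extends InputStream {
        @Override
        public int read() throws IOException {
            throw new IOException("disk error");
        }
    }

    static String classify(InputStream source, Parser parser) {
        Tagged tagged = new Tagged(source);
        try {
            parser.parse(tagged);
            return "ok";
        } catch (IOException e) {
            return tagged.isCauseOf(e) ? "input stream failed" : "parser failed";
        }
    }

    public static void main(String[] args) {
        // prints "input stream failed"
        System.out.println(classify(new FailingStream(), in -> in.read()));
        // prints "parser failed"
        System.out.println(classify(new ByteArrayInputStream(new byte[] {1}), in -> {
            in.read();
            throw new IOException("bad format");
        }));
    }
}
```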

Modified: tika/trunk/tika-core/src/main/java/org/apache/tika/parser/NetworkParser.java
URL: http://svn.apache.org/viewvc/tika/trunk/tika-core/src/main/java/org/apache/tika/parser/NetworkParser.java?rev=1095760&r1=1095759&r2=1095760&view=diff
==============================================================================
--- tika/trunk/tika-core/src/main/java/org/apache/tika/parser/NetworkParser.java (original)
+++ tika/trunk/tika-core/src/main/java/org/apache/tika/parser/NetworkParser.java Thu Apr 21 15:59:42 2011
@@ -102,7 +102,7 @@ public class NetworkParser extends Abstr
         private volatile Exception exception = null;
 
         public ParsingTask(InputStream input, OutputStream output) {
-            this.input = new TaggedInputStream(input);
+            this.input = TaggedInputStream.get(input);
             this.output = output;
         }
 



Re: svn commit: r1095760 - in /tika/trunk/tika-core/src/main/java/org/apache/tika: io/TaggedInputStream.java io/TikaInputStream.java parser/CompositeParser.java parser/NetworkParser.java

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, May 17, 2011 at 9:30 PM, Nick Burch <ni...@alfresco.com> wrote:
> On Tue, 17 May 2011, Jukka Zitting wrote:
>> Not sure if that's worth the trouble, as the overhead of TaggedInputStream
>> is insignificant compared to any IO operations, unlike in TikaInputStream
>> where the overhead can be huge for example when a temporary file gets
>> created.
>
> It was more that the double wrapping was making TIKA-645 worse, as there was
> another input stream in the way!

Oh yes, good point. I'm now working on getting rid of the extra wrapping layers.

BR,

Jukka Zitting

Re: svn commit: r1095760 - in /tika/trunk/tika-core/src/main/java/org/apache/tika: io/TaggedInputStream.java io/TikaInputStream.java parser/CompositeParser.java parser/NetworkParser.java

Posted by Nick Burch <ni...@alfresco.com>.
On Tue, 17 May 2011, Jukka Zitting wrote:
> Not sure if that's worth the trouble, as the overhead of 
> TaggedInputStream is insignificant compared to any IO operations, unlike 
> in TikaInputStream where the overhead can be huge for example when a 
> temporary file gets created.

It was more that the double wrapping was making TIKA-645 worse, as there 
was another input stream in the way!

Nick

Re: svn commit: r1095760 - in /tika/trunk/tika-core/src/main/java/org/apache/tika: io/TaggedInputStream.java io/TikaInputStream.java parser/CompositeParser.java parser/NetworkParser.java

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Apr 21, 2011 at 5:59 PM,  <ni...@apache.org> wrote:
> TIKA-643 - Change TaggedInputStream to work like TikaInputStream for creation,
> with a static get, to avoid double wrapping.

Not sure if that's worth the trouble, as the overhead of
TaggedInputStream is insignificant compared to any IO operations,
unlike in TikaInputStream where the overhead can be huge for example
when a temporary file gets created.

No need to change this back, as there's little downside to the current
approach (except no subclassing and the inherent ugliness of instanceof :-).
I'm just bringing this up because I was going through some of the
recent changes and started wondering about the rationale here.

BR,

Jukka Zitting