Posted to dev@tika.apache.org by mastcheshmi <me...@gmail.com> on 2009/10/17 14:43:26 UTC

MarkUnsupportedException

Hi all,
I use Tika, and this exception occurs for every document:
org/apache/poi/hpsf/MarkUnsupportedException

My code is:
package tikatest;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.metadata.*;
import org.apache.tika.sax.*;
import org.xml.sax.ContentHandler;
import org.apache.tika.parser.microsoft.*;

/**
 *
 * @author mehran
 */
public class Main {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws Exception {
        // File f = new File("/home/mehran/Documents/saat.ods");
        InputStream in = Main.class.getResourceAsStream("resource/test.xls");
        // InputStream in = new FileInputStream(f);
        // f.createNewFile();
        try {
            Metadata metadata = new Metadata();
            ContentHandler handler = new BodyContentHandler();
            new OfficeParser().parse(in, handler, metadata);

            System.out.println(handler.toString());
        } finally {
            in.close();
        }
    }

}

Please help me.
-- 
View this message in context: http://www.nabble.com/MarkUnsupportedException-tp25937979p25937979.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.

Re: MarkUnsupportedException

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Sat, Oct 17, 2009 at 2:43 PM, mastcheshmi
<me...@gmail.com> wrote:
> I use Tika, and this exception occurs for every document:
> org/apache/poi/hpsf/MarkUnsupportedException

That seems like a bug. I guess POI expects the given document stream
to support the mark feature, so Tika should explicitly wrap the stream
into a java.io.BufferedInputStream if it does not already support
marks.

Can you please file a bug report [1] about this? Meanwhile, as a
workaround you can wrap the input stream yourself:

    new OfficeParser().parse(new BufferedInputStream(in), handler, metadata);
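
If you would rather wrap only when it is needed, the check can be sketched with plain java.io. This is a minimal illustration, not Tika or POI code: the class name MarkSupportSketch and the helpers ensureMarkSupported and withoutMarkSupport are hypothetical names made up for the example, and the stream without mark support is simulated rather than read from a real document.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkSupportSketch {

    /**
     * Hypothetical helper (not part of Tika): returns the stream unchanged
     * if it already supports mark/reset, otherwise wraps it in a
     * BufferedInputStream, which does.
     */
    public static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /**
     * Stand-in for a stream without mark support (java.io.FileInputStream
     * is one real example). Used here only so the sketch runs on its own.
     */
    public static InputStream withoutMarkSupport(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = withoutMarkSupport(new byte[] {1, 2, 3, 4});
        try (InputStream in = ensureMarkSupported(raw)) {
            in.mark(4);              // remember the current position
            int first = in.read();   // consume one byte
            in.reset();              // rewind to the marked position
            int again = in.read();   // the same byte is read again
            System.out.println(raw.markSupported() + " "
                    + in.markSupported() + " " + (first == again));
        }
    }
}
```

With Tika on the classpath, the same helper would be used as parse(ensureMarkSupported(in), handler, metadata) in place of the unconditional wrap shown above.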

[1] https://issues.apache.org/jira/browse/TIKA

BR,

Jukka Zitting