Posted to user@tika.apache.org by 肖金伟 <ji...@gmail.com> on 2013/04/09 07:48:47 UTC

Re: Tika 1.3 mistakes eml file as text by AutoDetectParser

2013/4/9 肖金伟 <ji...@gmail.com>

> Hello all,
>
> I am using the Tika 1.3 Java API to extract text from an .eml file, with
> the following code:
>
>         String fileName = "7.eml";
>         Parser parser = new AutoDetectParser();
>         ContentHandler body = new BodyContentHandler();
>         Metadata metadata = new Metadata();
>
>         metadata.set(Metadata.RESOURCE_NAME_KEY,fileName);
>
>         ParseContext context = new ParseContext();
>         context.set(Parser.class, parser);
>         InputStream stream = new FileInputStream(fileName);
>
>         try
>         {
>             parser.parse(stream, body, metadata, context);
>             System.out.println(body.toString());
>         }
>         catch (Exception e)
>         {
>             // TODO Auto-generated catch block
>             e.printStackTrace();
>         }
>
> The text I get back seems to have been produced by TXTParser, and I cannot
> figure out why.
>
> To help reproduce the problem, the test .eml file is attached.
>
> Thanks.
> --
> *Name* : Tinyxiao       *Email* : jinwei.xiao@gmail.com
>
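
One thing that may be worth checking, offered only as a guess and not as a confirmed cause: content-based type detection generally looks at the first bytes of a file, so an .eml file that begins with something other than a familiar mail header (a byte order mark, a blank line, an unusual header) could plausibly end up treated as plain text. The sketch below uses only the JDK and prints what the file actually starts with. The class name EmlHeaderCheck, the method names, and the header list are hypothetical and chosen for illustration; they are not taken from Tika's detection rules.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class EmlHeaderCheck {

    // Hypothetical list of header names often seen at the top of a mail file.
    // This is an assumption for illustration, NOT Tika's actual magic table.
    static final List<String> COMMON_HEADERS = Arrays.asList(
            "received:", "return-path:", "delivered-to:", "from:", "to:",
            "subject:", "date:", "message-id:", "mime-version:");

    // Returns the first line of the given text (without the line terminator).
    static String firstLine(String text) {
        return text.split("\r?\n", 2)[0];
    }

    // True if the very first line starts with one of the header names above.
    // Leading bytes such as a BOM or whitespace are deliberately not skipped,
    // so they show up as a "false" result.
    static boolean startsWithMailHeader(String text) {
        String line = firstLine(text).toLowerCase(Locale.ROOT);
        for (String header : COMMON_HEADERS) {
            if (line.startsWith(header)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "7.eml");
        if (!Files.exists(path)) {
            System.out.println("File not found: " + path);
            return;
        }
        // Only the beginning of the file matters for this check.
        byte[] bytes = Files.readAllBytes(path);
        int length = Math.min(bytes.length, 512);
        String head = new String(bytes, 0, length, StandardCharsets.UTF_8);

        System.out.println("First line: " + firstLine(head));
        System.out.println("Starts with a common mail header: "
                + startsWithMailHeader(head));
    }
}
```

If this prints false for 7.eml, the start of the file would be the first place to look. Printing metadata.get(Metadata.CONTENT_TYPE) after the parse(...) call in the code above should also show which type was actually detected.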



-- 
*Name* : Tinyxiao       *Email* : jinwei.xiao@gmail.com