Posted to user@tika.apache.org by 肖金伟 <ji...@gmail.com> on 2013/04/09 07:48:47 UTC
Re: Tika 1.3 mistakes eml file as text by AutoDetectParser
2013/4/9 肖金伟 <ji...@gmail.com>
> Hello all,
>
> I am using the Tika 1.3 Java API to extract text from an .eml file, with
> code like this:
>
> String fileName = "7.eml";
> Parser parser = new AutoDetectParser();
> ContentHandler body = new BodyContentHandler();
> Metadata metadata = new Metadata();
>
> // Pass the file name as a hint for type detection.
> metadata.set(Metadata.RESOURCE_NAME_KEY, fileName);
>
> // Register the parser in the context so embedded documents use it too.
> ParseContext context = new ParseContext();
> context.set(Parser.class, parser);
>
> // try-with-resources closes the stream even if parsing fails.
> try (InputStream stream = new FileInputStream(fileName))
> {
>     parser.parse(stream, body, metadata, context);
>     System.out.println(body.toString());
> }
> catch (Exception e)
> {
>     e.printStackTrace();
> }
>
> The text I get back appears to have been produced by TXTParser, and I
> cannot figure out why.
>
> To reproduce the problem, I will attach the test .eml file.
>
> Thanks.
> --
> *Name* : Tinyxiao * Email* : jinwei.xiao@gmail.com
--
*Name* : Tinyxiao * Email* : jinwei.xiao@gmail.com