Posted to dev@tika.apache.org by "Hong-Thai Nguyen (JIRA)" <ji...@apache.org> on 2014/03/05 10:30:57 UTC
[jira] [Comment Edited] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920692#comment-13920692 ]
Hong-Thai Nguyen edited comment on TIKA-623 at 3/5/14 9:30 AM:
---------------------------------------------------------------
java-libpst-0.7 has been uploaded to the Sonatype OSS Nexus repository: https://issues.sonatype.org/browse/OSSRH-8965
If there's no objection, I'll refactor the attached parser so that it produces output like this:
{code}
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Length" content="271360" />
<meta name="isValid" content="true" />
<meta name="Content-Type" content="application/vnd.ms-outlook" />
<title></title>
</head>
<body>
<div class="email-folder">
<h1>Début du fichier de données Outlook</h1> <!-- "Top of Outlook data file" -->
<div class="email-entry">
<h1>&lt;530D9CAC.5080901@gmail.com&gt;</h1>
<meta subject="Re: Feature Generators" />
<meta internetMessageId="&lt;530D9CAC.5080901@gmail.com&gt;" />
<meta descriptorNodeId="2097188" />
<meta lastModificationTime="1393418263291" />
<meta senderName="Jörn Kottmann" />
<meta senderEmailAddress="kottmann@gmail.com" />
<meta recipients="No recipients table!" />
<p>mail content</p>
</div>
<div class="email-folder">
<h1>Éléments supprimés</h1> <!-- "Deleted Items" -->
</div>
</div>
<div class="email-folder">
<h1>Racine (pour la recherche)</h1> <!-- "Root (for search)" -->
</div>
<div class="email-folder">
<h1>SPAM Search Folder 2</h1>
</div>
</body>
</html>
{code}
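The proposal leaves the serialization details open. The following is a minimal, stdlib-only sketch of producing that nesting: one email-folder div per folder, with email-entry divs and sub-folders inside it. It is not the attached OutlookPSTParser.java. It also does not use Tika's own API, because a real Tika parser would emit SAX events through its content handler instead of writing a string.

The Folder and Message records are hypothetical stand-ins for the objects java-libpst returns. The sample values are copied from the output above, with the folder names given in English. Writing through javax.xml.stream also escapes the angle brackets in the Internet Message-ID, which string concatenation would leave as broken markup.

{code:java}
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class PstXhtmlSketch {
    // HYPOTHETICAL stand-ins for PST folder/message objects; not java-libpst or Tika types.
    record Message(String internetMessageId, String subject, String senderName, String body) {}
    record Folder(String name, List<Message> messages, List<Folder> subFolders) {}

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Serializes the top-level folders into the html/head/body layout proposed above.
    static String toXhtml(List<Folder> topLevel) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            out.writeStartElement("html");
            out.writeDefaultNamespace(XHTML_NS);
            out.writeStartElement("head");
            out.writeEmptyElement("meta");
            out.writeAttribute("name", "Content-Type");
            out.writeAttribute("content", "application/vnd.ms-outlook");
            out.writeStartElement("title");
            out.writeEndElement(); // </title>
            out.writeEndElement(); // </head>
            out.writeStartElement("body");
            for (Folder folder : topLevel) {
                writeFolder(out, folder);
            }
            out.writeEndElement(); // </body>
            out.writeEndElement(); // </html>
            out.flush();
            out.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XHTML serialization failed", e);
        }
        return buffer.toString();
    }

    // One <div class="email-folder"> per folder: title, messages, then sub-folders
    // recursively, so the div nesting mirrors the PST folder hierarchy.
    static void writeFolder(XMLStreamWriter out, Folder folder) throws XMLStreamException {
        out.writeStartElement("div");
        out.writeAttribute("class", "email-folder");
        writeTextElement(out, "h1", folder.name());
        for (Message message : folder.messages()) {
            writeMessage(out, message);
        }
        for (Folder sub : folder.subFolders()) {
            writeFolder(out, sub);
        }
        out.writeEndElement(); // </div>
    }

    // One <div class="email-entry"> per message, using the per-field <meta> layout proposed above.
    static void writeMessage(XMLStreamWriter out, Message m) throws XMLStreamException {
        out.writeStartElement("div");
        out.writeAttribute("class", "email-entry");
        writeTextElement(out, "h1", m.internetMessageId());
        writeMeta(out, "subject", m.subject());
        writeMeta(out, "internetMessageId", m.internetMessageId());
        writeMeta(out, "senderName", m.senderName());
        writeTextElement(out, "p", m.body());
        out.writeEndElement(); // </div>
    }

    static void writeTextElement(XMLStreamWriter out, String tag, String text) throws XMLStreamException {
        out.writeStartElement(tag);
        out.writeCharacters(text); // the writer escapes '<' and '&' in text content
        out.writeEndElement();
    }

    static void writeMeta(XMLStreamWriter out, String key, String value) throws XMLStreamException {
        out.writeEmptyElement("meta");
        out.writeAttribute(key, value); // attribute values are escaped as well
    }

    // Sample data copied from the proposed output, with folder names in English.
    static List<Folder> sampleFolders() {
        Message mail = new Message("<530D9CAC.5080901@gmail.com>", "Re: Feature Generators",
                "Jörn Kottmann", "mail content");
        Folder deleted = new Folder("Deleted Items", List.of(), List.of());
        Folder top = new Folder("Top of Outlook data file", List.of(mail), List.of(deleted));
        Folder spam = new Folder("SPAM Search Folder 2", List.of(), List.of());
        return List.of(top, spam);
    }

    public static void main(String[] args) {
        System.out.println(toXhtml(sampleFolders()));
    }
}
{code}

Running main prints a single-line XHTML document with the same div nesting as the proposal. The message ID appears as &lt;530D9CAC.5080901@gmail.com&gt; rather than as a bare tag.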
was (Author: thaichat04):
java-libpst-0.7 has been uploaded to the Sonatype OSS Nexus repository. If there's no objection, I'll refactor the attached parser and provide the same XHTML output as above.
> Add support for Outlook PST
> ---------------------------
>
> Key: TIKA-623
> URL: https://issues.apache.org/jira/browse/TIKA-623
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Tran Nam Quang
> Assignee: Hong-Thai Nguyen
> Fix For: 1.6
>
> Attachments: OutlookPSTParser.java
>
>
> Hello everyone,
> As you might know, Outlook stores its mail and other items in a single PST file. There is a relatively new Java library called java-libpst for reading Outlook PST files. It is licensed under the LGPL and available here: http://code.google.com/p/java-libpst/
> I have tested the library on Outlook 2000 and Outlook 2003, with good results. It would be great if the library could be integrated into Tika.
> Best regards
> Tran Nam Quang
--
This message was sent by Atlassian JIRA
(v6.2#6252)