Posted to dev@tika.apache.org by "Hong-Thai Nguyen (JIRA)" <ji...@apache.org> on 2014/03/05 10:30:57 UTC

[jira] [Comment Edited] (TIKA-623) Add support for Outlook PST

    [ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920692#comment-13920692 ] 

Hong-Thai Nguyen edited comment on TIKA-623 at 3/5/14 9:30 AM:
---------------------------------------------------------------

java-libpst 0.7 has been uploaded to the Sonatype OSS Nexus repository: https://issues.sonatype.org/browse/OSSRH-8965
If there are no objections, I'll refactor the attached parser so that it produces output like this:
{code}
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Length" content="271360" />
<meta name="isValid" content="true" />
<meta name="Content-Type" content="application/vnd.ms-outlook" />
<title></title>
</head>
<body>
	<div class="email-folder">
		<h1>Début du fichier de données Outlook</h1>
		<div class="email-entry">
			<h1>&lt;530D9CAC.5080901@gmail.com&gt;</h1>
			<meta subject="Re: Feature Generators" />
			<meta internetMessageId="&lt;530D9CAC.5080901@gmail.com&gt;" />
			<meta descriptorNodeId="2097188" />
			<meta lastModificationTime="1393418263291" />
			<meta senderName="Jörn Kottmann" />
			<meta senderEmailAddress="kottmann@gmail.com" />
			<meta recipients="No recipients table!" />
			<p>mail content</p>
		</div>
		<div class="email-folder">
			<h1>Éléments supprimés</h1>
		</div>
	</div>
	<div class="email-folder">
		<h1>Racine (pour la recherche)</h1>
	</div>
	<div class="email-folder">
		<h1>SPAM Search Folder 2</h1>
	</div>
</body>
</html>
{code}
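
To make the nesting rule explicit: each folder becomes a div.email-folder holding its own messages first and then its sub-folders, recursively. Below is a rough sketch of that traversal only, not the attached parser. The Folder and Message classes, the renderFolder and escape helpers, and the sample data are all hypothetical stand-ins; the real parser would read these values from java-libpst and would presumably write through Tika's SAX handler rather than a StringBuilder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model used only for illustration.
public class PstXhtmlSketch {

    static final class Message {
        final String internetMessageId;
        final String subject;
        final String body;

        Message(String internetMessageId, String subject, String body) {
            this.internetMessageId = internetMessageId;
            this.subject = subject;
            this.body = body;
        }
    }

    static final class Folder {
        final String displayName;
        final List<Message> messages = new ArrayList<>();
        final List<Folder> subFolders = new ArrayList<>();

        Folder(String displayName) {
            this.displayName = displayName;
        }
    }

    // Escape the characters that are unsafe in XHTML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build an indentation prefix of `depth` tab characters.
    static String tabs(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append('\t');
        }
        return sb.toString();
    }

    // Emit one folder: its title, then its messages, then its sub-folders (recursively).
    static void renderFolder(Folder folder, int depth, StringBuilder out) {
        String in = tabs(depth);
        out.append(in).append("<div class=\"email-folder\">\n");
        out.append(in).append("\t<h1>").append(escape(folder.displayName)).append("</h1>\n");
        for (Message m : folder.messages) {
            out.append(in).append("\t<div class=\"email-entry\">\n");
            out.append(in).append("\t\t<h1>").append(escape(m.internetMessageId)).append("</h1>\n");
            out.append(in).append("\t\t<meta subject=\"").append(escape(m.subject)).append("\" />\n");
            out.append(in).append("\t\t<p>").append(escape(m.body)).append("</p>\n");
            out.append(in).append("\t</div>\n");
        }
        for (Folder sub : folder.subFolders) {
            renderFolder(sub, depth + 1, out);
        }
        out.append(in).append("</div>\n");
    }

    public static void main(String[] args) {
        // Sample data loosely modelled on the output above; folder names are made up.
        Folder root = new Folder("Top of Outlook data file");
        root.messages.add(new Message("<530D9CAC.5080901@gmail.com>",
                "Re: Feature Generators", "mail content"));
        root.subFolders.add(new Folder("Deleted Items"));

        StringBuilder out = new StringBuilder();
        renderFolder(root, 1, out);
        System.out.print(out);
    }
}
```

Only the subject is emitted as a meta element here to keep the sketch short; the other fields (internetMessageId, senderName, and so on) would follow the same pattern.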


> Add support for Outlook PST
> ---------------------------
>
>                 Key: TIKA-623
>                 URL: https://issues.apache.org/jira/browse/TIKA-623
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Tran Nam Quang
>            Assignee: Hong-Thai Nguyen
>             Fix For: 1.6
>
>         Attachments: OutlookPSTParser.java
>
>
> Hello everyone,
> As you might know, Outlook stores its mails and other stuff in a single PST file. There's a relatively new Java library called java-libpst for reading Outlook PST files. It is licensed under the LGPL and available over here: http://code.google.com/p/java-libpst/
> I have tested the library on Outlook 2000 and Outlook 2003, with good results. It would be great if the library could be integrated into Tika.
> Best regards
> Tran Nam Quang


