Posted to mime4j-dev@james.apache.org by Noss Benoit <be...@secu.lu> on 2011/01/24 07:18:51 UTC

Headless mail renderer

I don't want to spam you with this question, but I would like to make a
headless PDF mail renderer.
In my project, I want to batch-process incoming mails and inject them into
a content management DB as PDFs.
Am I on the right track if I use your MimeStreamParser combined with a
custom handler to do the rendering?
Can I access the content? Would you suggest something else for this?

Many thanks in advance.

Benoît NOSS
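
For readers following the thread, the pattern asked about above (a streaming
parser that reports what it finds to a custom handler) can be sketched as
below. This is only a toy in plain JavaScript, not Mime4J: the function
parseStream, the helper createCollectingHandler and the callback names are
assumptions made for illustration, and the toy ignores header folding,
encodings and multipart bodies.

```javascript
// Toy event-driven parser: walks a raw single-part message and reports
// what it finds to a handler object instead of building a tree.
// The callback names (startHeader, field, endHeader, body) are
// illustrative assumptions, not the Mime4J handler signatures.
function parseStream(rawMessage, handler) {
  var lines = rawMessage.split(/\r?\n/);
  var i = 0;
  handler.startHeader();
  // The header section ends at the first empty line.
  for (; i < lines.length && lines[i] !== ""; i++) {
    var colon = lines[i].indexOf(":");
    if (colon > 0) {
      handler.field(lines[i].slice(0, colon).trim(),
                    lines[i].slice(colon + 1).trim());
    }
  }
  handler.endHeader();
  // Everything after the blank line is treated as the body.
  handler.body(lines.slice(i + 1).join("\n"));
}

// A handler that collects only what a renderer would need.
function createCollectingHandler() {
  var result = { headers: {}, text: "" };
  return {
    result: result,
    startHeader: function () {},
    field: function (name, value) {
      result.headers[name.toLowerCase()] = value;
    },
    endHeader: function () {},
    body: function (text) { result.text = text; }
  };
}

var handler = createCollectingHandler();
parseStream(
  "Subject: Hello\r\nFrom: a@example.org\r\n\r\nLine one\r\nLine two",
  handler
);
console.log(JSON.stringify(handler.result));
```

The point of the handler style is that the renderer only sees the pieces it
asks for, so large attachments never have to be held in memory as a tree.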


 



Re: Headless mail renderer

Posted by Noss Benoit <be...@secu.lu>.
Hello Stefano,
after reading your mail, I think I can keep a lot of the code I wrote.
I didn't know of a better tool than tidy to generate the XHTML, and it
often failed, so I had given up on rendering HTML.
In fact, if I can replace tidy with validator.nu to transform
the HTML and then let iText render the XHTML output to a PDF,
I have mostly solved my problem: I don't need two different processes
to render a mail to PDF, and I can stay in Java.

Many thanks,

I understand quickly when you explain slowly ;-)

Benoît




On 25.01.2011 10:30, Stefano Bagnara wrote:
> 2011/1/25 Noss Benoit<be...@secu.lu>:
>> Hi, after your comments, I now think I have to split my project into two
>> parts
>>
>> 1/ The first part has to parse the message and write an HTML or XHTML page
>> representing the output I want for the message
>> 2/ The second part has to render the HTML I previously generated to PDF
> I do that in a single step because of the content-id "cid:" image references.
> BTW, logically you need two separate components: a parser and a renderer.
>
>> I tried flying saucer in the past, it can generate PDF, but it needed strict
>> XHTML for the input, and lots of mails are not strict XHTML
> I've had very good results parsing the html with validator.nu parser:
> http://about.validator.nu/htmlparser/
>
> I parsed thousands of HTML emails and tested most HTML parsers out there,
> and validator.nu was the only one that parsed them all.
>
>> On the one hand, I think I can improve my parser to get the html I want for
>> most of the mails I have to transform.
>> On the other hand, I don't know the openoffice SDK, webkit and Mozilla, and
>> html rendering will be the hardest part....
> If you used flying saucer in the past then go ahead with that.
>
> Stefano
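
One possible way to keep the parser and the renderer separate despite the
"cid:" image references mentioned above is to inline the referenced parts
into the HTML before it reaches the renderer. A minimal sketch in plain
JavaScript, assuming a hypothetical inlineParts map (Content-ID without
angle brackets -> { mimeType, base64 }) that the parsing step has already
filled; the function name resolveCidReferences is also made up here:

```javascript
// Replace src="cid:..." references in an HTML body with data: URIs built
// from the message's inline parts, so the renderer needs no access to the
// original mail. The inlineParts structure is an assumption of this sketch.
function resolveCidReferences(html, inlineParts) {
  return html.replace(/(src\s*=\s*["'])cid:([^"']+)(["'])/gi,
    function (match, prefix, contentId, suffix) {
      var known = Object.prototype.hasOwnProperty.call(inlineParts, contentId);
      if (!known) {
        return match; // unknown reference: leave it untouched
      }
      var part = inlineParts[contentId];
      return prefix + "data:" + part.mimeType + ";base64," + part.base64 + suffix;
    });
}

var parts = { "logo@example": { mimeType: "image/png", base64: "AAAA" } };
var resolved = resolveCidReferences(
  '<p><img src="cid:logo@example"><img src="cid:missing"></p>',
  parts
);
console.log(resolved);
```

Whether the chosen renderer accepts data: URIs is something to verify; if it
does not, the same replacement could point at temporary files instead.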


 



Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Hi Robert.  Thank you for the detailed response and links.

On Tue, May 10, 2011 at 3:00 AM, Robert Burrell Donkin
<ro...@gmail.com> wrote:
> On Sun, May 8, 2011 at 2:44 PM, Tony Zakula <to...@gmail.com> wrote:
>> Not sure on where the project leaders want to go,
>
> Projects are community led here at Apache (see eg [1][2][3][4]). If
> there's development interest from the community and it's in scope for
> the project, then that's a direction the code will move in.
>
>> but I think being
>> able to store messages in different formats to be able to plug in to
>> systems would be great.  Instead of each person writing their own
>> parser, most people would just plug the larger piece into their system
>> and start there.
>
> +1
>
> This vision seems to fit with the work over at Tika [5] and Lucene [6].
>
>> I did not see where you specified what you are thinking about for
>> summer.  Is that a link somewhere yet?
>
> The mailing lists (see [7] and eg [8]) are the primary tools we use
> here at Apache. Stuff only tends to get written down later, if at all.
> We've been throwing ideas around on the lists, hoping that people
> might pick some of them up and run with them ;-)
>
> Robert
>
> [1] http://www.apache.org/foundation/how-it-works.html
> [2] http://www.apache.org/foundation/getinvolved.html
> [3] http://jakarta.apache.org/site/contributing.html
> [4] http://www.apache.org/dev/contributors.html
> [5] http://tika.apache.org/
> [6] http://lucene.apache.org/
> [7] http://www.apache.org/dev/#mail
> [8] http://www.apache.org/dev/contrib-email-tips.html
>

I am on a few Apache lists.  However, I guess I did not see ideas being
tossed about on this list for where to go with Mime4J this summer.
Maybe I missed it, or the discussion was on a list I am not on
presently.  I would love to become a contributor to an Apache project
if I can make time to do it.

If you have a recommendation for me to join a list besides Mime4J,
please let me know.

Thanks,

Tony Z

Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Hi Eric,

>
> We also have apache-extras [1], which uses Mercurial (hosted by Google
> for add-ons to Apache projects).

That is good.  I will check it out.  I prefer Mercurial.

>
> I was wondering if the parser can be used without the full myna server. I
> mean, we simply need to parse mail, not to configure a full server with
> users,... ([2]).

The core Java part of Myna to run the JS is only 4 Java classes to
wrap the Mozilla Rhino runtime.  All of the rest of the functionality
is provided via JavaScript.  I will carbon copy the main developer of
Myna on this message, but I am pretty sure we can create a trimmed
down version very easily.  Worst case scenario if you could not
integrate it, a Host object for Rhino could be written or Myna's could
be tweaked.  I do reuse some JavaScript Myna libraries, but that could
be dropped in anywhere.

>
> Also, is the parser component packaged as a maven module to be easily used
> as dependency somewhere else?

Ah well, I said it was rough around the edges.  :-)  My use case
right now is to drop this on a server running a commercial
mail server, as an API gateway that gives web applications access to
specialized functionality.  A bridge to the commercial mail server, so to
speak.  It is a project I am working on for a client, specific to their
specs.  This tiny lightweight runtime allows me to drop this on a server
and have a bridge to that system.  When I make updates, I pull all
the new JS code with Mercurial to all the servers.  The project is not
complete either.  Just the parsing and a couple of API calls so far.
So that kind of stuff will be up to you.

>
> Whatever the response to the 2 above questions, feel free to push it where
> you like. I will look at it :)

Let me know what you think about the above to help me package
it, and also whether you are still interested.

Thanks.

Tony

>
> Tks,
> - Eric
>
> [1] http://code.google.com/a/apache-extras.org/hosting/
> [2] http://www.mynajs.org/site/article/MynaPermissionsAdministrator.ejs
>
>
> On 13/05/2011 01:51, Tony Zakula wrote:
>>
>> Hi Eric,
>>
>> JavaScript is also extremely flexible.  With Rhino, you get the full
>> power of Java plus a lot of flexibility.
>>
>> What is the preferred spot for release?  GitHub, Bitbucket, or is there
>> another?
>>
>> Tony
>>
>>
>> On Thu, May 12, 2011 at 8:19 AM, Eric Charles<er...@apache.org>  wrote:
>>>
>>> Hi Tony,
>>>
>>> JavaScript in James server would be a first.
>>> But why not... there is more and more JS "on the other side" (thinking of
>>> Node.js...). Today I'm using Jackson to manipulate JSON in Java, but
>>> JavaScript is a more natural fit, so I understand why you chose it.
>>>
>>> We will start around end-May with MAILBOX-44, but so far there have been
>>> no discussions/decisions on the format chosen to persist mail.
>>>
>>> I would say "don't hurry, don't put pressure on yourself, and keep us
>>> updated when you think you can release it" :)
>>>
>>> Tks,
>>> - Eric
>>>
>>> On 12/05/2011 14:35, Tony Zakula wrote:
>>>>
>>>> Hi Eric,
>>>>
>>>> I would be more than happy to release the code now even though it is
>>>> not entirely finished if you are interested.  The parsing part is
>>>> pretty good, and I am using it in a production project right now.  I
>>>> am not sure it will fit your bill though as I am using it on a mail
>>>> server to do message list bounce processing.  Although I write Java
>>>> code for a living, I wanted to make it easy to modify this utility on
>>>> a running server so I used an open source project I contribute to at
>>>> Mynajs.org which is built on top of the Mozilla Rhino project.  I use
>>>> Mime4J, but my code is written in JavaScript.
>>>>
>>>> I would be more than happy to release the code now if you are
>>>> interested.
>>>>
>>>> Please let me know.
>>>>
>>>> Thanks!
>>>>
>>>> Tony
>>>>
>>>> On Wed, May 11, 2011 at 11:21 PM, Eric Charles<er...@apache.org>
>>>>  wrote:
>>>>>
>>>>> Hi Tony,
>>>>>
>>>>> We are starting to work on MAILBOX-44 "Design and implement a
>>>>> distributed
>>>>> mailbox using Hadoop" [1]
>>>>>
>>>>> We will need to store the mail in Hadoop and the JSON format (in avro
>>>>> file)
>>>>> may be an option.
>>>>>
>>>>> You said you are "still polishing for release" your JSON transformer.
>>>>> Have you got any plan to release it in opensource so we could use it ?
>>>>>
>>>>> Tks,
>>>>> Eric
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/MAILBOX-44
>>>>>
>>>
>>>
>
>

Re: JSON parsing in mynajs (was Re: Headless mail renderer)

Posted by Tony Zakula <to...@gmail.com>.
Hi Eric,

I will put it on my list, but the main work is being done in [1].

The method to look at for your purposes is probably the one at the bottom
of that file:

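var parseMessage = function parse_message(fileName) {
  // Parses the message file with Mime4J and returns it as a JSON-style
  // object, or null if the message could not be parsed.
  var object = null;
  var fis = new Myna.File(fileName).getInputStream();
  try {
    var mimeMsg = null;
    try {
      // Create the message from the file's input stream.
      mimeMsg = new org.apache.james.mime4j.message.Message(fis);
    }
    catch (e) {
      // An error here comes from the parser; the message is skipped and
      // null is returned so the caller can delete it.
      //Myna.println("Mime4J error parsing message")
    }
    // Only build the entity tree when parsing succeeded.
    if (mimeMsg !== null) {
      object = files.createEntityNode(mimeMsg);
    }
  }
  finally {
    // Always release the stream, even when something above fails.
    fis.close();
  }
  return object;
};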

The createEntityNode function at the following link is really the
recursive function that does the parsing - [2]
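
To give a rough idea of what such a recursive conversion looks like without
opening the repository, here is a minimal sketch in plain JavaScript. It is
not the code from [2]: the function name createNode and the shape of the
input (a plain object with mimeType, headers, body and parts) are
assumptions standing in for the real Mime4J entity.

```javascript
// Recursively convert a parsed entity into a plain, JSON-friendly node.
// `entity` is a hypothetical plain object ({ mimeType, headers, body,
// parts }) used only for this sketch, not a Mime4J class.
function createNode(entity) {
  var node = { mimeType: entity.mimeType, headers: entity.headers || {} };
  if (entity.parts && entity.parts.length > 0) {
    // Multipart entity: recurse into each child part.
    node.parts = entity.parts.map(function (part) {
      return createNode(part);
    });
  } else {
    // Leaf entity: keep the body text as-is.
    node.body = entity.body || "";
  }
  return node;
}

var sample = {
  mimeType: "multipart/alternative",
  headers: { subject: "Hello" },
  parts: [
    { mimeType: "text/plain", body: "Hi" },
    { mimeType: "text/html", body: "<p>Hi</p>" }
  ]
};
var tree = createNode(sample);
console.log(JSON.stringify(tree, null, 2));
```

Because every node is a plain object, the result can be passed straight to
JSON.stringify for storage or for an HTTP response.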

Things are set up through index.ejs so that they can be called over an
HTTP connection, as Myna is deployed as a Java webapp.  To call from a
Java main method, we would need to customize Myna or create a Rhino
host object as Mark described [3].

Tony Zakula
Direct Line (906) 364-8082

[1] https://bitbucket.org/tzakula/javascript-email-bounce-processor/src/65b933d28c1f/bounceProcessor/processBounces.sjs
[2] https://bitbucket.org/tzakula/javascript-email-bounce-processor/src/65b933d28c1f/bounceProcessor/createEntityNode.sjs
[3] https://groups.google.com/forum/#!topic/mynajs-general/UnMTvxHNE



On Mon, May 16, 2011 at 4:47 AM, Eric Charles <er...@apache.org> wrote:
> Hi Tony,
>
> Cool! Tks for this :)
>
> Even if index.ejs gives some clues on how to run it, it would be great to
> have a short README that explains how to use it (and how to call it from a
> Java main class).
>
> Tks,
> Eric
>
>
> On 16/05/2011 04:45, Tony Zakula wrote:
>>
>> Hi Eric and all,
>>
>> I have posted some basic code for parsing emails using Mime4J and
>> JavaScript at
>>
>> https://bitbucket.org/tzakula/javascript-email-bounce-processor
>>
>> Thanks.
>>
>> Tony Z

JSON parsing in mynajs (was Re: Headless mail renderer)

Posted by Eric Charles <er...@apache.org>.
Hi Tony,

Cool! Tks for this :)

Even if index.ejs gives some clues on how to run it, it would be great to
have a short README that explains how to use it (and how to call it from
a Java main class).

Tks,
Eric




Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Hi Eric and all,

I have posted some basic code for parsing emails using Mime4J and JavaScript at

https://bitbucket.org/tzakula/javascript-email-bounce-processor

Thanks.

Tony Z




Re: Headless mail renderer

Posted by Eric Charles <er...@apache.org>.
Hi Tony,

We also have apache-extras [1], which uses Mercurial (hosted by
Google for add-ons to Apache projects).

I was wondering if the parser can be used without the full Myna server.
I mean, we simply need to parse mail, not configure a full server
with users,... ([2]).

Also, is the parser component packaged as a Maven module so it can easily
be used as a dependency somewhere else?

Whatever the answers to the two questions above, feel free to push it
where you like. I will look at it :)

Tks,
- Eric

[1] http://code.google.com/a/apache-extras.org/hosting/
[2] http://www.mynajs.org/site/article/MynaPermissionsAdministrator.ejs




Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Hi Eric,

JavaScript is also extremely flexible.  With Rhino, you get the full
power of Java plus a lot of flexibility.

What is the preferred place to release it?  GitHub, Bitbucket, or somewhere else?

Tony



Re: Headless mail renderer

Posted by Eric Charles <er...@apache.org>.
Hi Tony,

JavaScript in James Server would be a first.
But why not... there is more and more JS "on the other side" (thinking 
of Node.js...). Today I use Jackson to manipulate JSON in Java, but 
JavaScript is a more natural fit, so I understand why you chose it.

We will start on MAILBOX-44 around the end of May, but so far there have 
been no discussions or decisions on the format used to persist mail.

I would say "don't hurry, don't put pressure on yourself, and keep us 
updated when you plan to release it" :)

Tks,
- Eric



Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Hi Eric,

I would be more than happy to release the code now, even though it is
not entirely finished, if you are interested.  The parsing part is
pretty good, and I am using it in a production project right now.  I
am not sure it will fit the bill for you, though, as I am using it on a
mail server to do message list bounce processing.  Although I write Java
code for a living, I wanted to make it easy to modify this utility on
a running server, so I used an open source project I contribute to at
Mynajs.org, which is built on top of the Mozilla Rhino project.  I use
Mime4J, but my code is written in JavaScript.

Please let me know.

Thanks!

Tony


Re: Headless mail renderer

Posted by Eric Charles <er...@apache.org>.
Hi Tony,

We are starting to work on MAILBOX-44 "Design and implement a 
distributed mailbox using Hadoop" [1]

We will need to store the mail in Hadoop, and the JSON format (in an Avro 
file) may be an option.

You said you are "still polishing for release" your JSON transformer.
Do you plan to release it as open source so that we could use it?

Tks,
Eric

[1] https://issues.apache.org/jira/browse/MAILBOX-44




Re: Headless mail renderer

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Sun, May 8, 2011 at 2:44 PM, Tony Zakula <to...@gmail.com> wrote:
> Not sure on where the project leaders want to go,

Projects are community led here at Apache (see eg [1][2][3][4]). If
there's development interest from the community and it's in scope for
the project, then that's a direction the code will move in.

> but I think being
> able to store messages in different formats, so they can plug in to
> other systems, would be great.  Instead of each person writing their own
> parser, most people would just plug the larger piece into their system
> and start there.

+1

This vision seems to fit with the work over at Tika [5] and Lucene [6].

> I did not see where you specified what you are thinking about for
> summer.  Is that a link somewhere yet?

The mailing lists (see [7] and eg [8]) are the primary tools we use
here at Apache. Stuff only tends to get written down later, if at all.
We've been throwing ideas around on the lists, hoping that people
might pick some of them up and run with them ;-)

Robert

[1] http://www.apache.org/foundation/how-it-works.html
[2] http://www.apache.org/foundation/getinvolved.html
[3] http://jakarta.apache.org/site/contributing.html
[4] http://www.apache.org/dev/contributors.html
[5] http://tika.apache.org/
[6] http://lucene.apache.org/
[7] http://www.apache.org/dev/#mail
[8] http://www.apache.org/dev/contrib-email-tips.html

Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Not sure where the project leaders want to go, but I think being
able to store messages in different formats, so they can plug in to
other systems, would be great.  Instead of each person writing their own
parser, most people would just plug the larger piece into their system
and start there.

I did not see where you specified what you are thinking about for
summer.  Is that a link somewhere yet?

Tony Z




Re: Headless mail renderer

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Fri, May 6, 2011 at 12:05 PM, Tony Zakula <to...@gmail.com> wrote:
> Hey,
>
> That is a cool project!  Congratulations!  I have one that I am
> still polishing for release that transforms messages into JSON format
> and then stores the JSON.  My initial benchmarks on non-optimized code
> show an average of 25,000 messages an hour, with the main bottleneck
> being the IO.  Cool to see what other people are doing.

Transforming mail into JSON (or XML) seems a hot topic. It'll probably
be needed for some of the mailbox (Hadoop, Cassandra, CouchDB etc) and
machine learning stuff we hope to experiment with this summer. I think
that transformation modules building on mime4j would be a good fit for
Mime4J.

Opinions?

Robert

Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Hey,

That is a cool project!  Congratulations!  I have one that I am
still polishing for release that transforms messages into JSON format
and then stores the JSON.  My initial benchmarks on non-optimized code
show an average of 25,000 messages an hour, with the main bottleneck
being the IO.  Cool to see what other people are doing.
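
As a rough illustration of what "transforming a message into JSON" can mean, here is a minimal sketch that uses only the JDK. It is not the code discussed in this thread (that one is JavaScript on Rhino with Mime4J) and it is not a Mime4J API: the class name MailToJson and its methods are hypothetical, and it only handles the header block and raw body of a simple single-part message. Multipart structure, transfer encodings and encoded header words would need a real MIME parser.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: turns a simple RFC 822 style message into a flat JSON object. */
final class MailToJson {

    /** Splits headers from body at the first empty line and unfolds folded header lines. */
    static Map<String, String> parse(String rawMessage) {
        Map<String, String> fields = new LinkedHashMap<>();
        String normalized = rawMessage.replace("\r\n", "\n");
        int split = normalized.indexOf("\n\n");
        String headerBlock = split < 0 ? normalized : normalized.substring(0, split);
        String body = split < 0 ? "" : normalized.substring(split + 2);

        String currentName = null;
        for (String line : headerBlock.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            boolean continuation = line.charAt(0) == ' ' || line.charAt(0) == '\t';
            if (continuation && currentName != null) {
                // Folded header: append the continuation to the previous value.
                fields.put(currentName, fields.get(currentName) + " " + line.trim());
            } else {
                int colon = line.indexOf(':');
                if (colon <= 0) {
                    continue; // not a header line; ignored in this sketch
                }
                currentName = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                // Simplification: a repeated header (e.g. Received) overwrites the earlier one.
                fields.put(currentName, line.substring(colon + 1).trim());
            }
        }
        // Simplification: the body is stored under the key "body", which would
        // clash with a header that happens to be called "Body".
        fields.put("body", body);
        return fields;
    }

    /** Serialises the map as a JSON object with string values only. */
    static String toJson(Map<String, String> fields) {
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                json.append(',');
            }
            first = false;
            json.append('"').append(escape(e.getKey())).append("\":\"")
                .append(escape(e.getValue())).append('"');
        }
        return json.append('}').toString();
    }

    /** Escapes quotes, backslashes and control characters for a JSON string. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "From: a@example.org\r\nSubject: Hello\r\n\r\nFirst line\r\nSecond line";
        System.out.println(toJson(parse(raw)));
    }
}
```

In a real transformer a JSON library such as Jackson would replace the hand-written escaping; it is spelled out here only to keep the sketch free of dependencies.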

Tony Z



Re: Headless mail renderer

Posted by Noss Benoit <be...@secu.lu>.
Hi Eric,

    * OpenOffice is used to render Microsoft Office and OpenOffice
      attachments into PDF.
      OpenOffice renders HTML into PDF badly.
    * iText is used to render XHTML to PDF.
      As Stefano proposed, convert the HTML into XHTML with validator.nu
      (or with jtidy in my case) and then use Flying Saucer to make a
      PDF out of the XHTML.
      The Flying Saucer project internally uses iText 2.x (2.0.8 in my
      case) + iText 5.0.6
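
The division of labour in the list above can be sketched as a small routing step that picks a back end from the MIME content type of each part. This is only an illustration using the JDK: the class RenditionRouter, the Backend enum and the exact list of content types are assumptions, not code from the mailtopdf project. The image and plain text branches are likewise assumed, and the converter calls themselves are left out because they depend on the library versions in use.

```java
import java.util.Locale;

/** Hypothetical router: decides which converter should handle a MIME part. */
final class RenditionRouter {

    /** Back ends loosely matching the tools named above; the enum itself is an assumption. */
    enum Backend { XHTML_TO_PDF, OPEN_OFFICE, IMAGE_TO_PDF, PLAIN_TEXT_TO_PDF, UNSUPPORTED }

    static Backend route(String contentType) {
        if (contentType == null) {
            return Backend.UNSUPPORTED;
        }
        // Drop parameters such as "; charset=utf-8" and normalise the case.
        String type = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);

        if (type.equals("text/html") || type.equals("application/xhtml+xml")) {
            // HTML is first cleaned up into XHTML, then rendered to PDF.
            return Backend.XHTML_TO_PDF;
        }
        if (type.equals("text/plain")) {
            return Backend.PLAIN_TEXT_TO_PDF;
        }
        if (type.startsWith("image/")) {
            return Backend.IMAGE_TO_PDF;
        }
        if (type.equals("application/msword")
                || type.equals("application/vnd.ms-excel")
                || type.equals("application/vnd.ms-powerpoint")
                || type.startsWith("application/vnd.openxmlformats-officedocument.")
                || type.startsWith("application/vnd.oasis.opendocument.")) {
            // Office documents go to OpenOffice, never HTML.
            return Backend.OPEN_OFFICE;
        }
        return Backend.UNSUPPORTED;
    }

    public static void main(String[] args) {
        System.out.println(route("text/html; charset=ISO-8859-1"));
        System.out.println(route("application/vnd.oasis.opendocument.text"));
    }
}
```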

Benoît


 

Re: Headless mail renderer

Posted by Eric Charles <er...@apache.org>.
Hi Benoît,

Many thanks for the feedback and contribution.

I just downloaded your zip and saw jodconverter (and the associated uno..., 
ju.. jars from the OpenOffice SDK) and the iText libs.

You also import jodconverter and iText classes in PDFConverterJAVA.

What would you advise for HTML/text to PDF conversion, based on your 
experience?

Tks,
- Eric


Re: Headless mail renderer

Posted by Noss Benoit <be...@secu.lu>.
Hello Stefano and all the others who helped me,

I worked with two students on a headless mail renderer (written in Java).
I recently opened a project on SourceForge to share this experience 
(http://sourceforge.net/projects/mailtopdf/)

The purpose is to render almost all mails (body + attachments) into one or 
more PDFs. The focus was not on a "sexy" rendition but on getting a 
rendition at all. Mails are read through IMAP or from a directory, rendered, 
and saved as PDF in an output directory. It uses OpenOffice and JAI in the 
background (for the attachments).
I'm quite happy with the first results: it renders 98% of the mails 
with their attachments (mean PDF rendition time per mail = 300 ms on a 
normal machine).

Just to let you know, and to thank you again.


Benoît NOSS






 



Re: Headless mail renderer

Posted by Stefano Bagnara <ap...@bago.org>.
2011/1/25 Noss Benoit <be...@secu.lu>:
> Hi, after your comments, I now think I have to split my project in two
> parts
>
> 1/ The first part has to parse the message and write an html or xhtml page
> representing the output I want for the message
> 2/ The second part has to render the html I previously generated to PDF

I do that in a single step because of the content-id "cid:" image references.
BTW logically you need to separate components: parser and renderer.
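
A minimal sketch of the "cid:" problem, assuming the inline parts have already been extracted from the message and are keyed by their Content-ID without the surrounding angle brackets. It rewrites cid: image references into data: URIs using only the JDK. This is a swapped-in technique for illustration, not the resolver-based approach used with Flying Saucer; the class CidInliner and the holder InlinePart are hypothetical, and whether a given renderer accepts data: URIs has to be checked separately.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: replaces src="cid:..." references with inline data: URIs. */
final class CidInliner {

    /** Hypothetical holder for an inline MIME part already decoded by the mail parser. */
    static final class InlinePart {
        final String contentType;
        final byte[] data;

        InlinePart(String contentType, byte[] data) {
            this.contentType = contentType;
            this.data = data;
        }
    }

    // Matches src="cid:..." or src='cid:...', ignoring case.
    private static final Pattern CID_SRC =
            Pattern.compile("(src\\s*=\\s*)([\"'])cid:([^\"']+)\\2", Pattern.CASE_INSENSITIVE);

    static String inline(String html, Map<String, InlinePart> partsByContentId) {
        Matcher m = CID_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            InlinePart part = partsByContentId.get(m.group(3));
            String replacement = m.group(); // unknown references are left untouched
            if (part != null) {
                String dataUri = "data:" + part.contentType + ";base64,"
                        + Base64.getEncoder().encodeToString(part.data);
                replacement = m.group(1) + m.group(2) + dataUri + m.group(2);
            }
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, InlinePart> parts = new HashMap<>();
        parts.put("logo@example",
                new InlinePart("image/png", "not a real image".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(inline("<img src=\"cid:logo@example\">", parts));
    }
}
```

Doing the rewrite while the message is still being parsed is one reason to keep parsing and rendering in a single step: the HTML part and the image parts it refers to are only available together at that point.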

> I tried flying saucer in the past, it can generate PDF, but it needed strict
> XHTML for the input, and lots of mails are not strict XHTML

I've had very good results parsing the html with validator.nu parser:
http://about.validator.nu/htmlparser/

I parsed thousands of HTML emails and tested most HTML parsers out there,
and validator.nu was the only one that parsed them all.

> On the one hand, I think I can improve my parser to get the html I want for
> most of the mails I have to transform.
> On the other hand, I don't know the openoffice SDK, webkit and Mozilla, and
> html rendering will be the hardest part....

If you used Flying Saucer in the past then go ahead with that.

Stefano

Re: Headless mail renderer

Posted by Noss Benoit <be...@secu.lu>.
Hi, after your comments, I now think I have to split my project in two 
parts

1/ The first part has to parse the message and write an html or xhtml 
page representing the output I want for the message
2/ The second part has to render the html I previously generated to PDF

I tried Flying Saucer in the past; it can generate PDF, but it needed 
strict XHTML for the input, and lots of mails are not strict XHTML.
On the one hand, I think I can improve my parser to get the html I want 
for most of the mails I have to transform.
On the other hand, I don't know the OpenOffice SDK, WebKit and Mozilla, 
and html rendering will be the hardest part....

Thanks,
Benoît

On 24.01.2011 18:16, Eric Charles wrote:
> Hi,
>
> fyi
> I also used java/mozilla integration via javaxpcom which needs 
> investment from developer (API changes,...). An alternative is to use 
> an html to pdf add-on and call it from xul with a java/xulrunner 
> integration.
> I also used Flying Saucer but didn't know it was able to generate PDF.
> For your use case, there's also the openoffice SDK which is really 
> well documented and supports a wide range of input/output document 
> format (html, pdf,...).
>
> Tks,
>
> Eric
>
>
> On 24/01/2011 15:09, Noss Benoit wrote:
>> thanks for your comments Stefano, I will look in the directions you 
>> suggested and keep you informed (if you want to)
>>
>> Benoît
>>
>>
>> On 24.01.2011 11:57, Stefano Bagnara wrote:
>>> 2011/1/24 Noss Benoit<be...@secu.lu>:
>>>> Hi Stefano,
>>>> thanks for your answer. In the past, I already tried to do this with
>>>> the javax.mail.Message class.
>>>> It was not a big success: I found lots of issues due to the variety of
>>>> incoming mails, so it couldn't go into production.
>>> You can tweak javamail with some system properties to let it parse
>>> more malformed messages.
>>> I say this because I think javamail is ok for this work, too.
>>> Mime4j may be a little simpler, but I'm not sure it is worth porting your
>>> code if you already have javamail code ready.
>>>
>>> With both you will have anyway to manually deal with mime parts and
>>> decide what to do with each part (mime4j removes the complexity of the
>>> activation framework and automatic object decoding done by javamail).
>>>
>>>> With each parsed Message, I tried to build in parallel an xhtml page
>>>> representing its content (From: To: Subject: Date: and body content)
>>>> When the attachment was a message, I recursively went into it and
>>>> appended the info found to the xhtml I previously created
>>>> When I found html, I tried to transform it to XHTML with tidy, then
>>>> to PDF with iText
>>>>     when the XHTML transformation failed and I had a
>>>>     multipart/alternative, I then rendered the txt to PDF
>>>> When I found attached images, I rendered them to PDF
>>>> When I found office documents I didn't transform them
>>>> After that I merged all created PDFs into one big PDF and checked it
>>>> in to the Documentum DB (for one message, one pdf)
>>> For xhtml to pdf rendering you may want to evaluate xhtmlrenderer (aka
>>> Flying Saucer).
>>> It is the best pure java xhtml renderer out there: it is not near to
>>> real web browsers but much better than other java rendering I tested.
>>>
>>>> The aim of the project is not to have a pretty rendering of all 
>>>> mail, it's
>>>> just to keep track of messages our client sent.
>>>>
>>>> I faced three big issues :
>>>> **************************
>>>> 0/ multipart/mixed with inline image content in "cid:...."
>>> Sure, you have to do manual work with this. Look for parts with
>>> Content-ID and alter references in the html urls to link to this
>>> objects.
>>> Depending on your rendering engine you should be able to plug your own
>>> url resolver and intercept cid: urls to provide the streams from the
>>> appropriate mime parts (I do that using Flying Sourcer)
>>>
>>>> 1/ like you said html to pdf rendering is difficult and (tidy+iText or
>>>> multipart/alternative) was not always working.
>>>>     If only I could use the Mozilla components to render it, but my
>>>> understanding of it is not high enough
>>> You can use mozilla components or even webkit: just google and you
>>> will find informations. I preferred Flying Saucer because I don't want
>>> to run X (even xvfb) on my servers for this task.
>>>
>>>> 2/ Special caracters and encoding pb in headers and attached file 
>>>> names
>>> I've had issues only with oriental encodings: they are difficult to
>>> support in flying saucer. No problems with european encodings.
>>>
>>> Stefano
>>
>>
>>
>>
>>
>


 



Re: Headless mail renderer

Posted by Eric Charles <er...@apache.org>.
Hi,

FYI, I also used Java/Mozilla integration via JavaXPCOM, which requires 
ongoing investment from the developer (API changes, ...). An alternative 
is to use an HTML-to-PDF add-on and call it from XUL through a 
Java/XULRunner integration.
I also used Flying Saucer but didn't know it could generate PDF.
For your use case, there's also the OpenOffice SDK, which is really well 
documented and supports a wide range of input/output document formats 
(HTML, PDF, ...).

Tks,

Eric




Re: Headless mail renderer

Posted by Noss Benoit <be...@secu.lu>.
Thanks for your comments, Stefano. I will look in the directions you 
suggested and keep you informed (if you want).

Benoît




 



Re: Headless mail renderer

Posted by Stefano Bagnara <ap...@bago.org>.
2011/1/24 Noss Benoit <be...@secu.lu>:
> Hi Stefano,
> thanks for your answer. In the past, I already tried to do this with the
> javax.mail.Message class.
> it was not a big success..., and found lots of issues due to the variety of
> incoming mails, so couldn't get in production.

You can tweak JavaMail with some system properties to let it parse
more malformed messages.
I say this because I think JavaMail is OK for this work, too.
Mime4j may be a little simpler, but I'm not sure it is worth porting
your code if you already have JavaMail code ready.
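
A minimal sketch of that kind of tweak, assuming the property names listed
in the javax.mail.internet package documentation. The class name
LenientJavaMail is made up for illustration, and which properties actually
help depends on the malformed mail you receive:

```java
// Sketch only: relaxes a few of JavaMail's MIME parsing checks.
// Property names are taken from the javax.mail.internet package docs;
// the class itself is a hypothetical helper, not part of JavaMail.
final class LenientJavaMail {

    private LenientJavaMail() {
    }

    /** Call once at startup, before any JavaMail class is loaded. */
    static void apply() {
        // Try to decode RFC 2047 encoded words found in the middle of a word.
        System.setProperty("mail.mime.decodetext.strict", "false");
        // Accept header parameter values that are not properly quoted.
        System.setProperty("mail.mime.parameters.strict", "false");
        // Treat corrupt base64 data as end of stream instead of failing.
        System.setProperty("mail.mime.base64.ignoreerrors", "true");
        // Decode encoded attachment names returned by getFileName().
        System.setProperty("mail.mime.decodefilename", "true");
    }
}
```

JavaMail reads several of these values in static initializers, so the
call has to happen before the first message is parsed.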

With both you will still have to deal with MIME parts manually and
decide what to do with each part (mime4j removes the complexity of the
activation framework and the automatic object decoding done by JavaMail).

> With each parsed Message, I tried to build in parallel a xhtml page
> representing its content (From: To: Subject: Date: and body content)
> When the attachement was a message, I recursively went into it and appended
> info found in the xhtml I previously created
> When I found html, I tried to transform it to XHTML with tidy, then to PDF
> with iText
>                                    when XHTML transformation failed and had
> a multipart/alternative, I then rendered txt to PDF
> When I found attached images, I rendered them to PDF
> When I found office documents I didn't transform them
> After that I merged all created PDF in one big PDF and checked it in to
> Documentum DB (for one message, one pdf)

For XHTML to PDF rendering you may want to evaluate xhtmlrenderer (aka
Flying Saucer).
It is the best pure Java XHTML renderer out there: it is nowhere near
real web browsers, but much better than the other Java renderers I tested.

> The aim of the project is not to have a pretty rendering of all mail, it's
> just to keep track of messages our client sent.
>
> I faced three big issues :
> **************************
> 0/ multipart/mixed with inline image content in "cid:...."

Sure, you have to do manual work with this. Look for parts with a
Content-ID header and alter the references in the HTML URLs to link to
these objects.
Depending on your rendering engine you should be able to plug in your own
URL resolver and intercept cid: URLs to provide the streams from the
appropriate MIME parts (I do that using Flying Saucer).
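
A rough sketch of the first approach (rewriting the references in the
HTML), using only the JDK. The class CidRewriter and the map from
Content-ID to a local URL are assumptions for illustration; building that
map from the parsed MIME parts is left to the parser code, and the
resolver approach depends on the rendering engine's own extension points,
so it is not shown:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces src="cid:..." references with URLs the
// renderer can load (for example files extracted from the MIME parts).
final class CidRewriter {

    // Matches src="cid:..." or src='cid:...', case-insensitively.
    private static final Pattern CID_SRC = Pattern.compile(
            "(src\\s*=\\s*)([\"'])cid:([^\"']+)\\2", Pattern.CASE_INSENSITIVE);

    private CidRewriter() {
    }

    static String rewrite(String html, Map<String, String> urlByContentId) {
        Matcher m = CID_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = urlByContentId.get(m.group(3));
            String replacement = (url == null)
                    ? m.group() // unknown Content-ID: leave the reference as-is
                    : m.group(1) + m.group(2) + url + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Content-ID header values carry angle brackets; cid: URLs do not. */
    static String normalizeContentId(String headerValue) {
        String s = headerValue.trim();
        if (s.length() >= 2 && s.startsWith("<") && s.endsWith(">")) {
            s = s.substring(1, s.length() - 1);
        }
        return s;
    }
}
```

Only src attributes are handled here; cid: references in CSS or in
percent-encoded form would need extra cases.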

> 1/ like you said html to pdf rendering is difficult and (tidy+iText or
> multipart/alternative) was not always working.
>    If only I could use the Mozilla components to render it, but my
> understanding of it is not high enough

You can use Mozilla components or even WebKit: just google and you
will find information. I preferred Flying Saucer because I don't want
to run X (even xvfb) on my servers for this task.

> 2/ Special characters and encoding problems in headers and attached file names

I've had issues only with East Asian encodings: they are difficult to
support in Flying Saucer. No problems with European encodings.
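
For the header side of issue 2/, non-ASCII text in headers and file names
normally arrives as RFC 2047 encoded words. JavaMail exposes a decoder for
this as MimeUtility.decodeText. The sketch below is a simplified,
JDK-only illustration of what such a decoder does; the class name
EncodedWords is made up, and it ignores corner cases such as the RFC 2231
language suffix:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified RFC 2047 decoder, for illustration only.
final class EncodedWords {

    // =?charset?B-or-Q?payload?=
    private static final Pattern WORD =
            Pattern.compile("=\\?([^?\\s]+)\\?([bBqQ])\\?([^?\\s]*)\\?=");

    private EncodedWords() {
    }

    static String decode(String header) {
        Matcher m = WORD.matcher(header);
        StringBuilder out = new StringBuilder();
        int last = 0;
        boolean previousWasEncoded = false;
        while (m.find()) {
            String gap = header.substring(last, m.start());
            // Whitespace between two adjacent encoded words is dropped.
            if (!(previousWasEncoded && gap.trim().isEmpty())) {
                out.append(gap);
            }
            try {
                Charset charset = Charset.forName(m.group(1));
                byte[] bytes = m.group(2).equalsIgnoreCase("B")
                        ? Base64.getDecoder().decode(m.group(3))
                        : decodeQ(m.group(3));
                out.append(new String(bytes, charset));
                previousWasEncoded = true;
            } catch (IllegalArgumentException e) {
                // Unknown charset or broken payload: keep the raw text.
                out.append(m.group());
                previousWasEncoded = false;
            }
            last = m.end();
        }
        return out.append(header.substring(last)).toString();
    }

    // "Q" encoding: '_' means space, "=XX" is a hex-escaped byte.
    private static byte[] decodeQ(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '_') {
                buf.write(' ');
            } else if (c == '=') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("truncated escape");
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape");
                }
                buf.write((hi << 4) | lo);
                i += 2;
            } else {
                buf.write(c);
            }
        }
        return buf.toByteArray();
    }
}
```

Whether the decoded characters then show up in the PDF still depends on
the fonts available to the renderer.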

Stefano

Re: Headless mail renderer

Posted by Noss Benoit <be...@secu.lu>.
Hi Stefano,
thanks for your answer. In the past, I already tried to do this with the 
javax.mail.Message class.
It was not a big success: I found lots of issues due to the variety 
of incoming mails, so it never got into production.
For each parsed Message, I built in parallel an XHTML page 
representing its content (From:, To:, Subject:, Date: and body content).
When the attachment was a message, I recursively went into it and 
appended the info I found to the XHTML I had previously created.
When I found HTML, I tried to transform it to XHTML with tidy, then to 
PDF with iText.
When the XHTML transformation failed and the message had a 
multipart/alternative, I rendered the text part to PDF instead.
When I found attached images, I rendered them to PDF.
When I found office documents, I didn't transform them.
After that I merged all the created PDFs into one big PDF and checked it 
in to the Documentum DB (one PDF per message).

The aim of the project is not to have a pretty rendering of all mail, 
it's just to keep track of the messages our client sent.

I faced three big issues:
**************************
0/ multipart/mixed with inline image content referenced as "cid:...."
1/ as you said, HTML to PDF rendering is difficult, and (tidy+iText or 
multipart/alternative) did not always work.
     If only I could use the Mozilla components to render it, but my 
understanding of them is not good enough.
2/ Special characters and encoding problems in headers and attached file 
names

Benoît.



 



Re: Headless mail renderer

Posted by Stefano Bagnara <ap...@bago.org>.
2011/1/24 Noss Benoit <be...@secu.lu>:
> I don't want to spam you with this question, but I would like to make an
> headless PDF mail renderer.
> In my project, I want to batch process incoming mails and inject them in a
> content management DB as PDF.
> Am I on the right way if I use your MimeStreamParser combined with a custom
> handler to make rendering?
> Can I access the content? Do you suggest something else for this?

You can use mime4j, but you will have to deal manually with
attachments, with multipart/alternative to decide which part to render,
and with multipart/mixed to get the inline image streams to be placed
in the HTML.
And how do you plan to do headless HTML to PDF rendering? I think this
is the difficult task. Parsing with mime4j is easy: just look at some
of the examples.
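
The multipart/alternative decision can be sketched independently of the
parser. Everything below is hypothetical scaffolding (the Part class
stands in for whatever the parser hands over, with the MIME type already
stripped of parameters); it only shows the "prefer HTML, fall back to
plain text" choice:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for picking one body out of multipart/alternative.
final class AlternativeChooser {

    /** Stand-in for an already parsed body part. */
    static final class Part {
        final String mimeType; // e.g. "text/html", without parameters
        final String text;

        Part(String mimeType, String text) {
            this.mimeType = mimeType;
            this.text = text;
        }
    }

    /** Default order: HTML first, plain text as the fallback. */
    static final List<String> HTML_THEN_TEXT = Arrays.asList("text/html", "text/plain");

    private AlternativeChooser() {
    }

    /** Returns the first part matching the preference order, or null. */
    static Part choose(List<Part> alternatives, List<String> preference) {
        for (String wanted : preference) {
            for (Part part : alternatives) {
                if (wanted.equalsIgnoreCase(part.mimeType)) {
                    return part;
                }
            }
        }
        return null; // nothing renderable among the alternatives
    }
}
```

If rendering the HTML part fails, the caller can ask again with a
preference list containing only "text/plain".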

Stefano