Posted to mime4j-dev@james.apache.org by Noss Benoit <be...@secu.lu> on 2011/05/06 10:35:36 UTC

Re: Headless mail renderer

Hello Stefano and all the others who helped me,

I worked with two students on a headless mail renderer (written in Java).
I recently opened a project on SourceForge to share this experience 
(http://sourceforge.net/projects/mailtopdf/)

The purpose is to render almost all mails (body + attachments) into one or 
more PDFs. The focus was not on a "sexy" rendition but on getting a 
rendition at all. Mails are read through IMAP or from a directory, rendered, 
and saved as PDF in an output directory. It uses OpenOffice and JAI in the 
background (for the attachments).
I'm quite happy with the first results: it renders 98% of the mails 
with their attachments (mean PDF rendering time per mail = 300 ms on a 
normal machine).

Just to let you know, and to thank you again.


Benoît NOSS




On 25.01.2011 10:30, Stefano Bagnara wrote:
> 2011/1/25 Noss Benoit<be...@secu.lu>:
>> Hi, after your comments, I now think I have to split my project into two
>> parts
>>
>> 1/ The first part has to parse the message and write an HTML or XHTML page
>> representing the output I want for the message
>> 2/ The second part has to render the HTML I previously generated to PDF
> I do that in a single step because of the content-id "cid:" image references.
> BTW logically you need to separate components: parser and renderer.
>
>> I tried flying saucer in the past, it can generate PDF, but it needed strict
>> XHTML for the input, and lots of mails are not strict XHTML
> I've had very good results parsing the html with validator.nu parser:
> http://about.validator.nu/htmlparser/
>
> I parsed thousands of HTML emails and tested most HTML parsers out there
> and validator.nu was the only one parsing them all.
>
>> On the one hand, I think I can improve my parser to get the html I want for
>> most of the mails I have to transform.
>> On the other hand, I don't know the openoffice SDK, webkit and Mozilla, and
>> html rendering will be the hardest part....
> If you used flying saucer in the past then go ahead with that.
>
> Stefano
>
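
The "cid:" point quoted above is why parsing and rendering are hard to keep
fully apart: the HTML body refers to inline attachments by Content-ID, so
those references have to be resolved before the HTML goes to a renderer.
A minimal sketch in plain JavaScript, assuming the parser has already
collected the inline parts into a map keyed by Content-ID. The function name
inlineCidImages and the shape of the parts map are hypothetical and are not
part of Mime4J, validator.nu or Flying Saucer:

```javascript
// Hypothetical helper: replace src="cid:..." references in an HTML body with
// data: URIs built from the message's inline parts.
// `parts` maps a Content-ID (without angle brackets) to
// { mimeType: "image/png", base64: "..." } (an assumed shape).
function inlineCidImages(html, parts) {
  var pattern = /(src\s*=\s*["'])cid:([^"']+)(["'])/gi;
  return html.replace(pattern, function (match, prefix, cid, suffix) {
    if (!Object.prototype.hasOwnProperty.call(parts, cid)) {
      return match; // unknown Content-ID: leave the reference untouched
    }
    var part = parts[cid];
    return prefix + "data:" + part.mimeType + ";base64," + part.base64 + suffix;
  });
}

// Usage: one inline image referenced from the HTML body.
var html = '<p>Logo: <img src="cid:logo@example"></p>';
var resolved = inlineCidImages(html, {
  "logo@example": { mimeType: "image/png", base64: "AAAA" }
});
```

Embedding the image data keeps the HTML self-contained, so the rendering
step needs no access to the original message. Writing the parts to files and
rewriting the references to file paths would be an equally valid choice.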


 



Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Hi Robert.  Thank you for the detailed response and links.

On Tue, May 10, 2011 at 3:00 AM, Robert Burrell Donkin
<ro...@gmail.com> wrote:
> On Sun, May 8, 2011 at 2:44 PM, Tony Zakula <to...@gmail.com> wrote:
>> Not sure on where the project leaders want to go,
>
> Projects are community led here at Apache (see eg [1][2][3][4]). If
> there's development interest from the community and it's in scope for
> the project, then that's a direction the code will move in.
>
>> but I think being
>> able to store messages in different formats to be able to plugin to
>> systems would be great.  Instead of each person writing their own
>> parser, most people would just plugin the larger piece to their system
>> and start there.
>
> +1
>
> This vision seems to fit with the work over at Tika [5] and Lucene [6].
>
>> I did not see where you specified what you are thinking about for
>> summer.  Is that a link somewhere yet?
>
> The mailing lists (see [7] and eg [8]) are the primary tools we use
> here at Apache. Stuff only tends to get written down later, if at all.
> We've been throwing ideas around on the lists, hoping that people
> might pick some of them up and run with them ;-)
>
> Robert
>
> [1] http://www.apache.org/foundation/how-it-works.html
> [2] http://www.apache.org/foundation/getinvolved.html
> [3] http://jakarta.apache.org/site/contributing.html
> [4] http://www.apache.org/dev/contributors.html
> [5] http://tika.apache.org/
> [6] http://lucene.apache.org/
> [7] http://www.apache.org/dev/#mail
> [8] http://www.apache.org/dev/contrib-email-tips.html
>

I am on a few Apache lists.  However, I guess I did not see on this
list ideas being tossed about for where to go with Mime4J this summer.
 Maybe I missed it or the discussion was on a list I am not on
presently.   I would love to become a contributor to an Apache project
if I can make time to do it.

If you have a recommendation for me to join a list besides Mime4J,
please let me know.

Thanks,

Tony Z

Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Hi Eric,

>
> We've got also the apache-extras [1] which is mercurial (hosted by google
> for addons to apache projects).

That's good.  I will check it out.  I prefer Mercurial.

>
> I was wondering if the parser can be used without the full myna server. I
> mean, we simply need to parse mail, not to configure a full server with
> users,... ([2]).

The core Java part of Myna to run the JS is only 4 Java classes to
wrap the Mozilla Rhino runtime.  All of the rest of the functionality
is provided via JavaScript.  I will carbon copy the main developer of
Myna on this message, but I am pretty sure we can create a trimmed
down version very easily.  Worst case scenario if you could not
integrate it, a Host object for Rhino could be written or Myna's could
be tweaked.  I do reuse some JavaScript Myna libraries, but that could
be dropped in anywhere.

>
> Also, is the parser component packaged as a maven module to be easily used
> as dependency somewhere else?

Ah well, I said it was rough around the edges.  :-)  My use case
scenario right now is to drop this on a server running a commercial
mail server to be an API gateway to web applications to specialized
functionality.  A bridge to the commercial mail server so to speak.
It is a project I am working on for a client, specific to their specs.
 This tiny lightweight runtime allows me to drop this on a server and
have a bridge to that system.  When I make updates, I pull all
the new JS code with Mercurial to all the servers.  The project is not
complete either.  Just the parsing and a couple of API calls so far.
So that kind of stuff will be up to you.

>
> Whatever the response to the 2 above questions, feel free to push it where
> you like. I will look at it :)

Let me know what you think about the above to help me package
it, and also let me know whether you are still interested.

Thanks.

Tony

>
> Tks,
> - Eric
>
> [1] http://code.google.com/a/apache-extras.org/hosting/
> [2] http://www.mynajs.org/site/article/MynaPermissionsAdministrator.ejs
>
>
> On 13/05/2011 01:51, Tony Zakula wrote:
>>
>> Hi Eric,
>>
>> JavaScript is also extremely flexible.  With Rhino, you get the full
>> power of Java plus a lot of flexibility.
>>
>> What is the preferred spot of release?  GitHub, Bitbucket, or is there
>> another?
>>
>> Tony
>>
>>
>> On Thu, May 12, 2011 at 8:19 AM, Eric Charles<er...@apache.org>  wrote:
>>>
>>> Hi Tony,
>>>
>>> JavaScript in James server would be a first.
>>> But why not... there is more and more JS "on the other side" (thinking of
>>> Node.js...). Today I'm using Jackson to manipulate JSON in Java, but
>>> JavaScript is a more natural fit, so I understand why you chose it.
>>>
>>> We will start around the end of May with MAILBOX-44, but so far there are
>>> no discussions/decisions on the format chosen to persist mail.
>>>
>>> I would say "don't hurry, don't put pressure on yourself, and keep us
>>> updated when you think you can release it" :)
>>>
>>> Tks,
>>> - Eric
>>>
>>> On 12/05/2011 14:35, Tony Zakula wrote:
>>>>
>>>> Hi Eric,
>>>>
>>>> I would be more than happy to release the code now even though it is
>>>> not entirely finished if you are interested.  The parsing part is
>>>> pretty good, and I am using it in a production project right now.  I
>>>> am not sure it will fit your bill though as I am using it on a mail
>>>> server to do message list bounce processing.  Although I write Java
>>>> code for a living, I wanted to make it easy to modify this utility on
>>>> a running server so I used an open source project I contribute to at
>>>> Mynajs.org which is built on top of the Mozilla Rhino project.  I use
>>>> Mime4J, but my code is written in JavaScript.
>>>>
>>>> I would be more than happy to release the code now if you are
>>>> interested.
>>>>
>>>> Please let me know.
>>>>
>>>> Thanks!
>>>>
>>>> Tony
>>>>
>>>> On Wed, May 11, 2011 at 11:21 PM, Eric Charles<er...@apache.org>
>>>>  wrote:
>>>>>
>>>>> Hi Tony,
>>>>>
>>>>> We are starting to work on MAILBOX-44 "Design and implement a
>>>>> distributed
>>>>> mailbox using Hadoop" [1]
>>>>>
>>>>> We will need to store the mail in Hadoop and the JSON format (in an Avro
>>>>> file)
>>>>> may be an option.
>>>>>
>>>>> You said you are "still polishing for release" your JSON transformer.
>>>>> Do you have any plans to release it as open source so we could use it?
>>>>>
>>>>> Tks,
>>>>> Eric
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/MAILBOX-44
>>>>>
>>>>>
>>>>> On 10/05/2011 10:00, Robert Burrell Donkin wrote:
>>>>>>
>>>>>> On Sun, May 8, 2011 at 2:44 PM, Tony Zakula<to...@gmail.com>
>>>>>>  wrote:
>>>>>>>
>>>>>>> Not sure on where the project leaders want to go,
>>>>>>
>>>>>> Projects are community led here at Apache (see eg [1][2][3][4]). If
>>>>>> there's development interest from the community and it's in scope for
>>>>>> the project, then that's a direction the code will move in.
>>>>>>
>>>>>>> but I think being
>>>>>>> able to store messages in different formats to be able to plugin to
>>>>>>> systems would be great.  Instead of each person writing their own
>>>>>>> parser, most people would just plugin the larger piece to their
>>>>>>> system
>>>>>>> and start there.
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> This vision seems to fit with the work over at Tika [5] and Lucene
>>>>>> [6].
>>>>>>
>>>>>>> I did not see where you specified what you are thinking about for
>>>>>>> summer.  Is that a link somewhere yet?
>>>>>>
>>>>>> The mailing lists (see [7] and eg [8]) are the primary tools we use
>>>>>> here at Apache. Stuff only tends to get written down later, if at all.
>>>>>> We've been throwing ideas around on the lists, hoping that people
>>>>>> might pick some of them up and run with them ;-)
>>>>>>
>>>>>> Robert
>>>>>>
>>>>>> [1] http://www.apache.org/foundation/how-it-works.html
>>>>>> [2] http://www.apache.org/foundation/getinvolved.html
>>>>>> [3] http://jakarta.apache.org/site/contributing.html
>>>>>> [4] http://www.apache.org/dev/contributors.html
>>>>>> [5] http://tika.apache.org/
>>>>>> [6] http://lucene.apache.org/
>>>>>> [7] http://www.apache.org/dev/#mail
>>>>>> [8] http://www.apache.org/dev/contrib-email-tips.html
>>>>>
>>>>>
>>>
>>>
>
>

Re: JSON parsing in mynajs (was Re: Headless mail renderer)

Posted by Tony Zakula <to...@gmail.com>.
Hi Eric,

I will put it on my list, but the main work is being done in - [1]

The method you probably want to look at for your purposes is at the
bottom of that file:

var parseMessage = function parse_message(fileName) {
  // Parses the message file and returns it as a JSON object,
  // or null if Mime4J could not parse it.
  var object = null;
  var mimeMsg = null;
  var fis = new Myna.File(fileName).getInputStream();
  try {
    try {
      // Create message with stream from file
      mimeMsg = new org.apache.james.mime4j.message.Message(fis);
    }
    catch (e) {
      // An error here comes from the parser; leave the result null.
      //Myna.println("Mime4J error parsing message")
    }
    if (mimeMsg !== null) {
      // Only build the entity tree when parsing succeeded.
      object = files.createEntityNode(mimeMsg);
    }
  }
  finally {
    // Always release the stream, even when parsing fails.
    fis.close();
  }
  return object;
};

The createEntityNode function at the following link is really the
recursive function that does the parsing - [2]
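
For readers who do not want to follow the link, the recursive idea can be
sketched in plain JavaScript. This is only an illustration under stated
assumptions, not the code in createEntityNode.sjs: the function name
toEntityNode and the input shape (objects with mimeType, headers, and either
body or parts) are hypothetical stand-ins for the Mime4J entity tree.

```javascript
// Hypothetical input shape (not the Mime4J API):
//   leaf:      { mimeType, headers, body }
//   multipart: { mimeType, headers, parts: [ ...entities ] }
function toEntityNode(entity) {
  var node = {
    mimeType: entity.mimeType,
    headers: entity.headers || {}
  };
  if (Array.isArray(entity.parts)) {
    // Multipart: recurse into every child entity.
    node.parts = entity.parts.map(function (child) {
      return toEntityNode(child);
    });
  } else {
    // Leaf: keep the decoded body as-is.
    node.body = entity.body;
  }
  return node;
}

// Usage: a multipart/alternative message with two leaf parts.
var message = {
  mimeType: "multipart/alternative",
  headers: { Subject: "Hello" },
  parts: [
    { mimeType: "text/plain", body: "Hi" },
    { mimeType: "text/html", body: "<p>Hi</p>" }
  ]
};
var tree = toEntityNode(message);
var json = JSON.stringify(tree);
```

Because each multipart simply maps its children through the same function,
nesting depth does not matter, and the result is a plain object that
JSON.stringify can serialize directly.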

Things are set up through index.ejs so they can be called over an
HTTP connection, as Myna is deployed as a Java webapp.  To call from a
Java main method, we would need to customize Myna or create a Rhino
host object as Mark described [3].

Tony Zakula
Direct Line (906) 364-8082

[1] https://bitbucket.org/tzakula/javascript-email-bounce-processor/src/65b933d28c1f/bounceProcessor/processBounces.sjs
[2] https://bitbucket.org/tzakula/javascript-email-bounce-processor/src/65b933d28c1f/bounceProcessor/createEntityNode.sjs
[3] https://groups.google.com/forum/#!topic/mynajs-general/UnMTvxHNE



On Mon, May 16, 2011 at 4:47 AM, Eric Charles <er...@apache.org> wrote:
> Hi Tony,
>
> Cool! Tks for this :)
>
> Even if index.ejs gives some clues on how to run it, it would be great to
> have a short README that explains how to use it (and how to call it from a
> Java main class).
>
> Tks,
> Eric
>
>
> On 16/05/2011 04:45, Tony Zakula wrote:
>>
>> Hi Eric and all,
>>
>> I have posted some basic code for parsing emails using Mime4J and
>> JavaScript at
>>
>> https://bitbucket.org/tzakula/javascript-email-bounce-processor
>>
>> Thanks.
>>
>> Tony Z
>>
>>
>>
>> On Thu, May 12, 2011 at 10:28 PM, Eric Charles<er...@apache.org>  wrote:
>>>
>>> Hi Tony,
>>>
>>> We've got also the apache-extras [1] which is mercurial (hosted by google
>>> for addons to apache projects).
>>>
>>> I was wondering if the parser can be used without the full myna server. I
>>> mean, we simply need to parse mail, not to configure a full server with
>>> users,... ([2]).
>>>
>>> Also, is the parser component packaged as a maven module to be easily
>>> used
>>> as dependency somewhere else?
>>>
>>> Whatever the response to the 2 above questions, feel free to push it
>>> where
>>> you like. I will look at it :)
>>>
>>> Tks,
>>> - Eric
>>>
>>> [1] http://code.google.com/a/apache-extras.org/hosting/
>>> [2] http://www.mynajs.org/site/article/MynaPermissionsAdministrator.ejs
>>>
>>>

JSON parsing in mynajs (was Re: Headless mail renderer)

Posted by Eric Charles <er...@apache.org>.
Hi Tony,

Cool! Tks for this :)

Even if index.ejs gives some clues on how to run it, it would be great to 
have a short README that explains how to use it (and how to call it from 
a Java main class).

Tks,
Eric


On 16/05/2011 04:45, Tony Zakula wrote:
> Hi Eric and all,
>
> I have posted some basic code for parsing emails using Mime4J and JavaScript at
>
> https://bitbucket.org/tzakula/javascript-email-bounce-processor
>
> Thanks.
>
> Tony Z
>
>
>
> On Thu, May 12, 2011 at 10:28 PM, Eric Charles<er...@apache.org>  wrote:
>> Hi Tony,
>>
>> We've got also the apache-extras [1] which is mercurial (hosted by google
>> for addons to apache projects).
>>
>> I was wondering if the parser can be used without the full myna server. I
>> mean, we simply need to parse mail, not to configure a full server with
>> users,... ([2]).
>>
>> Also, is the parser component packaged as a maven module to be easily used
>> as dependency somewhere else?
>>
>> Whatever the response to the 2 above questions, feel free to push it where
>> you like. I will look at it :)
>>
>> Tks,
>> - Eric
>>
>> [1] http://code.google.com/a/apache-extras.org/hosting/
>> [2] http://www.mynajs.org/site/article/MynaPermissionsAdministrator.ejs
>>
>>


Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Hi Eric and all,

I have posted some basic code for parsing emails using Mime4J and JavaScript at

https://bitbucket.org/tzakula/javascript-email-bounce-processor

Thanks.

Tony Z



On Thu, May 12, 2011 at 10:28 PM, Eric Charles <er...@apache.org> wrote:
> Hi Tony,
>
> We've got also the apache-extras [1] which is mercurial (hosted by google
> for addons to apache projects).
>
> I was wondering if the parser can be used without the full myna server. I
> mean, we simply need to parse mail, not to configure a full server with
> users,... ([2]).
>
> Also, is the parser component packaged as a maven module to be easily used
> as dependency somewhere else?
>
> Whatever the response to the 2 above questions, feel free to push it where
> you like. I will look at it :)
>
> Tks,
> - Eric
>
> [1] http://code.google.com/a/apache-extras.org/hosting/
> [2] http://www.mynajs.org/site/article/MynaPermissionsAdministrator.ejs
>
>

Re: Headless mail renderer

Posted by Eric Charles <er...@apache.org>.
Hi Tony,

We also have apache-extras [1], which uses Mercurial (hosted by 
Google for add-ons to Apache projects).

I was wondering if the parser can be used without the full Myna server. 
I mean, we simply need to parse mail, not to configure a full server 
with users,... ([2]).

Also, is the parser component packaged as a Maven module so it can easily 
be used as a dependency somewhere else?

Whatever the answers to the two questions above, feel free to push it 
where you like. I will look at it :)

Tks,
- Eric

[1] http://code.google.com/a/apache-extras.org/hosting/
[2] http://www.mynajs.org/site/article/MynaPermissionsAdministrator.ejs




Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Hi Eric,

JavaScript is also extremely flexible.  With Rhino, you get the full
power of Java plus a lot of flexibility.

What is the preferred place to release it?  GitHub, Bitbucket, or somewhere else?

Tony


On Thu, May 12, 2011 at 8:19 AM, Eric Charles <er...@apache.org> wrote:
> Hi Tony,
>
> JavaScript in James server would be a first.
> But why not... there is more and more JS "on the other side" (thinking of
> Node.js...). Today I use Jackson to manipulate JSON in Java, but
> JavaScript is a more natural fit, so I understand why you chose it.
>
> We will start on MAILBOX-44 around the end of May, but so far there have
> been no discussions or decisions on the format used to persist mail.
>
> I would say "don't hurry, don't put pressure on yourself, and keep us updated
> when you plan to release it" :)
>
> Tks,
> - Eric

Re: Headless mail renderer

Posted by Eric Charles <er...@apache.org>.
Hi Tony,

JavaScript in James server would be a first.
But why not... there is more and more JS "on the other side" (thinking 
of Node.js...). Today I use Jackson to manipulate JSON in Java, but 
JavaScript is a more natural fit, so I understand why you chose it.

We will start on MAILBOX-44 around the end of May, but so far there have 
been no discussions or decisions on the format used to persist mail.

I would say "don't hurry, don't put pressure on yourself, and keep us updated 
when you plan to release it" :)

Tks,
- Eric

On 12/05/2011 14:35, Tony Zakula wrote:
> Hi Eric,
>
> I would be more than happy to release the code now even though it is
> not entirely finished if you are interested.  The parsing part is
> pretty good, and I am using it in a production project right now.  I
> am not sure it will fit your bill though as I am using it on a mail
> server to do message list bounce processing.  Although I write Java
> code for a living, I wanted to make it easy to modify this utility on
> a running server so I used an open source project I contribute to at
> Mynajs.org which is built on top of the Mozilla Rhino project.  I use
> Mime4J, but my code is written in JavaScript.
>
> I would be more than happy to release the code now if you are interested.
>
> Please let me know.
>
> Thanks!
>
> Tony


Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Hi Eric,

I would be more than happy to release the code now even though it is
not entirely finished if you are interested.  The parsing part is
pretty good, and I am using it in a production project right now.  I
am not sure it will fit your bill though as I am using it on a mail
server to do message list bounce processing.  Although I write Java
code for a living, I wanted to make it easy to modify this utility on
a running server so I used an open source project I contribute to at
Mynajs.org which is built on top of the Mozilla Rhino project.  I use
Mime4J, but my code is written in JavaScript.

I would be more than happy to release the code now if you are interested.

Please let me know.

Thanks!

Tony

On Wed, May 11, 2011 at 11:21 PM, Eric Charles <er...@apache.org> wrote:
> Hi Tony,
>
> We are starting to work on MAILBOX-44 "Design and implement a distributed
> mailbox using Hadoop" [1]
>
> We will need to store the mail in Hadoop, and the JSON format (in an Avro
> file) may be an option.
>
> You said you are "still polishing for release" your JSON transformer.
> Do you have any plans to release it as open source so we could use it?
>
> Tks,
> Eric
>
> [1] https://issues.apache.org/jira/browse/MAILBOX-44

Re: Headless mail renderer

Posted by Eric Charles <er...@apache.org>.
Hi Tony,

We are starting to work on MAILBOX-44 "Design and implement a 
distributed mailbox using Hadoop" [1]

We will need to store the mail in Hadoop, and the JSON format (in an Avro 
file) may be an option.

You said you are "still polishing for release" your JSON transformer.
Do you have any plans to release it as open source so we could use it?

Tks,
Eric

[1] https://issues.apache.org/jira/browse/MAILBOX-44


On 10/05/2011 10:00, Robert Burrell Donkin wrote:
> On Sun, May 8, 2011 at 2:44 PM, Tony Zakula<to...@gmail.com>  wrote:
>> Not sure on where the project leaders want to go,
>
> Projects are community led here at Apache (see eg [1][2][3][4]). If
> there's development interest from the community and it's in scope for
> the project, then that's a direction the code will move in.
>
>> but I think being
>> able to store messages in different formats to be able to plug in to
>> systems would be great.  Instead of each person writing their own
>> parser, most people would just plug the larger piece into their system
>> and start there.
>
> +1
>
> This vision seems to fit with the work over at Tika [5] and Lucene [6].
>
>> I did not see where you specified what you are thinking about for
>> summer.  Is that a link somewhere yet?
>
> The mailing lists (see [7] and eg [8]) are the primary tools we use
> here at Apache. Stuff only tends to get written down later, if at all.
> We've been throwing ideas around on the lists, hoping that people
> might pick some of them up and run with them ;-)
>
> Robert
>
> [1] http://www.apache.org/foundation/how-it-works.html
> [2] http://www.apache.org/foundation/getinvolved.html
> [3] http://jakarta.apache.org/site/contributing.html
> [4] http://www.apache.org/dev/contributors.html
> [5] http://tika.apache.org/
> [6] http://lucene.apache.org/
> [7] http://www.apache.org/dev/#mail
> [8] http://www.apache.org/dev/contrib-email-tips.html


Re: Headless mail renderer

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Sun, May 8, 2011 at 2:44 PM, Tony Zakula <to...@gmail.com> wrote:
> Not sure on where the project leaders want to go,

Projects are community led here at Apache (see eg [1][2][3][4]). If
there's development interest from the community and it's in scope for
the project, then that's a direction the code will move in.

> but I think being
> able to store messages in different formats to be able to plug in to
> systems would be great.  Instead of each person writing their own
> parser, most people would just plug the larger piece into their system
> and start there.

+1

This vision seems to fit with the work over at Tika [5] and Lucene [6].

> I did not see where you specified what you are thinking about for
> summer.  Is that a link somewhere yet?

The mailing lists (see [7] and eg [8]) are the primary tools we use
here at Apache. Stuff only tends to get written down later, if at all.
We've been throwing ideas around on the lists, hoping that people
might pick some of them up and run with them ;-)

Robert

[1] http://www.apache.org/foundation/how-it-works.html
[2] http://www.apache.org/foundation/getinvolved.html
[3] http://jakarta.apache.org/site/contributing.html
[4] http://www.apache.org/dev/contributors.html
[5] http://tika.apache.org/
[6] http://lucene.apache.org/
[7] http://www.apache.org/dev/#mail
[8] http://www.apache.org/dev/contrib-email-tips.html

Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Not sure on where the project leaders want to go, but I think being
able to store messages in different formats to be able to plug in to
systems would be great.  Instead of each person writing their own
parser, most people would just plug the larger piece into their system
and start there.

I did not see where you specified what you are thinking about for
summer.  Is that a link somewhere yet?

Tony Z



On Fri, May 6, 2011 at 3:32 PM, Robert Burrell Donkin
<ro...@gmail.com> wrote:
> On Fri, May 6, 2011 at 12:05 PM, Tony Zakula <to...@gmail.com> wrote:
>> Hey,
>>
>> That is a cool project!  Congratulations!  I have one that I am
>> still polishing for release that transforms messages into JSON format
>> and then stores the JSON.  My initial benchmarks on non-optimized code
>> show an average of 25,000 messages an hour, with the main bottleneck
>> being the IO.  Cool to see what other people are doing.
>
> Transforming mail into JSON (or XML) seems a hot topic. It'll probably
> be needed for some of the mailbox (Hadoop, Cassandra, CouchDB etc) and
> machine learning stuff we hope to experiment with this summer. I think
> that transformation modules building on mime4j would be a good fit for
> Mime4J.
>
> Opinions?
>
> Robert
>

Re: Headless mail renderer

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Fri, May 6, 2011 at 12:05 PM, Tony Zakula <to...@gmail.com> wrote:
> Hey,
>
> That is a cool project!  Congratulations!  I have one that I am
> still polishing for release that transforms messages into JSON format
> and then stores the JSON.  My initial benchmarks on non-optimized code
> show an average of 25,000 messages an hour, with the main bottleneck
> being the IO.  Cool to see what other people are doing.

Transforming mail into JSON (or XML) seems a hot topic. It'll probably
be needed for some of the mailbox (Hadoop, Cassandra, CouchDB etc) and
machine learning stuff we hope to experiment with this summer. I think
that transformation modules building on mime4j would be a good fit for
Mime4J.

Opinions?

Robert

Re: Headless mail renderer

Posted by Tony Zakula <to...@gmail.com>.
Hey,

That is a cool project!  Congratulations!  I have one that I am
still polishing for release that transforms messages into JSON format
and then stores the JSON.  My initial benchmarks on non-optimized code
show an average of 25,000 messages an hour, with the main bottleneck
being the IO.  Cool to see what other people are doing.
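
To make the idea of a message-to-JSON transformation concrete, here is a
minimal sketch in plain Java. It is not the code described above (that one
is JavaScript on Rhino with Mime4J) and it does not use Mime4J at all. The
class name MailJsonSketch and the JSON layout (a "headers" object plus a
"body" string) are assumptions chosen for illustration. It only handles a
flat message: headers (possibly folded), then a body after the first empty
line. MIME multiparts, encoded words and repeated headers would need a real
parser such as Mime4J.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: flatten a simple RFC 822 style message into JSON. */
final class MailJsonSketch {

    static String toJson(String rawMessage) {
        // Normalise line endings; headers end at the first empty line.
        String normalized = rawMessage.replace("\r\n", "\n");
        int split = normalized.indexOf("\n\n");
        String headerBlock = split < 0 ? normalized : normalized.substring(0, split);
        String body = split < 0 ? "" : normalized.substring(split + 2);

        // Simplification: a repeated header name overwrites the earlier value.
        Map<String, String> headers = new LinkedHashMap<>();
        String current = null;
        for (String line : headerBlock.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            boolean continuation = line.charAt(0) == ' ' || line.charAt(0) == '\t';
            if (continuation && current != null) {
                // A folded header: append the continuation to the previous value.
                headers.put(current, headers.get(current) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" line, ignored in this sketch
            }
            current = line.substring(0, colon).trim();
            headers.put(current, line.substring(colon + 1).trim());
        }

        StringBuilder json = new StringBuilder("{\"headers\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (!first) {
                json.append(',');
            }
            first = false;
            json.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return json.append("},\"body\":").append(quote(body)).append('}').toString();
    }

    /** Wraps a string in double quotes and escapes it as a JSON string. */
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }
}
```

Calling MailJsonSketch.toJson(raw) on a message read from disk would give
one JSON document per mail. In a real transformer the hand-written escaping
would more likely be left to a JSON library.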

Tony Z


On Fri, May 6, 2011 at 3:35 AM, Noss Benoit <be...@secu.lu> wrote:
> Hello Stefano and all the others who helped me,
>
> I worked with two students on a headless mail renderer (written in Java).
> I recently opened a project on SourceForge to share this experience
> (http://sourceforge.net/projects/mailtopdf/).
>
> The purpose is to render almost all mails (body + attachments) into one or
> more PDFs. The focus was not on a "sexy" rendition but on getting a
> rendition at all. Mails are read through IMAP or from a directory, rendered,
> and saved as PDF in an output directory. It uses OpenOffice and JAI in the
> background (for the attachments).
> I'm quite happy with the first results: it renders 98% of the mails with
> their attachments (mean PDF rendering time per mail = 300 ms on a normal
> machine).
>
> Just to let you know, and to thank you again.
>
>
> Benoît NOSS

Re: Headless mail renderer

Posted by Noss Benoit <be...@secu.lu>.
Hi Eric,

    * OpenOffice is used to render Microsoft Office and OpenOffice
      attachments into PDF.
      OpenOffice renders HTML into PDF badly, though.
    * iText is used to render XHTML to PDF.
      As Stefano proposed, convert the HTML into XHTML with the
      validator.nu parser (or with JTidy in my case) and then use
      Flying Saucer to make a PDF out of the XHTML.
      The Flying Saucer project internally uses iText 2.x (2.0.8 in my
      case) + iText 5.0.6
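
The split described above can be sketched as a small routing step that
picks a conversion path per MIME type. This is only an illustration, not
code from the mailtopdf project: the class RenderRouter, the Route enum and
the exact list of MIME types are assumptions, and sending images to JAI is
a guess based on JAI being mentioned for attachments. The actual
conversions (OpenOffice, JTidy or validator.nu, Flying Saucer) are left out
so that the sketch stays self-contained.

```java
import java.util.Locale;

/** Hypothetical sketch: choose a PDF conversion path for a mail part. */
final class RenderRouter {

    /** Paths named after the tools mentioned in the thread; the enum itself is invented. */
    enum Route {
        OPENOFFICE,               // Microsoft Office and OpenOffice documents
        XHTML_THEN_FLYING_SAUCER, // HTML cleaned to XHTML, then Flying Saucer / iText
        IMAGE_VIA_JAI,            // assumption: images are handled with JAI
        OTHER                     // not covered by the advice above
    }

    static Route routeFor(String mimeType) {
        if (mimeType == null) {
            return Route.OTHER;
        }
        // Drop parameters such as "; charset=UTF-8" and normalise the case.
        String type = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);

        if (type.equals("text/html")) {
            // OpenOffice is reported to render HTML badly, so HTML goes the XHTML way.
            return Route.XHTML_THEN_FLYING_SAUCER;
        }
        if (type.startsWith("image/")) {
            return Route.IMAGE_VIA_JAI;
        }
        if (type.equals("application/msword")
                || type.startsWith("application/vnd.ms-")
                || type.startsWith("application/vnd.openxmlformats-officedocument.")
                || type.startsWith("application/vnd.oasis.opendocument.")) {
            return Route.OPENOFFICE;
        }
        return Route.OTHER;
    }
}
```

A renderer loop would call RenderRouter.routeFor(part.getMimeType()) for
the body and for each attachment and then hand the content to the matching
converter; getMimeType() here stands for whatever accessor the mail parser
in use provides.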

Benoît

On 06.05.2011 11:54, Eric Charles wrote:
> Hi Benoît,
>
> Many thanks for the feedback and contribution.
>
> I just downloaded your zip and saw jodconverter (and the associated
> uno..., ju.. jars from the OpenOffice SDK) and iText libs.
>
> You also import jodconverter and iText classes in PDFConverterJAVA.
>
> What would you advise for HTML/text-to-PDF conversion, based on your
> experience?
>
> Tks,
> - Eric

 

Re: Headless mail renderer

Posted by Eric Charles <er...@apache.org>.
Hi Benoît,

Many thanks for the feedback and contribution.

I just downloaded your zip and saw jodconverter (and the associated uno..., 
ju.. jars from the OpenOffice SDK) and iText libs.

You also import jodconverter and iText classes in PDFConverterJAVA.

What would you advise for HTML/text-to-PDF conversion, based on your 
experience?

Tks,
- Eric

On 6/05/2011 10:35, Noss Benoit wrote:
> Hello Stefano and all the others who helped me,
>
> I worked with two students on a headless mail renderer (written in Java).
> I recently opened a project on SourceForge to share this experience
> (http://sourceforge.net/projects/mailtopdf/).
>
> The purpose is to render almost all mails (body + attachments) into one or
> more PDFs. The focus was not on a "sexy" rendition but on getting a
> rendition at all. Mails are read through IMAP or from a directory, rendered,
> and saved as PDF in an output directory. It uses OpenOffice and JAI in the
> background (for the attachments).
> I'm quite happy with the first results: it renders 98% of the mails
> with their attachments (mean PDF rendering time per mail = 300 ms on a
> normal machine).
>
> Just to let you know, and to thank you again.
>
>
> Benoît NOSS