Posted to server-dev@james.apache.org by Robert Burrell Donkin <ro...@gmail.com> on 2008/03/08 20:17:39 UTC

[mime4J] MessageSearcher...? [WAS Re: [Mime4J] getReader]

On Thu, Mar 6, 2008 at 7:13 PM, Norman Maurer <no...@apache.org> wrote:
>
>  On Thursday, 06.03.2008 at 18:02 +0000, Robert Burrell
>  Donkin wrote:
>
>
> > IMAP requires case-insensitive charset-aware searching of mail body
>  > parts (a bit of a PITA, i know and most likely slow). ATM the Mime4J
>  > pull parser has a getInputStream method but i plan to add a getReader
>  > method which will perform charset-aware content-decoding for text
>  > parts and return null for others (seems easier than throwing an
>  > exception but i'm willing to be talked out of this decision) since i
>  > think that this function may be generally useful.
>  >
>  > - robert
>
>  +1
>
>  Sounds good.

hmm...

does the message searcher (which searches a mail for a particular
sequence of characters) sound like something which might be a
generally useful addition to Mime4J?

- robert

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org


Re: [mime4J] MessageSearcher...? [WAS Re: [Mime4J] getReader]

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Sun, Mar 9, 2008 at 4:05 PM, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
>
>  On Sun, Mar 9, 2008 at 11:53 AM, Robert Burrell Donkin
>  <ro...@gmail.com> wrote:
>  > On Sun, Mar 9, 2008 at 9:41 AM, Jukka Zitting <ju...@gmail.com> wrote:
>
> >  >  A search feature is typically only useful when applied to a collection
>  >  >  of messages, so I'm not sure if something like that really makes sense
>  >  >  in the scope of Mime4J.
>  >
>  >  the object i had in mind performs a character search of the bodies and
>  >  fields of one MIME document. this isn't a byte-wise but
>  >  character-based search which requires decoding of transfer encodings
>  >  and setting appropriate charsets. the function isn't complex but
>  >  getting the decoding right for each part is a little fiddly and
>  >  requires an understanding of MIME encoding.
>
>  Isn't most of that complexity already handled by the new getReader() method?

yes it is. it's just a classic pull parsing implementation.  so maybe
it would work better in the documentation as a worked example...
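As an illustration of how small such a worked example could be: once a part's body is available as a java.io.Reader (which is what a getReader()-style method would hand back), a case-insensitive search is a single pass over the characters. This is a hedged sketch only; ReaderSearch and containsIgnoreCase are hypothetical names, not Mime4J API. It uses a Knuth-Morris-Pratt matcher so the reader never has to be rewound, and it folds case one character at a time, so multi-character foldings (for example German sharp s versus "SS") are not handled.

```java
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical helper (not Mime4J API): reports whether a search string
 * occurs, ignoring case, in the characters produced by a Reader.
 */
final class ReaderSearch {

    private ReaderSearch() {
    }

    static boolean containsIgnoreCase(Reader reader, String text) throws IOException {
        if (text.isEmpty()) {
            return true;
        }
        // Fold the search string once, using the same per-char folding as the input.
        char[] needle = new char[text.length()];
        for (int i = 0; i < needle.length; i++) {
            needle[i] = fold(text.charAt(i));
        }
        int[] failure = failureTable(needle);
        int matched = 0; // number of needle chars matched so far
        int c;
        while ((c = reader.read()) != -1) {
            char ch = fold((char) c);
            // On mismatch, follow the KMP failure links instead of re-reading input.
            while (matched > 0 && ch != needle[matched]) {
                matched = failure[matched - 1];
            }
            if (ch == needle[matched]) {
                matched++;
                if (matched == needle.length) {
                    return true;
                }
            }
        }
        return false;
    }

    // failure[i] = length of the longest proper prefix of needle[0..i] that is also its suffix.
    private static int[] failureTable(char[] needle) {
        int[] failure = new int[needle.length];
        int k = 0;
        for (int i = 1; i < needle.length; i++) {
            while (k > 0 && needle[i] != needle[k]) {
                k = failure[k - 1];
            }
            if (needle[i] == needle[k]) {
                k++;
            }
            failure[i] = k;
        }
        return failure;
    }

    // Per-character case folding, similar to the comparison String.equalsIgnoreCase uses.
    private static char fold(char c) {
        return Character.toLowerCase(Character.toUpperCase(c));
    }
}
```

Because the matcher only ever moves forward, it works the same on a small header value and on a large decoded body streamed from disk.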

- robert


Re: [mime4J] MessageSearcher...? [WAS Re: [Mime4J] getReader]

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Sun, Mar 9, 2008 at 11:53 AM, Robert Burrell Donkin
<ro...@gmail.com> wrote:
> On Sun, Mar 9, 2008 at 9:41 AM, Jukka Zitting <ju...@gmail.com> wrote:
>  >  A search feature is typically only useful when applied to a collection
>  >  of messages, so I'm not sure if something like that really makes sense
>  >  in the scope of Mime4J.
>
>  the object i had in mind performs a character search of the bodies and
>  fields of one MIME document. this isn't a byte-wise but
>  character-based search which requires decoding of transfer encodings
>  and setting appropriate charsets. the function isn't complex but
>  getting the decoding right for each part is a little fiddly and
>  requires an understanding of MIME encoding.

Isn't most of that complexity already handled by the new getReader() method?

BR,

Jukka Zitting


Re: [mime4J] MessageSearcher...? [WAS Re: [Mime4J] getReader]

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Sun, Mar 9, 2008 at 9:41 AM, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
>
>  On Sat, Mar 8, 2008 at 9:17 PM, Robert Burrell Donkin
>  <ro...@gmail.com> wrote:
>  >  does the message searcher (which searches a mail for a particular
>  >  sequence of characters) sound like something which might be a
>  >  generally useful addition to Mime4J?
>
>  A search feature is typically only useful when applied to a collection
>  of messages, so I'm not sure if something like that really makes sense
>  in the scope of Mime4J.

the object i had in mind performs a character search of the bodies and
fields of one MIME document. this isn't a byte-wise but
character-based search which requires decoding of transfer encodings
and setting appropriate charsets. the function isn't complex but
getting the decoding right for each part is a little fiddly and
requires an understanding of MIME encoding.
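A minimal sketch of the per-part decoding step described here, assuming the transfer encoding and charset have already been read from the part's Content-Transfer-Encoding and Content-Type fields. PartDecoder and decodeBody are hypothetical names; only base64 and the identity encodings (7bit, 8bit, binary) are handled, and quoted-printable is left out for brevity. It shows why a byte-wise search is not enough: the transfer-encoded bytes generally do not contain the search text, while the decoded characters do.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.Locale;

/**
 * Hypothetical sketch: undo the Content-Transfer-Encoding of one part,
 * then decode the octets with the part's charset. Quoted-printable is
 * intentionally not handled here.
 */
final class PartDecoder {

    private PartDecoder() {
    }

    static String decodeBody(byte[] raw, String transferEncoding, Charset charset) {
        // RFC 2045: a missing Content-Transfer-Encoding means 7bit.
        String cte = transferEncoding == null
                ? "7bit"
                : transferEncoding.trim().toLowerCase(Locale.ROOT);
        byte[] octets;
        switch (cte) {
            case "base64":
                // The MIME decoder ignores line breaks and other non-alphabet characters.
                octets = Base64.getMimeDecoder().decode(raw);
                break;
            case "7bit":
            case "8bit":
            case "binary":
                octets = raw; // identity encodings: the bytes are already the content
                break;
            default:
                throw new IllegalArgumentException(
                        "transfer encoding not handled in this sketch: " + transferEncoding);
        }
        // Only now do the bytes become characters that can be searched.
        return new String(octets, charset);
    }
}
```

For example, the UTF-8 text "Café ouvert" sent as base64 travels as Q2Fmw6kgb3V2ZXJ0, which a byte-wise search for "ouvert" will never match; decodeBody(raw, "base64", StandardCharsets.UTF_8) recovers the searchable text.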

- robert


Re: [mime4J] MessageSearcher...? [WAS Re: [Mime4J] getReader]

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Sat, Mar 8, 2008 at 9:17 PM, Robert Burrell Donkin
<ro...@gmail.com> wrote:
>  does the message searcher (which searches a mail for a particular
>  sequence of characters) sound like something which might be a
>  generally useful addition to Mime4J?

A search feature is typically only useful when applied to a collection
of messages, so I'm not sure if something like that really makes sense
in the scope of Mime4J.

BR,

Jukka Zitting


Re: [mime4J] MessageSearcher...? [WAS Re: [Mime4J] getReader]

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Mon, Mar 10, 2008 at 10:21 PM, Noel J. Bergman <no...@devtech.com> wrote:
> Robert Burrell Donkin wrote:
>
>  >  I think that I'd rather we follow the example of other
>  >  getInputStream/getReader exemplars, and throw
>  >  java.io.UnsupportedEncodingException if it is called on a type for which a
>  >  Reader is not appropriate.
>
>  > in the end, i decided that consistency was more important and elected
>  > to throw the appropriate exception.
>
>  Consistency with what?

the rest of the pull parser

>  > i don't really like having to catch runtime exceptions
>
>  Perhaps that's why the Servlet and Portlet APIs declare
>  UnsupportedEncodingException and IOException instead of
>  UnsupportedCharsetException or IllegalCharsetNameException?  I'd have to
>  check the code to see if they actually catch the latter, and return the
>  former, or if they just allow them to leak out undeclared.

IIRC UnsupportedCharsetException and/or IllegalCharsetNameException
were introduced with nio (java.nio.charset)

i've used both approaches to exceptions (bio and nio) quite a bit now
and i think that i prefer nio. when i specify a charset which i know
is guaranteed to be present then it's a PITA to have to handle that
exception. occasionally, i may need to use a variable charset and then
i just have to catch the runtime exception.
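To make the contrast concrete, here is a hedged sketch in plain JDK code (CharsetStyles is a hypothetical class name). The java.io-style constructor String(byte[], String) forces a catch of the checked UnsupportedEncodingException even for UTF-8, which every Java platform is required to support, whereas the nio-style Charset constant needs no handling. StandardCharsets was only added in Java 7, after this thread; Charset.forName("UTF-8") would play the same role earlier. Only a variable charset name, such as one taken from a Content-Type header, needs the runtime exceptions caught.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the two exception styles; not Mime4J code. */
final class CharsetStyles {

    private CharsetStyles() {
    }

    // bio style: String(byte[], String) declares the checked
    // UnsupportedEncodingException, so even "UTF-8" forces a try/catch.
    static String decodeBio(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is a required charset", e);
        }
    }

    // nio style: a Charset constant, nothing to catch.
    static String decodeNio(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // nio style with a variable charset name: only here do the
    // runtime exceptions need catching.
    static String decodeVariable(byte[] bytes, String charsetName, Charset fallback) {
        Charset charset;
        try {
            charset = Charset.forName(charsetName);
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException, UnsupportedCharsetException, or a null name
            charset = fallback;
        }
        return new String(bytes, charset);
    }
}
```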

>  > > If I asked you to search
>  > > for SOMETEXT, what do I get back?  I might want a collection of [part,
>  > > first_offset]-tuples.  Less than that, and you would be discarding
>  > > information that you should already have, more than that and I would be
>  > > forcing you to do extra work that a given use case might not require.  On
>  > > the other hand, perhaps I just want to know about the first part in which
>  > > you find it, not all parts containing it.  And can I ask that you search
>  > > only headers or parts of only certain content-type(s)?
>  > >
>  > > Point being that, yes, I'd like to have a search functionality, but I'd like
>  > > some discussion over the interface and supported use cases.  And if we don't
>  > > build this into an existing class, it could be a shared utility, rather than
>  > > leaving it as just an example for people to clone.
>
>  > (as jukka posted) given getReader, search is just a trivial pull parser
>  > application. definitely not worth a major design exercise. probably
>  > best as a worked example somewhere in the documentation.
>
>  I suspect that it will have a lot more utility than that.  Mailets will want
>  to search for text within the message structure.

but there's a balance between utility and simplicity. text searching
is now a simple application of the library. better just to give a
worked example in the documentation and then think about whether
including a simplified email-only interface as a mailet utility would
be worthwhile.

- robert


RE: [mime4J] MessageSearcher...? [WAS Re: [Mime4J] getReader]

Posted by "Noel J. Bergman" <no...@devtech.com>.
Robert Burrell Donkin wrote:

>  I think that I'd rather we follow the example of other
>  getInputStream/getReader exemplars, and throw
>  java.io.UnsupportedEncodingException if it is called on a type for which a
>  Reader is not appropriate.

> in the end, i decided that consistency was more important and elected
> to throw the appropriate exception.

Consistency with what?

> i don't really like having to catch runtime exceptions

Perhaps that's why the Servlet and Portlet APIs declare
UnsupportedEncodingException and IOException instead of
UnsupportedCharsetException or IllegalCharsetNameException?  I'd have to
check the code to see if they actually catch the latter, and return the
former, or if they just allow them to leak out undeclared.

> > If I asked you to search
> > for SOMETEXT, what do I get back?  I might want a collection of [part,
> > first_offset]-tuples.  Less than that, and you would be discarding
> > information that you should already have, more than that and I would be
> > forcing you to do extra work that a given use case might not require.  On
> > the other hand, perhaps I just want to know about the first part in which
> > you find it, not all parts containing it.  And can I ask that you search
> > only headers or parts of only certain content-type(s)?
> >
> > Point being that, yes, I'd like to have a search functionality, but I'd like
> > some discussion over the interface and supported use cases.  And if we don't
> > build this into an existing class, it could be a shared utility, rather than
> > leaving it as just an example for people to clone.

> (as jukka posted) given getReader, search is just a trivial pull parser
> application. definitely not worth a major design exercise. probably
> best as a worked example somewhere in the documentation.

I suspect that it will have a lot more utility than that.  Mailets will want
to search for text within the message structure.

	--- Noel



Re: [mime4J] MessageSearcher...? [WAS Re: [Mime4J] getReader]

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Sun, Mar 9, 2008 at 10:01 PM, Noel J. Bergman <no...@devtech.com> wrote:
> Robert Burrell Donkin wrote:
>
>  > > > IMAP requires case-insensitive charset-aware searching of mail body
>  > > > parts (a bit of a PITA, i know and most likely slow). ATM the Mime4J
>  > > > pull parser has a getInputStream method but i plan to add a getReader
>  > > > method which will perform charset-aware content-decoding for text
>  > > > parts and return null for others (seems easier than throwing an
>  > > > exception but i'm willing to be talked out of this decision) since i
>  > > > think that this function may be generally useful.
>
>  I think that I'd rather we follow the example of other
>  getInputStream/getReader exemplars, and throw
>  java.io.UnsupportedEncodingException if it is called on a type for which a
>  Reader is not appropriate.

in the end, i decided that consistency was more important and elected
to throw the appropriate exception. so that's what the implementation
does. i don't really like having to catch runtime exceptions but it's
not unreasonable in this case.

>  > does the message searcher (which searches a mail for a particular
>  > sequence of characters) sound like something which might be a
>  > generally useful addition to Mime4J?
>
>  Yes.  But what do you propose as the interface?  If I asked you to search
>  for SOMETEXT, what do I get back?  I might want a collection of [part,
>  first_offset]-tuples.  Less than that, and you would be discarding
>  information that you should already have, more than that and I would be
>  forcing you to do extra work that a given use case might not require.  On
>  the other hand, perhaps I just want to know about the first part in which
>  you find it, not all parts containing it.  And can I ask that you search
>  only headers or parts of only certain content-type(s)?
>
>  Point being that, yes, I'd like to have a search functionality, but I'd like
>  some discussion over the interface and supported use cases.  And if we don't
>  build this into an existing class, it could be a shared utility, rather than
>  leaving it as just an example for people to clone.

(as jukka posted) given getReader, search is just a trivial pull parser
application. definitely not worth a major design exercise. probably
best as a worked example somewhere in the documentation.

- robert


RE: [mime4J] MessageSearcher...? [WAS Re: [Mime4J] getReader]

Posted by "Noel J. Bergman" <no...@devtech.com>.
Robert Burrell Donkin wrote:

> > > IMAP requires case-insensitive charset-aware searching of mail body
> > > parts (a bit of a PITA, i know and most likely slow). ATM the Mime4J
> > > pull parser has a getInputStream method but i plan to add a getReader
> > > method which will perform charset-aware content-decoding for text
> > > parts and return null for others (seems easier than throwing an
> > > exception but i'm willing to be talked out of this decision) since i
> > > think that this function may be generally useful.

I think that I'd rather we follow the example of other
getInputStream/getReader exemplars, and throw
java.io.UnsupportedEncodingException if it is called on a type for which a
Reader is not appropriate.
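A minimal sketch of the contract proposed here, assuming the body has already been transfer-decoded: a Reader is returned for text/* parts, and the checked java.io.UnsupportedEncodingException is thrown for everything else, including an unknown charset, instead of returning null. TextParts and openReader are hypothetical names, not the Mime4J method, and falling back to US-ASCII when no charset parameter is present follows the MIME default (RFC 2045/2046).

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical sketch of the proposed getReader contract; not the Mime4J API. */
final class TextParts {

    private TextParts() {
    }

    /**
     * @param decodedBody body with the transfer encoding already removed
     * @param mimeType    media type such as "text/plain", or null if absent
     * @param charsetName charset parameter from Content-Type, or null if absent
     */
    static Reader openReader(InputStream decodedBody, String mimeType, String charsetName)
            throws UnsupportedEncodingException {
        // MIME default: no Content-Type means text/plain.
        String type = mimeType == null ? "text/plain" : mimeType.trim().toLowerCase(Locale.ROOT);
        if (!type.startsWith("text/")) {
            // Checked exception, as proposed in the thread, instead of returning null.
            throw new UnsupportedEncodingException("no Reader for non-text part: " + type);
        }
        Charset charset = StandardCharsets.US_ASCII; // MIME default charset for text
        if (charsetName != null) {
            try {
                charset = Charset.forName(charsetName.trim());
            } catch (IllegalArgumentException e) {
                // IllegalCharsetNameException and UnsupportedCharsetException both land here.
                UnsupportedEncodingException checked = new UnsupportedEncodingException(charsetName);
                checked.initCause(e);
                throw checked;
            }
        }
        return new InputStreamReader(decodedBody, charset);
    }
}
```

Because UnsupportedEncodingException is an IOException, callers that already handle I/O errors from reading the body get this case for free, which is the main argument for the checked style.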

> does the message searcher (which searches a mail for a particular
> sequence of characters) sound like something which might be a
> generally useful addition to Mime4J?

Yes.  But what do you propose as the interface?  If I asked you to search
for SOMETEXT, what do I get back?  I might want a collection of [part,
first_offset]-tuples.  Less than that, and you would be discarding
information that you should already have, more than that and I would be
forcing you to do extra work that a given use case might not require.  On
the other hand, perhaps I just want to know about the first part in which
you find it, not all parts containing it.  And can I ask that you search
only headers or parts of only certain content-type(s)?

Point being that, yes, I'd like to have a search functionality, but I'd like
some discussion over the interface and supported use cases.  And if we don't
build this into an existing class, it could be a shared utility, rather than
leaving it as just an example for people to clone.
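One way to make these interface questions concrete is a hypothetical sketch in which each hit is a (part index, first offset) pair, a flag stops at the first matching part, and a predicate restricts the search to headers or to particular content types. MessageSearchSketch, TextPart and Match are invented names, not Mime4J API. The sketch assumes parts have already been decoded to text (for example through a getReader()-style call) and uses String.regionMatches with ignoreCase so that reported offsets refer to the original, un-lowercased text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical search interface sketch; none of these names exist in Mime4J. */
final class MessageSearchSketch {

    /** A header field or body part that has already been decoded to text. */
    static final class TextPart {
        final int index;
        final boolean header;      // true for a header field, false for a body part
        final String contentType;  // null for header fields
        final String text;

        TextPart(int index, boolean header, String contentType, String text) {
            this.index = index;
            this.header = header;
            this.contentType = contentType;
            this.text = text;
        }
    }

    /** One hit: the part it occurred in and the offset of its first occurrence there. */
    static final class Match {
        final int partIndex;
        final int firstOffset;

        Match(int partIndex, int firstOffset) {
            this.partIndex = partIndex;
            this.firstOffset = firstOffset;
        }
    }

    private MessageSearchSketch() {
    }

    static List<Match> search(List<TextPart> parts, String needle,
                              Predicate<TextPart> filter, boolean firstPartOnly) {
        List<Match> matches = new ArrayList<>();
        for (TextPart part : parts) {
            if (!filter.test(part)) {
                continue; // e.g. skip headers, or parts of unwanted content types
            }
            int offset = indexOfIgnoreCase(part.text, needle);
            if (offset >= 0) {
                matches.add(new Match(part.index, offset));
                if (firstPartOnly) {
                    break;
                }
            }
        }
        return matches;
    }

    // regionMatches(ignoreCase = true) compares in place, so offsets refer to the original text.
    static int indexOfIgnoreCase(String haystack, String needle) {
        for (int i = 0; i + needle.length() <= haystack.length(); i++) {
            if (haystack.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }
}
```

In this shape the IMAP use case is search(parts, text, p -> true, true) followed by an isEmpty() check, while a mailet wanting every hit in text/plain bodies would pass a content-type predicate and firstPartOnly = false.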

	--- Noel


