You are viewing a plain text version of this content. The canonical link for it is here.
Posted to server-dev@james.apache.org by Tellier Benoit <bt...@apache.org> on 2019/12/02 03:41:07 UTC

Re: About Attachments and JMAP

Hi Matthieu,

On 28/11/2019 23:08, Matthieu Baechler wrote:
> On Thu, 2019-11-28 at 18:04 +0700, Tellier Benoit wrote:
>> Thanks for the reply!
>>
>> [...]
>>
>> My concern of the day is the fact **To read fast JMAP property I need
>> slow full message loading**, and this is what my proposition
>> addresses.
> 
> First, my explanation about byte array usage was in the _Context_
> section, no more. I don't expect you to work on this problem right now.
> 
> Second, the reason why I talked about it is that it's another solution
> for the same problem: you read more data than needed thus this code
> path is slow.
> 
>>
>> I encourage you to split your **James usage of byte arrays**
>> discussion
>> in a separated thread (let's discuss this! but let's honor this with
>> it's standalone discussions :-))
> 
> I don't know if there's anything to discuss right now. I don't expect
> Linagora teams to work on this specific topic in the forseable future.

On friday I worked on a fix to not aggressively read attachment binary
content upon read (but require further interactions with the Attachment
Manager in order to retrieve the content).

This solves the `byte array` part of my problem (and hold the
'consequence' part of the promise).

https://github.com/linagora/james-project/pull/2943

>> [...]
> 
> Well, technically, hasAttachment being a data derived from some other,
> it means we compute it at some point. My argument was: if we need it
> often (and based on your explanation, that's the case) then we should
> compute it and store it. My sentence was maybe too general but there's
> no confusion for me. 

I would be in favor of this:
 - Storing the 'hasAttachment' properties (JMAP definition of it) along
side the preview
 - Thus 'PreComputedPreviewStore' should be renamed into
'MessageFastViewProjection'
 - Other fields could be added to it while still benefitting from the
"PreComputation" and webadmin endpoints.

>> I'm just speaking of better structuring already available data.
>>
>>>> ## Proposal
>>>>
>>>> Introduce a new POJO: MessageAttachmentMetadata (mailbox-api)
>>>>  - id
>>>>  - name
>>>>  - cid
>>>>  - isInline
>>>>  - size
>>>>  - type
>>>>
>>>>  - Message (mailbox-store) & MessageResult (mailbox-api) SHOULD
>>>> return
>>>> MessageAttachmentMetadata NOT MessageAttachment. Thus these
>>>> metadata
>>>> will be added at the FetchGroup.profile.MINIMAL.
>>>
>>> Do we always need attachments information?
>>
>> Of course not.
>>
>> We can think about a separate FetchGroup / FetchType read level if
>> that
>> is an issue.
> 
> Reading a list of MessageAttachmentMetadata means it's probably not
> done with a single query (I think we don't have a projection for that).
> 
> I'm a bit concerned that the minimal profile always read attachments
> metadata, we should consider adding some profiles.

Yes let's not change that for now. Agree that we cannot enforce such
choices on potential future implementations.

Given JMAP hasAttachment precomputation proposal above, a separate
FetchGroup/FetchType do not seem necessary.

> 
>>  - Memory has it for free
>>  - Over implementations do lazy load this from the full content (and
>> parse it) - maildir + jpa.
>>
>> Which mean that the impact of this decision is limited.
> 
> But we would probably miss an opportunity to be faster, right?
> 
>>>> We need to port "scanning search" to do on the fly message
>>>> parsing. 
>>>
>>> Not sure to understand this sentence
>>
>> Scanning search iterates messages. Upon **attachment** criterion,
>> read
>> the message full content and parse it to retrieve attachment list
>> (that
>> might not be stored with some implementation).
>>
>> Did I explained it better?
> 
> I somehow understand what Scanning search does but I still don't
> understand what we need to change.

If only message metadata is attached to a MailboxMessage then scanning
search can not rely on attachment content for attachment textual content
search.

We need to be parsing message content in order to retrieve that
information back (as the mailbox implementation might not provide
Attachment manager/mapper).

>>>> [...]
>>>> ## Alternative
>>>>
>>>> We could merge MessageAttachment & Attachment together however
>>>> this
>>>> would lead to significant datastructure re-arranging for no
>>>> behavioural
>>>> gains and just a slightly more lean API.
>>>>
>>>> Thus I propose not to takle this now.
>>>
>>> I'm not sure to understand this alternative.
>>
>> That's not important here. I'm saying there is a 1 to 1 replationship
>> between Attachment and MessageAttachment POJOs (as I recall it this
>> is a
>> vestige of content deduplication at the attachment level).
>>
>> I see that we can get rid of Attachment POJO, but that would require
>> (cassandra) storage re-structuration. For API gains.
> 
> API is more important than implementation as far as I'm concerned. We
> can postpone things because we don't want to invest right now but if
> it's not an investment issue, we should adapt the storage to the better
> API.

Here that's a cost issue as grouping Attachment & MessageAttachment
together as it implies changing simultaneously the message and the
attachment table, what so far we do not know how to do it.

The refactoring would be easy.

Not the data migration.

> 
>>> [...]
>> [...]
> 
> All backends support storing attachment as far as I know.

All backend expose attachment through MailboxMessage.

However only a handful of them provides the Attachment mapper/manager
support, and that is a big difference (jpa & maildir do not)

> I guess you think "no using a BlobStore".
> And don't see what we be harder in this case, could you elaborate.

"BlobStore" is currently an implementation detail and not a top level
API citizen.

Some mailbox implementation (memory - jpa - maildir) don't rely on this
API, which usage cannot be infered.

From a mailbox-api point of view, blobStore calls need to be abstracted
away with some additional levels of indirection.

> 
> Cheers,
> 

Cheers

Benoit

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org