Posted to commits@jackrabbit.apache.org by Apache Wiki <wi...@apache.org> on 2012/02/16 12:22:12 UTC

[Jackrabbit Wiki] Update of "Goals and non goals for Jackrabbit 3" by MichaelDürig

The "Goals and non goals for Jackrabbit 3" page has been changed by MichaelDürig:
http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203

New page:
=== Design principle ===
Best effort: everything might be corrupt at any time:
 * node types
 * child node existence
 * clients may not make '''any''' consistency assumptions

=== Goals ===
 * Pass TCK. But TCK might be adapted for invalid or edge cases.
 * Node type consistency on save and set type (including mixin). Inconsistencies due to write skew or degradation effects are acceptable, though.
 * Scalability: 
  * Read throughput: no degradation from current Jackrabbit 2, repeated read not slow, take advantage of locality for random reads. '''TODO''': Needs further clarification
  * High write throughput across cluster nodes. 
  * Big lists of direct child nodes (10M)
  * Concurrent writes within single cluster node. '''TODO''': Needs further clarification
  * Big transactions (> 100000k nodes at 1kB each, cq5 content package)
  * Start up time < 1s
  * Number of nodes in repository (tar pm): 100M 
  * Number of nodes in shared cloud: 10T
  * 1G binaries with 2MB per binary => 2PB Repository size
 * Simple/Fast queries (i.e. through specialized indexes) (3ms)
 * Partitioning of observation. '''TODO''': Needs further clarification
  * Handling of recursive deletes: large number of NODE_REMOVED events vs. delete event for specific properties in subtree.
 * Number of users: 200M / 20M per group
 * Full versioning model 
 * Flexible durability (depending on durability guarantees of back end)
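
As a rough sanity check of the size figures in the list above, the arithmetic can be sketched as follows. This is a back-of-the-envelope calculation only: it assumes decimal (SI) units, ignores all storage overhead, and reads "100000k nodes" literally as 100,000 x 1,000 nodes. The helper name `total_bytes` is made up for this illustration.

```python
# Back-of-the-envelope check of the sizing goals (assumption: decimal SI units).
KB = 10**3
MB = 10**6
GB = 10**9
PB = 10**15


def total_bytes(count: int, bytes_each: int) -> int:
    """Raw payload size for `count` items of `bytes_each` bytes, ignoring overhead."""
    return count * bytes_each


# "1G binaries with 2MB per binary"
binary_store = total_bytes(10**9, 2 * MB)
print(binary_store / PB, "PB of binaries")

# "> 100000k nodes at 1kB each", read literally as 100,000 x 1,000 nodes
big_transaction = total_bytes(100_000 * 1_000, 1 * KB)
print(big_transaction / GB, "GB in one transaction")
```

Under these assumptions the binary store comes to the 2PB stated in the list, and a single big transaction would carry about 100 GB of node data.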

=== Non goals ===
 * Node type consistency when node type definition changes
 * Consistency guarantees
 * Scalability:
  * Big property list
  * Same name siblings
  * Namespace remapping
 * Query index complete 
 * Fast move
 * JCR lock == sync
  
=== Maybe ===
 * Multi valued properties
 * Sharable nodes                       
 * Fast delete

Re: [Jackrabbit Wiki] Update of "Goals and non goals for Jackrabbit 3" by MichaelDürig

Posted by Michael Dürig <md...@apache.org>.
Hi Jukka,

Thanks for the comments. I generally share your concerns about the goals 
and terms being too vague. As it stands, this is only a dump of what we 
gathered at our F22, and it is to be discussed and further clarified.

I plan to spin off separate discussion threads about the issues you 
raised, and probably about others, sometime next week. This should help 
us push this forward.

Michael

On 16.2.12 15:39, Jukka Zitting wrote:
> Hi,
>
> On Thu, Feb 16, 2012 at 12:22 PM, Apache Wiki<wi...@apache.org>  wrote:
>> === Design principle ===
>> Best effort: everything might be corrupt at any time:
>>   * node types
>>   * child node existence
>>   * clients may not make '''any''' consistency assumptions
>
> That's too vague, IMHO. A client needs to be able to make *some*
> assumptions about the consistency of the repository. For example, if I
> write something to the repository and nobody else modifies that
> content, it should be safe for me to assume that the content still
> exists (in a consistent state) when I next look for it. What happens
> when others modify the content is a different question, but even there
> we need to be able to give some deterministic guarantees about
> (eventual) consistency.
>
> Also, the word "corrupt" here sounds quite harsh. IMHO a corruption is
> always a big problem. Instead we may want to allow the repository to
> be at times "out of sync" or "temporarily inconsistent" as long as
> it'll eventually reach a more stable state.
>
>>   * Pass TCK. But TCK might be adapted for invalid or edge cases.
>
> "Pass TCK" is a bit vague unless we specify which optional features we
> intend to support. Any changes to the TCK need to be based on a more
> accurate reading of the spec or (where we feel the spec is too strict)
> a proposal to JSR 333 to relax the requirements.
>
>> Scalability:
>
> We need more specific benchmarks for these. Good early guidelines though.
>
>> === Non goals ===
>> [...]
>>   * Consistency guarantees
>
> As discussed above, we IMHO definitely need consistency guarantees,
> the question is just how strict we want to make them.
>
>>   * JCR lock == sync
>
> Can you elaborate? I'd either implement locking correctly or not
> implement it at all, not something in between that a client can't
> really rely on.
>
> BR,
>
> Jukka Zitting

Re: [Jackrabbit Wiki] Update of "Goals and non goals for Jackrabbit 3" by MichaelDürig

Posted by Michael Dürig <md...@apache.org>.
>>   * JCR lock == sync
>
> Can you elaborate? I'd either implement locking correctly or not
> implement it at all, not something in between that a client can't
> really rely on.

AFAIU this was related to a clustered setup where locking might be 
quite expensive. We might want to offer reduced locking functionality 
(or none at all) in this case. OTOH we should definitely have a look at 
Apache ZooKeeper [1] (thanks Bertrand), which might help us solve some 
of these issues.

Michael

[1] http://zookeeper.apache.org/


Re: [Jackrabbit Wiki] Update of "Goals and non goals for Jackrabbit 3" by MichaelDürig

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Feb 16, 2012 at 12:22 PM, Apache Wiki <wi...@apache.org> wrote:
> === Design principle ===
> Best effort: everything might be corrupt at any time:
>  * node types
>  * child node existence
>  * clients may not make '''any''' consistency assumptions

That's too vague, IMHO. A client needs to be able to make *some*
assumptions about the consistency of the repository. For example, if I
write something to the repository and nobody else modifies that
content, it should be safe for me to assume that the content still
exists (in a consistent state) when I next look for it. What happens
when others modify the content is a different question, but even there
we need to be able to give some deterministic guarantees about
(eventual) consistency.

Also, the word "corrupt" here sounds quite harsh. IMHO a corruption is
always a big problem. Instead we may want to allow the repository to
be at times "out of sync" or "temporarily inconsistent" as long as
it'll eventually reach a more stable state.

>  * Pass TCK. But TCK might be adapted for invalid or edge cases.

"Pass TCK" is a bit vague unless we specify which optional features we
intend to support. Any changes to the TCK need to be based on a more
accurate reading of the spec or (where we feel the spec is too strict)
a proposal to JSR 333 to relax the requirements.

> Scalability:

We need more specific benchmarks for these. Good early guidelines though.

> === Non goals ===
> [...]
>  * Consistency guarantees

As discussed above, we IMHO definitely need consistency guarantees,
the question is just how strict we want to make them.

>  * JCR lock == sync

Can you elaborate? I'd either implement locking correctly or not
implement it at all, not something in between that a client can't
really rely on.

BR,

Jukka Zitting