Posted to commits@jackrabbit.apache.org by Apache Wiki <wi...@apache.org> on 2012/02/16 12:22:12 UTC
[Jackrabbit Wiki] Update of "Goals and non goals for Jackrabbit 3" by MichaelDürig
The "Goals and non goals for Jackrabbit 3" page has been changed by MichaelDürig:
http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203
New page:
=== Design principle ===
Best effort: everything might be corrupt at any time:
* node types
* child node existence
* clients may not make '''any''' consistency assumptions
=== Goals ===
* Pass TCK. But TCK might be adapted for invalid or edge cases.
* Node type consistency on save and set type (including mixin). Inconsistencies occurring due to write skew or degradation effects are acceptable though.
* Scalability:
* Read throughput: no degradation from current Jackrabbit 2, repeated read not slow, take advantage of locality for random reads. '''TODO''': Needs further clarification
* High write throughput across cluster nodes.
* Big lists of direct child nodes (10M)
* Concurrent writes within single cluster node. '''TODO''': Needs further clarification
* Big transactions (> 100000k nodes at 1kB each, cq5 content package)
* Start up time < 1s
* Number of nodes in repository (tar pm): 100M
* Number of nodes in shared cloud: 10T
* 1G binaries with 2MB per binary => 2PB Repository size
* Simple/Fast queries (i.e. through specialized indexes) (3ms)
* Partitioning of observation. '''TODO''': Needs further clarification
* Handling of recursive deletes: large number of NODE_REMOVED events vs. delete event for specific properties in subtree.
* Number of users: 200M / 20M per group
* Full versioning model
* Flexible durability (depending on durability guarantees of back end)
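The write skew mentioned under node type consistency can be illustrated with a minimal sketch. It is a toy model in plain Java, not Jackrabbit code: the mixin name ex:titled, the rule that this mixin requires a title property, and the class WriteSkewSketch are all assumptions made for illustration. Two sessions start from the same snapshot, each save validates against its own view, and only the merged result violates the rule.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of a node as a set of mixins plus a property map. Hypothetical, not the JCR API. */
class WriteSkewSketch {

    // Assumed rule: a node carrying the mixin "ex:titled" must have a "title" property.
    static boolean isConsistent(Set<String> mixins, Map<String, String> props) {
        return !mixins.contains("ex:titled") || props.containsKey("title");
    }

    /** Returns true if both saves validated locally but the merged node breaks the rule. */
    static boolean demonstrateWriteSkew() {
        // Shared base state: no mixin, title present.
        Set<String> baseMixins = new HashSet<>();
        Map<String, String> baseProps = new HashMap<>();
        baseProps.put("title", "Hello");

        // Session A works on its own snapshot and adds the mixin.
        Set<String> mixinsA = new HashSet<>(baseMixins);
        Map<String, String> propsA = new HashMap<>(baseProps);
        mixinsA.add("ex:titled");
        boolean saveAValid = isConsistent(mixinsA, propsA);

        // Session B works on the same snapshot and removes the title.
        Set<String> mixinsB = new HashSet<>(baseMixins);
        Map<String, String> propsB = new HashMap<>(baseProps);
        propsB.remove("title");
        boolean saveBValid = isConsistent(mixinsB, propsB);

        // The two change sets touch different items, so a naive merge applies both.
        Set<String> mergedMixins = new HashSet<>(baseMixins);
        mergedMixins.add("ex:titled");
        Map<String, String> mergedProps = new HashMap<>(baseProps);
        mergedProps.remove("title");

        return saveAValid && saveBValid && !isConsistent(mergedMixins, mergedProps);
    }

    public static void main(String[] args) {
        System.out.println("write skew observed: " + demonstrateWriteSkew());
    }
}
```

Neither session did anything invalid from its own point of view, which is why this kind of inconsistency cannot be ruled out by validating each save in isolation.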
=== Non goals ===
* Node type consistency when node type definition changes
* Consistency guarantees
* Scalability:
* Big property list
* Same name siblings
* Namespace remapping
* Query index complete
* Fast move
* JCR lock == sync
=== Maybe ===
* Multi valued properties
* Sharable nodes
* Fast delete
Re: [Jackrabbit Wiki] Update of "Goals and non goals for Jackrabbit 3" by MichaelDürig
Posted by Michael Dürig <md...@apache.org>.
Hi Jukka,
Thanks for the comments. I generally share your concerns about the goals
and terms being too vague. As it stands this is only a dump of what we
gathered at our F22 and it is to be discussed and further clarified.
I plan to spin off separate discussion threads about the issues you
raised, and probably about others, sometime next week. This should help
us push this forward.
Michael
On 16.2.12 15:39, Jukka Zitting wrote:
> Hi,
>
> On Thu, Feb 16, 2012 at 12:22 PM, Apache Wiki<wi...@apache.org> wrote:
>> === Design principle ===
>> Best effort: everything might be corrupt at any time:
>> * node types
>> * child node existence
>> * clients may not make '''any''' consistency assumptions
>
> That's too vague, IMHO. A client needs to be able to make *some*
> assumptions about the consistency of the repository. For example, if I
> write something to the repository and nobody else modifies that
> content, it should be safe for me to assume that the content still
> exists (in a consistent state) when I next look for it. What happens
> when others modify the content is a different question, but even there
> we need to be able to give some deterministic guarantees about
> (eventual) consistency.
>
> Also, the word "corrupt" here sounds quite harsh. IMHO a corruption is
> always a big problem. Instead we may want to allow the repository to
> be at times "out of sync" or "temporarily inconsistent" as long as
> it'll eventually reach a more stable state.
>
>> * Pass TCK. But TCK might be adapted for invalid or edge cases.
>
> "Pass TCK" is a bit vague unless we specify which optional features we
> intend to support. Any changes to the TCK need to be based on a more
> accurate reading of the spec or (where we feel the spec is too strict)
> a proposal to JSR 333 to relax the requirements.
>
>> Scalability:
>
> We need more specific benchmarks for these. Good early guidelines though.
>
>> === Non goals ===
>> [...]
>> * Consistency guarantees
>
> As discussed above, we IMHO definitely need consistency guarantees,
> the question is just how strict we want to make them.
>
>> * JCR lock == sync
>
> Can you elaborate? I'd either implement locking correctly or not
> implement it at all, not something in between that a client can't
> really rely on.
>
> BR,
>
> Jukka Zitting
Re: [Jackrabbit Wiki] Update of "Goals and non goals for Jackrabbit 3" by MichaelDürig
Posted by Michael Dürig <md...@apache.org>.
>> * JCR lock == sync
>
> Can you elaborate? I'd either implement locking correctly or not
> implement it at all, not something in between that a client can't
> really rely on.
AFAIU this was related to a clustered setup where locking might be
quite expensive. We might want to offer reduced locking functionality
(or none at all) in this case. OTOH we should definitely have a look at
Apache ZooKeeper [1] (thanks Bertrand) which might help us solve some
of these issues.
Michael
[1] http://zookeeper.apache.org/
Re: [Jackrabbit Wiki] Update of "Goals and non goals for Jackrabbit 3" by MichaelDürig
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Thu, Feb 16, 2012 at 12:22 PM, Apache Wiki <wi...@apache.org> wrote:
> === Design principle ===
> Best effort: everything might be corrupt at any time:
> * node types
> * child node existence
> * clients may not make '''any''' consistency assumptions
That's too vague, IMHO. A client needs to be able to make *some*
assumptions about the consistency of the repository. For example, if I
write something to the repository and nobody else modifies that
content, it should be safe for me to assume that the content still
exists (in a consistent state) when I next look for it. What happens
when others modify the content is a different question, but even there
we need to be able to give some deterministic guarantees about
(eventual) consistency.
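To make the kind of guarantee meant here concrete, the following is a minimal sketch of a two-replica store in plain Java. It is only an illustration under stated assumptions: the classes EventualConsistencySketch and Replica, the outbox queue, and the syncTo method are hypothetical and do not correspond to any Jackrabbit API. The writer sees its own change at once, the peer may lag, and both agree after synchronization.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy two-replica store. All names are hypothetical, not Jackrabbit classes. */
class EventualConsistencySketch {

    static class Replica {
        final Map<String, String> content = new HashMap<>();
        // Changes written locally that have not yet been shipped to the peer.
        final Queue<String[]> outbox = new ArrayDeque<>();

        void write(String path, String value) {
            content.put(path, value); // visible to local readers immediately
            outbox.add(new String[] {path, value});
        }

        String read(String path) {
            return content.get(path);
        }

        /** Ships all pending changes to the peer, in write order. */
        void syncTo(Replica peer) {
            String[] change;
            while ((change = outbox.poll()) != null) {
                peer.content.put(change[0], change[1]);
            }
        }
    }

    public static void main(String[] args) {
        Replica a = new Replica();
        Replica b = new Replica();

        a.write("/content/page", "v1");
        System.out.println("writer reads: " + a.read("/content/page"));
        System.out.println("peer before sync: " + b.read("/content/page"));

        a.syncTo(b);
        System.out.println("peer after sync: " + b.read("/content/page"));
    }
}
```

The sketch deliberately leaves out concurrent writes to the same path on both replicas, which is the "different question" raised above and would need a conflict resolution rule.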
Also, the word "corrupt" here sounds quite harsh. IMHO a corruption is
always a big problem. Instead we may want to allow the repository to
be at times "out of sync" or "temporarily inconsistent" as long as
it'll eventually reach a more stable state.
> * Pass TCK. But TCK might be adapted for invalid or edge cases.
"Pass TCK" is a bit vague unless we specify which optional features we
intend to support. Any changes to the TCK need to be based on a more
accurate reading of the spec or (where we feel the spec is too strict)
a proposal to JSR 333 to relax the requirements.
> Scalability:
We need more specific benchmarks for these. Good early guidelines though.
> === Non goals ===
> [...]
> * Consistency guarantees
As discussed above, we IMHO definitely need consistency guarantees,
the question is just how strict we want to make them.
> * JCR lock == sync
Can you elaborate? I'd either implement locking correctly or not
implement it at all, not something in between that a client can't
really rely on.
BR,
Jukka Zitting