Posted to oak-dev@jackrabbit.apache.org by Michael Dürig <md...@apache.org> on 2012/07/04 14:37:07 UTC

Notes from the Oakathon

Hi,

Last week some of us met for another small Oakathon in Basel. Below is a 
list of topics that were touched and worked on, and the related Jira 
issues. Any additional input is appreciated.

- Observation (OAK-144):
   * Observation events don't report every single event but rather a 
consolidated view at a given time (basically a diff). Rationale: 
performance on write-heavy systems, and the (order of) individual events 
is not clear, or differs between cluster nodes, in a clustered setup.
   * Implement user data support on a best effort basis. In a clustered 
setup this can't be supported. See also 
http://markmail.org/message/osxupy3twc3pyild
   * oak-core should provide a lightweight mechanism for clients to 
discover changes. oak-jcr can use this to implement JCR observation on 
top and trade off performance implications as needed.

- Internal value handling
   * To reduce GC overhead we should try to keep the number of small 
objects down
   * Possibly merge CoreValue into PropertyState
   * Use flyweight instances for common values/properties

- HTTP bindings (OAK-103, OAK-104)
   * Provide extension points for Sling to hook into. (See also the 
discussion on @oak-dev: http://markmail.org/message/tzpbxf5zduybezya.)

- Full text index (OAK-154)
   * Added an initial Lucene index stored under /jcr:system/oak:lucene 
in the repository (final location(s) TBD)
   * Index updates in a CommitEditor hook, so the index is always up to 
date with latest content changes
     + TODO: How to postpone time-consuming indexing operations like 
full text extraction
     + TODO: How to merge conflicts in the search index?
   * Basic QueryIndexProvider allows the existing query engine to 
leverage the Lucene index
     + TODO: Integrate with the rest of the build

- Merge/rebase logic handling (OAK-157)
   * The Lucene index work encountered some issues with the way we 
handle the merge/rebase operations
   * Need to look deeper into that over the next few weeks

Michael
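
The consolidated view described under Observation (OAK-144) can be illustrated with a small sketch. It is not Oak code: the `Snapshot` type, the `diff` function and the paths are assumptions made up for this example. It only shows how several individual writes between two observation points collapse into one diff, so the order of the individual writes no longer matters.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical model for this sketch: a snapshot maps a path to a value.
Snapshot = Dict[str, Any]


def diff(before: Snapshot, after: Snapshot) -> List[Tuple[str, str]]:
    """Return consolidated (kind, path) changes between two snapshots."""
    changes: List[Tuple[str, str]] = []
    for path in sorted(before.keys() | after.keys()):
        if path not in before:
            changes.append(("added", path))
        elif path not in after:
            changes.append(("removed", path))
        elif before[path] != after[path]:
            changes.append(("changed", path))
    return changes


# Four individual writes happen between two observation points.
before = {"/a": 1, "/b": 2}
after = dict(before)
after["/a"] = 5   # write 1
after["/a"] = 7   # write 2 overwrites write 1
del after["/b"]   # write 3
after["/c"] = 3   # write 4

# The observer sees three consolidated changes, not four events.
events = diff(before, after)
```

The intermediate value 5 of `/a` never shows up in `events`, which is the trade-off the notes describe: less work on write-heavy systems, at the price of not reporting every single event.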

Re: Notes from the Oakathon

Posted by Felix Meschberger <fm...@adobe.com>.
Hi,

On 04.07.2012 at 14:37, Michael Dürig wrote:

> - HTTP bindings (OAK-103, OAK-104)
>   * Provide extension points for Sling to hook into. (See also the 
> discussion on @oak-dev: http://markmail.org/message/tzpbxf5zduybezya.)

Just to clarify my thoughts here: HTTP Bindings make sense but only to be able to support remoting between the Oak core and Oak-JCR layers. As such the bindings replicate the Oak API over HTTP. Nothing else.

Specifically, applications should not be invited and encouraged to leverage these bindings.

As such, Sling couldn't care less for these bindings, because Sling sticks to the JCR API.

Regards
Felix

Re: Notes from the Oakathon

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Jul 4, 2012 at 5:43 PM, Jukka Zitting <ju...@gmail.com> wrote:
> On Wed, Jul 4, 2012 at 5:35 PM, Stefan Guggisberg
> <st...@gmail.com> wrote:
>> On Wed, Jul 4, 2012 at 2:37 PM, Michael Dürig <md...@apache.org> wrote:
>>> - HTTP bindings (OAK-103, OAK-104)
>>>   * Provide extension points for Sling to hook into. (See also the
>>
>> i don't remember any discussions that sling should be using
>> an alternative interface to access the repository. IMO
>> sling should only use the jcr api.
>
> It came up as a potential option that Sling might be interested in.

Sounds like Sling isn't interested in this, so let's drop the idea.

BR,

Jukka Zitting

Re: Notes from the Oakathon

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Thu, Jul 5, 2012 at 1:24 PM, Thomas Mueller <mu...@adobe.com> wrote:
> ...the plan is to support additional index mechanisms using a plugin
> mechanism. I know there is a person working on Solr support, and we plan
> to support Lucene and Tika (to extract text and metadata) as well. We
> might want to support 'native' MongoDb indexes at some point (if the data
> is stored in MongoDb).
>
> Just I didn't think / know that Sling *itself* would have its own index
> mechanism....

ok, got it now, thanks for clarifying!

-Bertrand

Re: Notes from the Oakathon

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

>>I'm not sure what you mean here. I don't see Sling having its own index
>> *mechanism*. Probably Sling will need specific indexes, but that's just
>> configuration, and not implementation (code)...
>
>Being able to integrate totally different indexing mechanisms can be
>useful if Oak can provide such extension points. For example to
>integrate different types of indexing systems (Solr, Stanbol,
>image/media indexing) so as to be able to query them with the same API
>that's used for the native repository indexes.
>
>Such indexes can of course be integrated in higher layers, but if the
>Oak design allows them that would be useful.

Yes, the plan is to support additional index mechanisms using a plugin
mechanism. I know there is a person working on Solr support, and we plan
to support Lucene and Tika (to extract text and metadata) as well. We
might want to support 'native' MongoDb indexes at some point (if the data
is stored in MongoDb).

I just didn't think / know that Sling *itself* would have its own index
mechanism.

Regards,
Thomas
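
A plugin mechanism of the kind discussed here could look roughly like the sketch below. Everything in it is an assumption for illustration: `QueryIndex`, `InMemoryTermIndex`, `QueryEngine`, the cost values and the paths are hypothetical and do not reflect Oak's actual interfaces. The point is only that different index backends sit behind one extension point, so clients query them through the same API.

```python
from typing import Dict, List, Protocol


class QueryIndex(Protocol):
    """Hypothetical extension point; not Oak's actual interface."""

    name: str

    def cost(self, term: str) -> float: ...

    def query(self, term: str) -> List[str]: ...


class InMemoryTermIndex:
    """Stand-in for a real backend such as Lucene or Solr."""

    def __init__(self, name: str, postings: Dict[str, List[str]], cost: float) -> None:
        self.name = name
        self._postings = postings
        self._cost = cost

    def cost(self, term: str) -> float:
        # An index that cannot answer the term reports infinite cost.
        return self._cost if term in self._postings else float("inf")

    def query(self, term: str) -> List[str]:
        return self._postings.get(term, [])


class QueryEngine:
    """Asks every registered index for its cost and uses the cheapest one."""

    def __init__(self) -> None:
        self._indexes: List[QueryIndex] = []

    def register(self, index: QueryIndex) -> None:
        self._indexes.append(index)

    def execute(self, term: str) -> List[str]:
        if not self._indexes:
            return []
        best = min(self._indexes, key=lambda index: index.cost(term))
        if best.cost(term) == float("inf"):
            return []
        return best.query(term)


engine = QueryEngine()
engine.register(InMemoryTermIndex("lucene", {"oak": ["/content/a"]}, cost=1.0))
engine.register(
    InMemoryTermIndex(
        "solr",
        {"oak": ["/content/a", "/content/b"], "sling": ["/content/c"]},
        cost=2.0,
    )
)

result_oak = engine.execute("oak")      # answered by the cheaper "lucene" index
result_sling = engine.execute("sling")  # only the "solr" index knows this term
result_none = engine.execute("jcr")     # no index can answer
```

Choosing by reported cost is one possible design; the thread itself does not say how an index would be selected.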


Re: Notes from the Oakathon

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Thu, Jul 5, 2012 at 12:19 PM, Thomas Mueller <mu...@adobe.com> wrote:
> About indexing:
>
>> Another potential Sling extension would be a custom Oak index provider
>> for optimizing the kinds of queries Sling uses.
>
> I'm not sure what you mean here. I don't see Sling having its own index
> *mechanism*. Probably Sling will need specific indexes, but that's just
> configuration, and not implementation (code)...

Being able to integrate totally different indexing mechanisms can be
useful if Oak can provide such extension points. For example to
integrate different types of indexing systems (Solr, Stanbol,
image/media indexing) so as to be able to query them with the same API
that's used for the native repository indexes.

Such indexes can of course be integrated in higher layers, but if the
Oak design allows them that would be useful.

-Bertrand

Re: Notes from the Oakathon

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

I agree with Felix. I don't currently see a need for Sling to "bypass" the
JCR API. If that were needed, there would be something wrong with the JCR
API or the JCR implementation.

About indexing:

> Another potential Sling extension would be a custom Oak index provider
> for optimizing the kinds of queries Sling uses.

I'm not sure what you mean here. I don't see Sling having its own index
*mechanism*. Probably Sling will need specific indexes, but that's just
configuration, and not implementation (code). Sling should create the
indexes it needs just like any user application. I think indexes should be
configured within the repository (as nodes), so that the regular JCR API
could be used. To simplify the index configuration, we might create an
"index config tool / library" (with its own API), but this tool should
just use the JCR API internally, no shortcut directly into Oak (for
multiple reasons, for example so we don't need a separate remoting for
it). Such a tool could be used by any kind of application, it's just
helper classes to create / manipulate nodes within a repository; it would
use the regular JCR API.

Regards,
Thomas
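
The idea of an index config helper that only creates ordinary nodes can be sketched as follows. This is an illustration under stated assumptions, not the JCR API or any Oak tool: the `Node` class, the `define_property_index` helper, the `indexes` location and the property keys are all made up for the example.

```python
from typing import Any, Dict


class Node:
    """Hypothetical stand-in for a repository node (not the JCR API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.properties: Dict[str, Any] = {}
        self.children: Dict[str, "Node"] = {}

    def add_node(self, name: str) -> "Node":
        # Return the existing child if present, otherwise create it.
        return self.children.setdefault(name, Node(name))

    def set_property(self, key: str, value: Any) -> None:
        self.properties[key] = value


def define_property_index(root: Node, index_name: str, property_name: str) -> Node:
    """Store an index definition as plain nodes and properties.

    The "indexes" location and the property keys are assumptions of this sketch.
    """
    definition = root.add_node("indexes").add_node(index_name)
    definition.set_property("type", "property")
    definition.set_property("propertyName", property_name)
    return definition


# Any application could call the helper; it only uses the node API.
root = Node("")
define_property_index(root, "resourceType", "sling:resourceType")
stored = root.children["indexes"].children["resourceType"].properties
```

Because the helper touches nothing but nodes and properties, it would work over whatever remoting the regular node API already has, which is the argument made above against a shortcut directly into Oak.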


Re: Notes from the Oakathon

Posted by Felix Meschberger <fm...@adobe.com>.
Hi Jukka,

On 04.07.2012 at 17:43, Jukka Zitting wrote:

> Hi,
> 
> On Wed, Jul 4, 2012 at 5:35 PM, Stefan Guggisberg
> <st...@gmail.com> wrote:
>> On Wed, Jul 4, 2012 at 2:37 PM, Michael Dürig <md...@apache.org> wrote:
>>> - HTTP bindings (OAK-103, OAK-104)
>>>  * Provide extension points for Sling to hook into. (See also the
>> 
>> i don't remember any discussions that sling should be using
>> an alternative interface to access the repository. IMO
>> sling should only use the jcr api.

Absolutely.

In my not so humble opinion, Sling must stick to the JCR API and not fuss with any Oak API, and therefore not with any Oak HTTP binding.


> 
> It came up as a potential option that Sling might be interested in.

As I already said, Sling should not do it and is not currently discussing it.

> 
> For example, many Sling components are currently having a hard time
> with the JCR observation feature, and giving them access to lower
> level features in Oak (or alternatively implementing the relevant
> Sling features directly as an Oak plugin) could simplify things a lot.

JCR Observation on a spec level has issues which have to be fixed on a spec level. The Jackrabbit observation implementation has other issues, which have to be fixed in the Jackrabbit implementation.

Neither issue will ever trigger Sling to bypass the JCR API.

> 
> Another potential Sling extension would be a custom Oak index provider
> for optimizing the kinds of queries Sling uses.

This has nothing to do with the HTTP binding.

If Sling were to do such a thing, it would hook into the regular Oak API.

> 
> BR,
> 
> Jukka Zitting


Re: Notes from the Oakathon

Posted by Carsten Ziegeler <cz...@apache.org>.
2012/7/4 Jukka Zitting <ju...@gmail.com>:
> Hi,
>
>
> It came up as a potential option that Sling might be interested in.
>
> For example, many Sling components are currently having a hard time
> with the JCR observation feature, and giving them access to lower
> level features in Oak (or alternatively implementing the relevant
> Sling features directly as an Oak plugin) could simplify things a lot.

From the peanut gallery: I'm not aware of problems with the JCR API in Sling.

>
> Another potential Sling extension would be a custom Oak index provider
> for optimizing the kinds of queries Sling uses.

While this might make sense, that wouldn't be tied to Sling; it would
apply to any client of Oak.

Carsten

-- 
Carsten Ziegeler
cziegeler@apache.org

Re: Notes from the Oakathon

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Jul 4, 2012 at 5:35 PM, Stefan Guggisberg
<st...@gmail.com> wrote:
> On Wed, Jul 4, 2012 at 2:37 PM, Michael Dürig <md...@apache.org> wrote:
>> - HTTP bindings (OAK-103, OAK-104)
>>   * Provide extension points for Sling to hook into. (See also the
>
> i don't remember any discussions that sling should be using
> an alternative interface to access the repository. IMO
> sling should only use the jcr api.

It came up as a potential option that Sling might be interested in.

For example, many Sling components are currently having a hard time
with the JCR observation feature, and giving them access to lower
level features in Oak (or alternatively implementing the relevant
Sling features directly as an Oak plugin) could simplify things a lot.

Another potential Sling extension would be a custom Oak index provider
for optimizing the kinds of queries Sling uses.

BR,

Jukka Zitting

Re: Notes from the Oakathon

Posted by Stefan Guggisberg <st...@gmail.com>.
On Wed, Jul 4, 2012 at 2:37 PM, Michael Dürig <md...@apache.org> wrote:
>
> Hi,
>
> Last week some of us met for another small Oakathon in Basel. Below is a
> list of topics that were touched and worked on and the related Jira issues.
> Any additional input is appreciated.
>
> - Observation (OAK-144):
>   * Observation events don't report every single event but rather a
> consolidated view at a given time (basically a diff). Rationale:
> performance on heavy write systems, (order of) individual events not
> clear/different for different cluster nodes in clustered setup.
>   * Implement user data support on a best effort basis. In a clustered setup
> this can't be supported. See also
> http://markmail.org/message/osxupy3twc3pyild
>   * oak-core should provide lightweight mechanism for clients to discover
> changes. oak-jcr can use this to implement JCR observation on top and trade
> off performance implications as needed.
>
> - Internal value handling
>   * To reduce GC overhead we should try to keep the number of small objects
> down
>   * Possibly merge CoreValue into PropertyState
>   * Use flyweight instances for common values/properties
>
> - HTTP bindings (OAK-103, OAK-104)
>   * Provide extension points for Sling to hook into. (See also the

i don't remember any discussions that sling should be using
an alternative interface to access the repository. IMO
sling should only use the jcr api.

cheers
stefan

> discussion on @oak-dev: http://markmail.org/message/tzpbxf5zduybezya.)
>
> - Full text index (OAK-154)
>   * Added an initial Lucene index stored under /jcr:system/oak:lucene in the
> repository (final location(s) TBD)
>   * Index updates in a CommitEditor hook, so the index is always up to date
> with latest content changes
>     + TODO: How to postpone time-consuming indexing operations like full
> text extraction
>     + TODO: How to merge conflicts in the search index?
>   * Basic QueryIndexProvider allows the existing query engine to leverage
> the Lucene index
>     + TODO: Integrate with the rest of the build
>
> - Merge/rebase logic handling (OAK-157)
>   * The Lucene index work encountered some issues with the way we handle the
> merge/rebase operations
>   * Need to look deeper into that over the next few weeks
>
> Michael