Posted to dev@sling.apache.org by Radu Cotescu <ra...@apache.org> on 2016/07/15 13:24:02 UTC

[DISCUSS] Path filtering support for ResourceChangeListeners

Hello,

The current release of the Sling API
org.apache.sling.api.resource.observation.ResourceChangeListener
(org.apache.sling.api.resource.observation;version=1.0.0) specification
does not provide any kind of support for path filtering or pattern
matching. As I understand it, the future regarding Resource observation is
to switch from registering OSGi EventHandlers to registering
ResourceChangeListeners. If one concept replaces the other, then we need to
find an acceptable solution for the filtering support that EventHandlers
brought to the table [0][1].

The implementation I proposed [2] relied on adding support for the Glob
pattern matching provided by java.nio.file.PathMatcher (albeit I only
thought of a sub-set of it). What syntax would you prefer and why? Do we
need to support the full Glob syntax? Would a sub-set be enough? Do we want
to also support RegEx? Should we actually filter everything directly in the
ResourceChangeListener#onChange and not care about providing support for
filtering in the service's configuration?
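
For readers unfamiliar with java.nio.file.PathMatcher, here is a minimal sketch of what glob matching through it looks like. It assumes a Unix-like default file system (PathMatcher works on file system paths, not on Sling resource paths, so results may differ on other platforms); the class and method names are hypothetical and not part of the commit in [2].

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class PathMatcherGlobSketch {

    // Hypothetical helper: checks a resource path against a glob pattern
    // by treating the resource path as a path on the default file system.
    static boolean matches(String glob, String resourcePath) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(resourcePath));
    }

    public static void main(String[] args) {
        // '**' may cross directory boundaries, '*' stays within one name.
        System.out.println(matches("/apps/**/*.html", "/apps/myapp/components/page/page.html"));
        System.out.println(matches("/apps/**/*.html", "/apps/myapp/components/page/page.jsp"));
    }
}
```

Note that PathMatcher accepts both the "glob:" and the "regex:" syntax prefix, which is one reason a prefix-based configuration is a natural fit here.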

Thanks,
Radu

[0] -
https://osgi.org/javadoc/r6/cmpn/org/osgi/service/event/EventHandler.html
[1] -
https://osgi.org/javadoc/r6/cmpn/org/osgi/service/event/EventConstants.html#EVENT_FILTER
[2] -
https://github.com/apache/sling/commit/4264dc16205abab300d041d15524c6d996b9d40a

RE: [DISCUSS] Path filtering support for ResourceChangeListeners

Posted by Stefan Seifert <ss...@pro-vision.de>.
i personally would prefer using regex instead of glob syntax, because other areas in sling use it as well for path matching (e.g. sling mapping).

about the performance implications of where the filtering should take place i cannot say much currently.

stefan

>-----Original Message-----
>From: Radu Cotescu [mailto:radu@apache.org]
>Sent: Friday, July 15, 2016 3:24 PM
>To: Sling Dev
>Subject: [DISCUSS] Path filtering support for ResourceChangeListeners
>
>Hello,
>
>The current release of the Sling API
>org.apache.sling.api.resource.observation.ResourceChangeListener
>(org.apache.sling.api.resource.observation;version=1.0.0) specification
>does not provide any kind of support for path filtering or pattern
>matching. As I understand it, the future regarding Resource observation is
>to switch from registering OSGi EventHandlers to registering
>ResourceChangeListeners. If one concept replaces the other, then we need to
>find an acceptable solution for the filtering support that EventHandlers
>brought to the table [0][1].
>
>The implementation I proposed [2] relied on adding support for the Glob
>pattern matching provided by java.nio.file.PathMatcher (albeit I only
>thought of a sub-set of it). What syntax would you prefer and why? Do we
>need to support the full Glob syntax? Would a sub-set be enough? Do we want
>to also support RegEx? Should we actually filter everything directly in the
>ResourceChangeListener#onChange and not care about providing support for
>filtering in the service's configuration?
>
>Thanks,
>Radu
>
>[0] -
>https://osgi.org/javadoc/r6/cmpn/org/osgi/service/event/EventHandler.html
>[1] -
>https://osgi.org/javadoc/r6/cmpn/org/osgi/service/event/EventConstants.html#EVENT_FILTER
>[2] -
>https://github.com/apache/sling/commit/4264dc16205abab300d041d15524c6d996b9d40a

RE: [DISCUSS] Path filtering support for ResourceChangeListeners

Posted by Oliver Lietz <ap...@oliverlietz.de>.
On Tuesday 19 July 2016 17:05:09 Stefan Seifert wrote:
> what about option 2 + 4 together (but only implementing 'glob' currently)?
> 
> then it would be:
> - easy to detect if the user wishes to use a pattern instead of a fixed path
>   by looking for a prefix (currently this is done by magically looking for
>   special characters)
> - possible to add regex or other pattern support later without breaking
>   compatibility
> - gives us a quick start with only supporting glob currently

+1

O.

> stefan
[...]


Re: [DISCUSS] Path filtering support for ResourceChangeListeners

Posted by Radu Cotescu <ra...@apache.org>.
Got it, thanks!

On Wed, 20 Jul 2016 at 14:01 Carsten Ziegeler <cz...@apache.org> wrote:

> > Hi,
> >
> > I'm not sure if options 2 + 4 work together without changing the API yet
> > one more time when / if we decide to also support regex (albeit just a
> > minor increase). I have nothing against supporting them from the beginning
> > - unfortunately we don't have a clear separation between API and
> > implementation here and the API has to anyways state what it supports.
> >
> > This means that a ResourceChangeListener will support the following ways
> > for defining a matcher:
> >
> > 1. explicit paths, absolute or relative to search paths
> > 2. '.' , for the cases when the listener is interested in all changes under
> > the search paths
> > 3. limited glob pattern matching ('*', '**'), by prefixing the pattern with
> > 'glob:'
> > 4. regex matching, by prefixing the pattern with 'regex:'
> >
> > Did I miss something?
> >
>
> Looks good, as stated previously I would prefer to not support regex:
> for now. We can add it in a later version if really required. And by
> using the prefixes we can add it without breaking existing configurations.
>
> Carsten
>

Re: [DISCUSS] Path filtering support for ResourceChangeListeners

Posted by Carsten Ziegeler <cz...@apache.org>.
> Hi,
> 
> I'm not sure if options 2 + 4 work together without changing the API yet
> one more time when / if we decide to also support regex (albeit just a
> minor increase). I have nothing against supporting them from the beginning
> - unfortunately we don't have a clear separation between API and
> implementation here and the API has to anyways state what it supports.
> 
> This means that a ResourceChangeListener will support the following ways
> for defining a matcher:
> 
> 1. explicit paths, absolute or relative to search paths
> 2. '.' , for the cases when the listener is interested in all changes under
> the search paths
> 3. limited glob pattern matching ('*', '**'), by prefixing the pattern with
> 'glob:'
> 4. regex matching, by prefixing the pattern with 'regex:'
> 
> Did I miss something?
> 

Looks good, as stated previously I would prefer to not support regex:
for now. We can add it in a later version if really required. And by
using the prefixes we can add it without breaking existing configurations.

Carsten

> Thanks,
> Radu
> 
> On Wed, 20 Jul 2016 at 07:09 Carsten Ziegeler <cz...@apache.org> wrote:
> 
>>> what about option 2 + 4 together (but only implementing 'glob' currently)?
>>>
>>> then it would be:
>>> - easy to detect if the user wishes to use a pattern instead of a fixed
>>> path by looking for a prefix (currently this is done by magically looking
>>> for special characters)
>>> - possible to add regex or other pattern support later without breaking
>>> compatibility
>>> - gives us a quick start with only supporting glob currently
>>>
>> Sounds good to me
>>
>> Carsten
>>
>>
>>
>> --
>> Carsten Ziegeler
>> Adobe Research Switzerland
>> cziegeler@apache.org
>>
>>
> 


 

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org


Re: [DISCUSS] Path filtering support for ResourceChangeListeners

Posted by Radu Cotescu <ra...@apache.org>.
Hi,

I'm not sure if options 2 + 4 work together without changing the API yet
again when / if we decide to also support regex (albeit just a minor
version increase). I have nothing against supporting them from the beginning
- unfortunately we don't have a clear separation between API and
implementation here and the API has to state what it supports anyway.

This means that a ResourceChangeListener will support the following ways
for defining a matcher:

1. explicit paths, absolute or relative to search paths
2. '.' , for the cases when the listener is interested in all changes under
the search paths
3. limited glob pattern matching ('*', '**'), by prefixing the pattern with
'glob:'
4. regex matching, by prefixing the pattern with 'regex:'

Did I miss something?
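
To make the four forms above concrete, here is a minimal sketch of how a configured value could be classified by its prefix. The class, enum and method names are hypothetical and only illustrate the idea; this is not the Sling implementation.

```java
final class PathFilterSpecSketch {

    // The four ways of defining a matcher listed above.
    enum Kind { PATH, ALL_SEARCH_PATHS, GLOB, REGEX }

    // Decides the kind of matcher from the configured string alone,
    // without guessing based on special characters.
    static Kind kindOf(String spec) {
        if (".".equals(spec)) {
            return Kind.ALL_SEARCH_PATHS;
        }
        if (spec.startsWith("glob:")) {
            return Kind.GLOB;
        }
        if (spec.startsWith("regex:")) {
            return Kind.REGEX;
        }
        return Kind.PATH;
    }

    // Returns the pattern or path with the syntax prefix removed.
    static String expressionOf(String spec) {
        switch (kindOf(spec)) {
            case GLOB:
                return spec.substring("glob:".length());
            case REGEX:
                return spec.substring("regex:".length());
            default:
                return spec;
        }
    }
}
```

Because an unprefixed value is always treated as a plain path, a new prefix could be added later without changing the meaning of existing configurations.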

Thanks,
Radu

On Wed, 20 Jul 2016 at 07:09 Carsten Ziegeler <cz...@apache.org> wrote:

> > what about option 2 + 4 together (but only implementing 'glob' currently)?
> >
> > then it would be:
> > - easy to detect if the user wishes to use a pattern instead of a fixed
> > path by looking for a prefix (currently this is done by magically looking
> > for special characters)
> > - possible to add regex or other pattern support later without breaking
> > compatibility
> > - gives us a quick start with only supporting glob currently
> >
> Sounds good to me
>
> Carsten
>
>
>
> --
> Carsten Ziegeler
> Adobe Research Switzerland
> cziegeler@apache.org
>
>

Re: [DISCUSS] Path filtering support for ResourceChangeListeners

Posted by Carsten Ziegeler <cz...@apache.org>.
> what about option 2 + 4 together (but only implementing 'glob' currently)?
> 
> then it would be:
> - easy to detect if the user wishes to use a pattern instead of a fixed path by looking for a prefix (currently this is done by magically looking for special characters)
> - possible to add regex or other pattern support later without breaking compatibility
> - gives us a quick start with only supporting glob currently
> 
Sounds good to me

Carsten

 

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org


RE: [DISCUSS] Path filtering support for ResourceChangeListeners

Posted by Stefan Seifert <ss...@pro-vision.de>.
what about option 2 + 4 together (but only implementing 'glob' currently)?

then it would be:
- easy to detect if the user wishes to use a pattern instead of a fixed path by looking for a prefix (currently this is done by magically looking for special characters)
- possible to add regex or other pattern support later without breaking compatibility
- gives us a quick start with only supporting glob currently

stefan

>-----Original Message-----
>From: Radu Cotescu [mailto:radu@apache.org]
>Sent: Tuesday, July 19, 2016 6:26 PM
>To: Sling Dev
>Subject: Re: [DISCUSS] Path filtering support for ResourceChangeListeners
>
>Hello again,
>
>I'd like to reach a conclusion for this feature and we currently have
>the following options:
>
>1. use RegEx for these filters, as proposed by Stefan Seifert - we're
>already using RegEx for Sling Mapping
>2. use a limited glob pattern matching, allowing just '**', '*', as
>proposed by Carsten
>3. use the full glob pattern matching syntax defined at [4]
>4. support both regex and glob, but then force consumers to use the 'glob:'
>/ 'regex:' prefixes for their patterns
>
>I'm still not a big supporter of RegEx usage in this case, due to their
>complexity and potential to match more changes than needed due to simple
>mistakes in the syntax: compare '/apps/**/*.html' with
>'^\/apps\/[a-zA-Z0-9:_\-\.]*\/?[a-zA-Z0-9:_\-]+\.html$'. However if we
>really need to support RegEx I'd prefer option 4, where at least consumers
>can write simpler matching patterns if needed.
>
>Thanks,
>Radu
>
>On Mon, 18 Jul 2016 at 15:37 Radu Cotescu <ra...@apache.org> wrote:
>
>> Hi,
>>
>> I'm also a supporter of the glob pattern matching, since those filters are
>> easier to write than RegEx [3] (and not just because we'd have to escape
>> every '/'). We could try to support the full syntax described at [4],
>> though, if we need more flexibility.
>>
>> Regards,
>> Radu
>>
>> [3] - https://xkcd.com/1171/
>> [4] -
>> https://docs.oracle.com/javase/7/docs/api/java/nio/file/FileSystem.html#getPathMatcher(java.lang.String)
>>
>> On Sat, 16 Jul 2016 at 17:56 Carsten Ziegeler <cz...@apache.org>
>> wrote:
>>
>>> I think we should keep it simple and model it based on existing use cases.
>>> So far, the only pattern matching which comes to my mind is matching
>>> based on the extension. Basically everything that is caching something
>>> requires this, like the jsp engine being interested in changes of *.jsp
>>> files etc.
>>> Apart from that listeners are usually interested in changes in a
>>> specific sub tree, but without any additional filtering.
>>>
>>> Therefore I think the **, * matching similar to what we know from Ant or
>>> Maven or other tools should be enough.
>>>
>>> I wouldn't go with more powerful matching as the idea of the RCLs is
>>> that the filter matching is done by the underlying storage provider,
>>> e.g. Oak. This allows delegating the heavy work to the storage and
>>> reduces the number of events sent by the storage to Sling. Of course, if
>>> the storage can't filter itself, the Sling provider implementation can
>>> still do an additional filtering, but that might be rather expensive.
>>>
>>> Regards
>>> Carsten
>>>
>>> --
>>> Carsten Ziegeler
>>> Adobe Research Switzerland
>>> cziegeler@apache.org
>>>
>>>

Re: [DISCUSS] Path filtering support for ResourceChangeListeners

Posted by Radu Cotescu <ra...@apache.org>.
Hello again,

I'd like to reach a conclusion for this feature and we currently have
the following options:

1. use RegEx for these filters, as proposed by Stefan Seifert - we're
already using RegEx for Sling Mapping
2. use a limited glob pattern matching, allowing just '**', '*', as
proposed by Carsten
3. use the full glob pattern matching syntax defined at [4]
4. support both regex and glob, but then force consumers to use the 'glob:'
/ 'regex:' prefixes for their patterns

I'm still not a big supporter of RegEx usage in this case, due to their
complexity and potential to match more changes than needed due to simple
mistakes in the syntax: compare '/apps/**/*.html' with
'^\/apps\/[a-zA-Z0-9:_\-\.]*\/?[a-zA-Z0-9:_\-]+\.html$'. However if we
really need to support RegEx I'd prefer option 4, where at least consumers
can write simpler matching patterns if needed.
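
The comparison above can be tried out with java.util.regex alone. The sketch below compiles the regex exactly as quoted and, as an assumption, a hand translation of the glob in which '**' becomes '.*' and '*' becomes '[^/]*'; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class GlobVersusRegexSketch {

    // The regex quoted in the message above, written as a Java string literal.
    static final Pattern REGEX_FROM_MESSAGE = Pattern.compile(
            "^\\/apps\\/[a-zA-Z0-9:_\\-\\.]*\\/?[a-zA-Z0-9:_\\-]+\\.html$");

    // Assumed hand translation of the glob '/apps/**/*.html':
    // '**' becomes '.*' and '*' becomes '[^/]*'.
    static final Pattern GLOB_AS_REGEX = Pattern.compile("^/apps/.*/[^/]*\\.html$");

    static boolean regexMatches(String path) {
        return REGEX_FROM_MESSAGE.matcher(path).matches();
    }

    static boolean globMatches(String path) {
        return GLOB_AS_REGEX.matcher(path).matches();
    }

    public static void main(String[] args) {
        String[] paths = {"/apps/site/page.html", "/apps/page.html", "/apps/a/b/c.html"};
        for (String path : paths) {
            System.out.println(path + " regex=" + regexMatches(path) + " glob=" + globMatches(path));
        }
    }
}
```

Under these assumptions both expressions accept '/apps/site/page.html', but they disagree on '/apps/page.html' (only the regex matches) and on '/apps/a/b/c.html' (only the glob matches), which illustrates how easily a hand-written regex drifts from the intended pattern.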

Thanks,
Radu

On Mon, 18 Jul 2016 at 15:37 Radu Cotescu <ra...@apache.org> wrote:

> Hi,
>
> I'm also a supporter of the glob pattern matching, since those filters are
> easier to write than RegEx [3] (and not just because we'd have to escape
> every '/'). We could try to support the full syntax described at [4],
> though, if we need more flexibility.
>
> Regards,
> Radu
>
> [3] - https://xkcd.com/1171/
> [4] -
> https://docs.oracle.com/javase/7/docs/api/java/nio/file/FileSystem.html#getPathMatcher(java.lang.String)
>
> On Sat, 16 Jul 2016 at 17:56 Carsten Ziegeler <cz...@apache.org>
> wrote:
>
>> I think we should keep it simple and model it based on existing use cases.
>> So far, the only pattern matching which comes to my mind is matching
>> based on the extension. Basically everything that is caching something
>> requires this, like the jsp engine being interested in changes of *.jsp
>> files etc.
>> Apart from that listeners are usually interested in changes in a
>> specific sub tree, but without any additional filtering.
>>
>> Therefore I think the **, * matching similar to what we know from Ant or
>> Maven or other tools should be enough.
>>
>> I wouldn't go with more powerful matching as the idea of the RCLs is
>> that the filter matching is done by the underlying storage provider,
>> e.g. Oak. This allows delegating the heavy work to the storage and
>> reduces the number of events sent by the storage to Sling. Of course, if
>> the storage can't filter itself, the Sling provider implementation can
>> still do an additional filtering, but that might be rather expensive.
>>
>> Regards
>> Carsten
>>
>> --
>> Carsten Ziegeler
>> Adobe Research Switzerland
>> cziegeler@apache.org
>>
>>

Re: [DISCUSS] Path filtering support for ResourceChangeListeners

Posted by Radu Cotescu <ra...@apache.org>.
Hi,

I'm also a supporter of the glob pattern matching, since those filters are
easier to write than RegEx [3] (and not just because we'd have to escape
every '/'). We could try to support the full syntax described at [4],
though, if we need more flexibility.

Regards,
Radu

[3] - https://xkcd.com/1171/
[4] -
https://docs.oracle.com/javase/7/docs/api/java/nio/file/FileSystem.html#getPathMatcher(java.lang.String)

On Sat, 16 Jul 2016 at 17:56 Carsten Ziegeler <cz...@apache.org> wrote:

> I think we should keep it simple and model it based on existing use cases.
> So far, the only pattern matching which comes to my mind is matching
> based on the extension. Basically everything that is caching something
> requires this, like the jsp engine being interested in changes of *.jsp
> files etc.
> Apart from that listeners are usually interested in changes in a
> specific sub tree, but without any additional filtering.
>
> Therefore I think the **, * matching similar to what we know from Ant or
> Maven or other tools should be enough.
>
> I wouldn't go with more powerful matching as the idea of the RCLs is
> that the filter matching is done by the underlying storage provider,
> e.g. Oak. This allows delegating the heavy work to the storage and
> reduces the number of events sent by the storage to Sling. Of course, if
> the storage can't filter itself, the Sling provider implementation can
> still do an additional filtering, but that might be rather expensive.
>
> Regards
> Carsten
>
> --
> Carsten Ziegeler
> Adobe Research Switzerland
> cziegeler@apache.org
>
>

Re: [DISCUSS] Path filtering support for ResourceChangeListeners

Posted by Michael Dürig <md...@apache.org>.

On 16.7.16 5:56 , Carsten Ziegeler wrote:
> I wouldn't go with more powerful matching as the idea of the RCLs is
> that the filter matching is done by the underlying storage provider,
> e.g. Oak. This allows delegating the heavy work to the storage and
> reduces the number of events sent by the storage to Sling.

And Oak already supports some form of globbing [1].

Michael

[1] 
https://github.com/mduerig/jackrabbit-oak/blob/875d538a4e023fa31aa1c5b90574b6d35bb1569c/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/observation/filter/GlobbingPathFilter.java#L60-L60

Re: [DISCUSS] Path filtering support for ResourceChangeListeners

Posted by Carsten Ziegeler <cz...@apache.org>.
I think we should keep it simple and model it based on existing use cases.
So far, the only pattern matching which comes to my mind is matching
based on the extension. Basically everything that is caching something
requires this, like the jsp engine being interested in changes of *.jsp
files etc.
Apart from that listeners are usually interested in changes in a
specific sub tree, but without any additional filtering.

Therefore I think the **, * matching similar to what we know from Ant or
Maven or other tools should be enough.

I wouldn't go with more powerful matching as the idea of the RCLs is
that the filter matching is done by the underlying storage provider,
e.g. Oak. This allows delegating the heavy work to the storage and
reduces the number of events sent by the storage to Sling. Of course, if
the storage can't filter itself, the Sling provider implementation can
still do an additional filtering, but that might be rather expensive.
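
As an illustration of how small a matcher limited to '*' and '**' can be, here is a minimal sketch that translates such a pattern into a java.util.regex.Pattern. It assumes that '*' stays within one path segment and '**' may span several; the exact Ant or Oak semantics may differ, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class LimitedGlobSketch {

    // Translates a glob that only knows '*' and '**' into a regex.
    // Everything else is quoted and therefore matched literally.
    static Pattern compile(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        int i = 0;
        while (i < glob.length()) {
            char c = glob.charAt(i);
            if (c == '*') {
                // Flush the literal text collected so far.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                if (i + 1 < glob.length() && glob.charAt(i + 1) == '*') {
                    regex.append(".*");      // '**' may cross '/' boundaries
                    i += 2;
                } else {
                    regex.append("[^/]*");   // '*' stays within one segment
                    i++;
                }
            } else {
                literal.append(c);
                i++;
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String glob, String path) {
        return compile(glob).matcher(path).matches();
    }
}
```

With this sketch, a listener interested in JSP changes could use a pattern such as '/apps/**/*.jsp', while a plain sub tree like '/apps/**' needs no extension filtering at all.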

Regards
Carsten

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org


Re: [DISCUSS] Path filtering support for ResourceChangeListeners

Posted by Radu Cotescu <ra...@apache.org>.
BTW, the issue is https://issues.apache.org/jira/browse/SLING-5837.

On Fri, 15 Jul 2016 at 15:24 Radu Cotescu <ra...@apache.org> wrote:

> Hello,
>
> The current release of the Sling API
> org.apache.sling.api.resource.observation.ResourceChangeListener
> (org.apache.sling.api.resource.observation;version=1.0.0) specification
> does not provide any kind of support for path filtering or pattern
> matching. As I understand it, the future regarding Resource observation is
> to switch from registering OSGi EventHandlers to registering
> ResourceChangeListeners. If one concept replaces the other, then we need to
> find an acceptable solution for the filtering support that EventHandlers
> brought to the table [0][1].
>
> The implementation I proposed [2] relied on adding support for the Glob
> pattern matching provided by java.nio.file.PathMatcher (albeit I only
> thought of a sub-set of it). What syntax would you prefer and why? Do we
> need to support the full Glob syntax? Would a sub-set be enough? Do we want
> to also support RegEx? Should we actually filter everything directly in the
> ResourceChangeListener#onChange and not care about providing support for
> filtering in the service's configuration?
>
> Thanks,
> Radu
>
> [0] -
> https://osgi.org/javadoc/r6/cmpn/org/osgi/service/event/EventHandler.html
> [1] -
> https://osgi.org/javadoc/r6/cmpn/org/osgi/service/event/EventConstants.html#EVENT_FILTER
> [2] -
> https://github.com/apache/sling/commit/4264dc16205abab300d041d15524c6d996b9d40a
>
>
>