You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sling.apache.org by Stefan Seifert <ss...@pro-vision.de> on 2017/03/14 13:03:57 UTC

[contentparser] API for simple content representation

after the name is settled starting a new thread about the api oft he new "JCR Content Parser" component [1].
the previous implementation used a simple nested map approach for representing the parsed content.

the following feedback was open:
1. konrad: JCR allows to have a child node and property of a node with the same name.
2. bertrand: it would be useful to have the "getDeep" or "getByPath" method directly in the API

i've create a PR for review which addresses both issues and generally introduces a better separation of child nodes and properties by introducing a new interface. i did not use the term "Resource" for the new hierarchy item interface to avoid confusion with the Sling API. i added a getName() method but not much more to keep it really simple.

please give [2] as short review from the API perspective.

stefan

[1] https://lists.apache.org/thread.html/8c7b540d266f975308681513f1896b45944cace4fc1eea2168cf345b@%3Cdev.sling.apache.org%3E
[2] https://github.com/apache/sling/pull/202



Re: [contentparser] API for simple content representation

Posted by Carsten Ziegeler <cz...@apache.org>.
Lgtm, thanks Stefan!

Carsten

Stefan Seifert wrote
> i've prepare a new PR with *only* a streaming API [1].
> please have a new review on the API part.
> 
> it has a very simple ContentHandler interface which reports every resource found with its map of properties - ordered as found in the source.
> 
> to keep the existing unit tests i added the handler that converts the stream API to a map representation to the *unit test* code - this is the code which will then be duplicated across the modules that need a map representation. [2]
> 
> when this is the way we want to go i will commit it.
> 
> stefan
> 
> [1] https://github.com/apache/sling/pull/203
> [2] https://github.com/stefanseifert/sling/tree/feature/SLING-6592-contentparser-stream-api/bundles/jcr/contentparser/src/test/java/org/apache/sling/jcr/contentparser/impl/mapsupport
> 
> 
> 
>> -----Original Message-----
>> From: Carsten Ziegeler [mailto:cziegeler@apache.org]
>> Sent: Wednesday, March 15, 2017 3:35 PM
>> To: dev@sling.apache.org
>> Subject: Re: [contentparser] API for simple content representation
>>
>> Stefan Seifert wrote
>>>
>>>> A streaming API which defines a handler with a method that gets the name
>>>> (or path?) and the map of properties would avoid having this additional
>>>> tree api. I don't see a huge need to be able to have an api to traverse
>>>> the parsed content. If any user of the contentparser really needs it, it
>>>> can use whatever it wants.
>>>
>>> a Stream API would be a good solution for one half of the use cases.
>>> the other half benefits from a Traversing API - we may leave it to the
>> other bundles and live with some code duplication.
>>>
>>> before dropping traversing API completely - what about this: we rename
>> the traversing API to something that makes it obvious that this is not yet-
>> another-content-representation-api but only a tool to hand over data - e.g.
>> "ParsingResult" instead of "Content"?
>>>
>>> the solution could be:
>>> - offer a Streaming API
>>> - offer an optional handler based on this streaming API that produces a
>> "ParsingResult" that has the minimal traversing capabilities as outlined in
>> the PR.
>>>
>>> we could even put this in two separate modules, but this feels a bit
>> over-engineered.
>>>
>>
>> I guess whatever we name it, it's another traversing API :)
>> I would prefer to go with just the streaming api and only if really
>> really really needed by many others start about thinking how to solve that.
>>
>> Carsten
>> --
>> Carsten Ziegeler
>> Adobe Research Switzerland
>> cziegeler@apache.org
> 


 

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

RE: [contentparser] API for simple content representation

Posted by Stefan Seifert <ss...@pro-vision.de>.
i've prepare a new PR with *only* a streaming API [1].
please have a new review on the API part.

it has a very simple ContentHandler interface which reports every resource found with its map of properties - ordered as found in the source.

to keep the existing unit tests i added the handler that converts the stream API to a map representation to the *unit test* code - this is the code which will then be duplicated across the modules that need a map representation. [2]

when this is the way we want to go i will commit it.

stefan

[1] https://github.com/apache/sling/pull/203
[2] https://github.com/stefanseifert/sling/tree/feature/SLING-6592-contentparser-stream-api/bundles/jcr/contentparser/src/test/java/org/apache/sling/jcr/contentparser/impl/mapsupport



>-----Original Message-----
>From: Carsten Ziegeler [mailto:cziegeler@apache.org]
>Sent: Wednesday, March 15, 2017 3:35 PM
>To: dev@sling.apache.org
>Subject: Re: [contentparser] API for simple content representation
>
>Stefan Seifert wrote
>>
>>> A streaming API which defines a handler with a method that gets the name
>>> (or path?) and the map of properties would avoid having this additional
>>> tree api. I don't see a huge need to be able to have an api to traverse
>>> the parsed content. If any user of the contentparser really needs it, it
>>> can use whatever it wants.
>>
>> a Stream API would be a good solution for one half of the use cases.
>> the other half benefits from a Traversing API - we may leave it to the
>other bundles and live with some code duplication.
>>
>> before dropping traversing API completely - what about this: we rename
>the traversing API to something that makes it obvious that this is not yet-
>another-content-representation-api but only a tool to hand over data - e.g.
>"ParsingResult" instead of "Content"?
>>
>> the solution could be:
>> - offer a Streaming API
>> - offer an optional handler based on this streaming API that produces a
>"ParsingResult" that has the minimal traversing capabilities as outlined in
>the PR.
>>
>> we could even put this in two separate modules, but this feels a bit
>over-engineered.
>>
>
>I guess whatever we name it, it's another traversing API :)
>I would prefer to go with just the streaming api and only if really
>really really needed by many others start about thinking how to solve that.
>
>Carsten
>--
>Carsten Ziegeler
>Adobe Research Switzerland
>cziegeler@apache.org


Re: [contentparser] API for simple content representation

Posted by Carsten Ziegeler <cz...@apache.org>.
Stefan Seifert wrote
> 
>> A streaming API which defines a handler with a method that gets the name
>> (or path?) and the map of properties would avoid having this additional
>> tree api. I don't see a huge need to be able to have an api to traverse
>> the parsed content. If any user of the contentparser really needs it, it
>> can use whatever it wants.
> 
> a Stream API would be a good solution for one half of the use cases.
> the other half benefits from a Traversing API - we may leave it to the other bundles and live with some code duplication.
> 
> before dropping traversing API completely - what about this: we rename the traversing API to something that makes it obvious that this is not yet-another-content-representation-api but only a tool to hand over data - e.g. "ParsingResult" instead of "Content"?
> 
> the solution could be:
> - offer a Streaming API
> - offer an optional handler based on this streaming API that produces a "ParsingResult" that has the minimal traversing capabilities as outlined in the PR.
> 
> we could even put this in two separate modules, but this feels a bit over-engineered.
> 

I guess whatever we name it, it's another traversing API :)
I would prefer to go with just the streaming api and only if really
really really needed by many others start about thinking how to solve that.

Carsten
-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

RE: [contentparser] API for simple content representation

Posted by Stefan Seifert <ss...@pro-vision.de>.
>A streaming API which defines a handler with a method that gets the name
>(or path?) and the map of properties would avoid having this additional
>tree api. I don't see a huge need to be able to have an api to traverse
>the parsed content. If any user of the contentparser really needs it, it
>can use whatever it wants.

a Stream API would be a good solution for one half of the use cases.
the other half benefits from a Traversing API - we may leave it to the other bundles and live with some code duplication.

before dropping traversing API completely - what about this: we rename the traversing API to something that makes it obvious that this is not yet-another-content-representation-api but only a tool to hand over data - e.g. "ParsingResult" instead of "Content"?

the solution could be:
- offer a Streaming API
- offer an optional handler based on this streaming API that produces a "ParsingResult" that has the minimal traversing capabilities as outlined in the PR.

we could even put this in two separate modules, but this feels a bit over-engineered.

stefan

Re: [contentparser] API for simple content representation

Posted by Carsten Ziegeler <cz...@apache.org>.
Stefan Seifert wrote
> 
>> What is the exact purpose of this module? Or asked differently, what do
>> we expect clients do with the parsed content?
> 
> some background about contentparser in this initial thread [1]
> 
> my use cases are:
> 1. load content from JSON or JCR XML files into mock repositories in sling-mock (resourceresolver-mock, jcr-mock, oak) -> import to repository via Resource API (not JCR API)
> 2. load content from JSON or JCR XML files for mounting them in the fsresource provider -> hold data in a cheap in-memory-cache for developer machines mounting a couple of projects with fsresource
> 3. probably migrate the JSON content parsing part of jcr contenloader to this lib, it has the same parsing logic implemented currently
> 4. outside sling i've other projects which need this logic as well e.g. [2]
> 
> as 1. and 2. are maintained in two branches we've currently 4-5 active projects in sling that need this component and more projects outside, and we do not want to duplicate the logic again and again.
> 
> my first intention for the API was to have nearly no API, so i used nested maps for content representation. this created problems with the JCR specialty allowing node and property with same name, so this thread with the new proposal with a very thin layer above the nested maps. i do not want to make the API more complex. but we need some sort of interface to get the parsed content in a hierarchical form to process it further.
> 
> one perhaps open question is if a stream-based API is better than an "in-memory" map-based API. for use cases like contentloader and mock-import stream-based is better, for fsresource map-based is easier. of course it's possible to create a map with a stream-based API, but then helper methods like "getChild(path)" is duplicated again.
> 
Thanks for the explanations.

Certainly the JCR api is way to complicated for this use case and I
understand that the resource API is close, but returning a resource
would create some restrictions on it's usage. So using that would be
problematic.

I was wondering if it would work to have a parse method which gets a
resource resolver and maybe a root path. And the parser would then
simply parse the file and directly create the resources in the resource
tree. So this would rather be an importer than a parser, but it would
avoid having a intermediate tree representation. But that doesn't seem
to fit all use cases.

A streaming API which defines a handler with a method that gets the name
(or path?) and the map of properties would avoid having this additional
tree api. I don't see a huge need to be able to have an api to traverse
the parsed content. If any user of the contentparser really needs it, it
can use whatever it wants.

 Regards

Carsten

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

RE: [contentparser] API for simple content representation

Posted by Stefan Seifert <ss...@pro-vision.de>.
>What is the exact purpose of this module? Or asked differently, what do
>we expect clients do with the parsed content?

some background about contentparser in this initial thread [1]

my use cases are:
1. load content from JSON or JCR XML files into mock repositories in sling-mock (resourceresolver-mock, jcr-mock, oak) -> import to repository via Resource API (not JCR API)
2. load content from JSON or JCR XML files for mounting them in the fsresource provider -> hold data in a cheap in-memory-cache for developer machines mounting a couple of projects with fsresource
3. probably migrate the JSON content parsing part of jcr contenloader to this lib, it has the same parsing logic implemented currently
4. outside sling i've other projects which need this logic as well e.g. [2]

as 1. and 2. are maintained in two branches we've currently 4-5 active projects in sling that need this component and more projects outside, and we do not want to duplicate the logic again and again.

my first intention for the API was to have nearly no API, so i used nested maps for content representation. this created problems with the JCR specialty allowing node and property with same name, so this thread with the new proposal with a very thin layer above the nested maps. i do not want to make the API more complex. but we need some sort of interface to get the parsed content in a hierarchical form to process it further.

one perhaps open question is if a stream-based API is better than an "in-memory" map-based API. for use cases like contentloader and mock-import stream-based is better, for fsresource map-based is easier. of course it's possible to create a map with a stream-based API, but then helper methods like "getChild(path)" is duplicated again.

stefan

[1] https://lists.apache.org/thread.html/8c7b540d266f975308681513f1896b45944cace4fc1eea2168cf345b@%3Cdev.sling.apache.org%3E
[2] http://devops.wcm.io/conga/


Re: [contentparser] API for simple content representation

Posted by Carsten Ziegeler <cz...@apache.org>.
Stefan Seifert wrote
> 
>> I like Stefan's Content proposal but it's really a ContentTree, right?
> 
> to be precise the current "Content" interface represents one node in the tree - so alternative names could be "ContentNode" or "ContentResource"?
> 

Sorry for joining the discussion so late, but do we really want to
create another tree api?
There is the JCR api, we have our resource abstraction, I assume there
are even more hidden in the bundles we use and now we want to create yet
another one with basically the same methods/behaviour?

What is the exact purpose of this module? Or asked differently, what do
we expect clients do with the parsed content?

Regards

Carsten

-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org

RE: [contentparser] API for simple content representation

Posted by Stefan Seifert <ss...@pro-vision.de>.
>I like Stefan's Content proposal but it's really a ContentTree, right?

to be precise the current "Content" interface represents one node in the tree - so alternative names could be "ContentNode" or "ContentResource"?

stefan

Re: [contentparser] API for simple content representation

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Tue, Mar 14, 2017 at 2:11 PM, Stefan Seifert <ss...@pro-vision.de> wrote:
>> Konrad wrote:
>>...I would be in favour of reusing the Resource interface from Sling API...
>
> ...this would open a can of worms because we would have to support all of it including ResourceResolver etc.
> or we leave a lot of methods throwing exceptions or returning null which would not be nice either....

I agree with Stefan that it's good to be able to represent a content
tree in such utility libraries with no or minimal dependencies on
Sling. The other use case that I have in mind is serializing content
for sending to and receiving from external content processors that
should be very independent from Sling, and this is a good step in that
direction.

Using the Resource API introduces a lot of transient dependencies that
I much prefer leaving out for such cases.

I like Stefan's Content proposal but it's really a ContentTree, right?

And we might move that API to its own module to make it independently
reusable. Maybe later.

-Bertrand

Re: [contentparser] API for simple content representation

Posted by Robert Munteanu <ro...@apache.org>.
On Tue, 2017-03-14 at 14:19 +0100, Konrad Windszus wrote:
> > On 14 Mar 2017, at 14:11, Stefan Seifert <ss...@pro-vision.de>
> > wrote:
> > 
> > > > please give [2] as short review from the API perspective.
> > > 
> > > I just did it. Basically it looks good, but I would be in favour
> > > of reusing
> > > the Resource interface from Sling API.
> > 
> > uh, this would open a can of worms because we would have to support
> > all of it including ResourceResolver etc. or we leave a lot of
> > methods throwing exceptions or returning null which would not be
> > nice either.
> 
> Most of the developers are familiar with that interface, and since
> 80% of the methods also make sense for "offline" resources (https://s
> ling.apache.org/documentation/the-sling-engine/resources.html#what-
> is-a-resource) I think that would be the lesser of the two evils.�

I think that to have a proper implementation of the Resource interface
we also need to have a proper implementation of the ResourceResolver,
which is probably non-trivial.

Robert

Re: [contentparser] API for simple content representation

Posted by Konrad Windszus <ko...@gmx.de>.
> On 14 Mar 2017, at 14:11, Stefan Seifert <ss...@pro-vision.de> wrote:
> 
>>> please give [2] as short review from the API perspective.
>> 
>> I just did it. Basically it looks good, but I would be in favour of reusing
>> the Resource interface from Sling API.
> 
> uh, this would open a can of worms because we would have to support all of it including ResourceResolver etc. or we leave a lot of methods throwing exceptions or returning null which would not be nice either.
Most of the developers are familiar with that interface, and since 80% of the methods also make sense for "offline" resources (https://sling.apache.org/documentation/the-sling-engine/resources.html#what-is-a-resource) I think that would be the lesser of the two evils. 
> 
> stefan
> 


RE: [contentparser] API for simple content representation

Posted by Stefan Seifert <ss...@pro-vision.de>.
>> please give [2] as short review from the API perspective.
>
>I just did it. Basically it looks good, but I would be in favour of reusing
>the Resource interface from Sling API.

uh, this would open a can of worms because we would have to support all of it including ResourceResolver etc. or we leave a lot of methods throwing exceptions or returning null which would not be nice either.

stefan


Re: [contentparser] API for simple content representation

Posted by Konrad Windszus <ko...@gmx.de>.
> On 14 Mar 2017, at 14:03, Stefan Seifert <ss...@pro-vision.de> wrote:
> 
> after the name is settled starting a new thread about the api oft he new "JCR Content Parser" component [1].
> the previous implementation used a simple nested map approach for representing the parsed content.
> 
> the following feedback was open:
> 1. konrad: JCR allows to have a child node and property of a node with the same name.
> 2. bertrand: it would be useful to have the "getDeep" or "getByPath" method directly in the API
> 
> i've create a PR for review which addresses both issues and generally introduces a better separation of child nodes and properties by introducing a new interface. i did not use the term "Resource" for the new hierarchy item interface to avoid confusion with the Sling API. i added a getName() method but not much more to keep it really simple.
> 
> please give [2] as short review from the API perspective.
I just did it. Basically it looks good, but I would be in favour of reusing the Resource interface from Sling API.
> 
> stefan
> 
> [1] https://lists.apache.org/thread.html/8c7b540d266f975308681513f1896b45944cace4fc1eea2168cf345b@%3Cdev.sling.apache.org%3E
> [2] https://github.com/apache/sling/pull/202
> 
>