Posted to users@jackrabbit.apache.org by Alessandro Bologna <al...@gmail.com> on 2008/03/31 22:46:44 UTC

XML, SNS, and JCR

Hi all,

One of the most fascinating things about the JCR is that it always gives you
more to think about.

What follows is a very long message that is trying to make the point that
maybe we need another way to map XML to JCR and vice versa. Besides being
long, it is probably boring, and probably even naive in some parts, so
read it only if the topic matters to you...

So, the story goes that after Jukka asked whether it would be worth
dropping support for Same Name Siblings (SNS), and knowing well how useful
SNS are in mapping XML documents to the JCR, I wondered if there was
something missing in the puzzle: XML has no issue with SNS, and XPATH (1.0
and 2.0) is quite happy with them too. At the same time, David's
modeling suggestion "Beware of Same Name Siblings" seems to contradict the
experience of those who come from an XML background.

In other words, in XML it is pretty normal to have:

<people>
<my:employee>
  <my:name first="John" last="Smith"/>
  <my:dob value="10/01/1970"/>
</my:employee>
<my:employee>
  <my:name first="Mary" last="Smith"/>
  <my:dob value="11/07/1973"/>
</my:employee>
</people>

while something like this would be unusual:

<people>
<john.smith>
  <my:name first="John" last="Smith"/>
  <my:dob value="10/01/1970"/>
</john.smith>
<mary.smith>
  <my:name first="Mary" last="Smith"/>
  <my:dob value="11/07/1973"/>
</mary.smith>
</people>

It's possible of course, just not the usual way people design XML.
By the way, in the examples above I am using an attribute-centric model just
for simplicity of comparison with the JCR properties.

The same considerations would apply if I were to use child elements (and
jcr:xmltext child nodes with a jcr:xmlcharacters property), but what matters
is that, in XML, *the element name is almost always mapped to the type, not
the instance*.

In JCR modeling, this can lead to all the well-known issues with same name
siblings, so the approach is instead more "file" centric, where each element
(node) is given a unique identifier, *unless it's not needed*. For example,
this should be OK in JCR:

people
|
+---john.smith
|     +---- my:name:
|     |       +--- -first: John
|     |       +--- -last: Smith
|     +---- my:dob:
|             +--- -value: 10/01/1970
+---mary.smith
      +---- my:name:
      |       +--- -first: Mary
      |       +--- -last: Smith
      +---- my:dob:
              +--- -value: 11/07/1973


In order to avoid SNS, the idea is to use a parent-unique id as the node name
wherever a conflict would arise; this is not required for nodes that are
logically already unique in their parent's context (for instance, my:name
and my:dob). In this model the node name can always be made unique by
adding the SSN, the DOB, or something else if needed.

This means that the XML/XPATH query

/people/my:employee/my:name[@last='Smith']

would need to be rewritten in JCR/XPATH as:

/jcr:root/people/*/my:name[@last='Smith']

because of course the node name is not known a priori, or (better) as

/jcr:root/people/element(*,nt:base)/my:name[@last='Smith']

The second notation uses the XPATH 2.0 element() test, which selects
nodes of a specific type (or of a type that inherits from that type).
In XML, the type is the schema type; in JCR, it is the node type.
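
As a rough illustration of why the step name has to become a wildcard, here is a minimal Python sketch. It uses only the standard library's xml.etree.ElementTree, which supports a small XPath subset and knows nothing about JCR/XPATH or element(); the namespace URI bound to the my prefix is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "my" prefix (an assumption of this sketch).
NS = {"my": "http://example.com/my"}

# Element names carry the type: the usual XML style.
TYPE_NAMED = """
<people xmlns:my="http://example.com/my">
  <my:employee>
    <my:name first="John" last="Smith"/>
  </my:employee>
  <my:employee>
    <my:name first="Mary" last="Smith"/>
  </my:employee>
</people>
"""

# Element names carry the instance: the SNS-free JCR style.
INSTANCE_NAMED = """
<people xmlns:my="http://example.com/my">
  <john.smith>
    <my:name first="John" last="Smith"/>
  </john.smith>
  <mary.smith>
    <my:name first="Mary" last="Smith"/>
  </mary.smith>
</people>
"""


def first_names(xml_text, path):
    """Return the 'first' attribute of every element matched by path."""
    root = ET.fromstring(xml_text)
    return [elem.get("first") for elem in root.findall(path, NS)]


# The step name selects by type when element names are types...
by_type = first_names(TYPE_NAMED, "my:employee/my:name[@last='Smith']")
# ...but has to become a wildcard when element names are instance ids.
by_wildcard = first_names(INSTANCE_NAMED, "*/my:name[@last='Smith']")
```

Both calls find the same two people; what the wildcard version loses is the ability to say "employee" in the path itself.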

If we were using custom node types, and let's assume that we do from now on,
then the JCR query above could have been written more specifically as:

/jcr:root/people/element(*,my:employee)/my:name[@first='John']

assuming a simple CND such as:

[my:name] > nt:base
  - first (string)
  - last (string)

[my:dob] > nt:base
  - value (string)

[my:employee] > nt:base
  + my:name (my:name)
  + my:dob (my:dob)

Incidentally, custom node types are quite essential when we can have
several different types of nodes under 'people', for instance *my:employee*
and *my:freelancer*.

If I didn't have node types and wanted to find all the Smiths that are
not freelancers, then, without access to the parent axis (it's not required
in JCR), I would have to run:

/jcr:root/people/*/my:name[@last='Smith']

and then, in Java, find out which results do not have a freelancer parent.
Besides being tedious, this could be very inefficient.

In traditional XML modelling and querying, and unless we wanted to take
advantage of inheritance, this would not be needed because the XPATH itself
would distinguish between the cases:

/jcr:root/people/my:employee/my:name[@last='Smith']

Of course, in both cases (JCR and XML) I could structure my data better and
separate employees from freelancers under different nodes (*my:employees*
and *my:freelancers*), and I would not have this problem; at the same time,
when you have multiple criteria, orthogonal or not, it becomes quite hard
to choose which one should be "hardwired" in the structure (what
about male/female, working/retired, etc.).

The choice of what is driving the hierarchy and what is instead an attribute
(or a property) sometimes is not obvious, and often turns out to be not the
right one (when it's too late, typically...).

Viewing JCR structures as XML is not a side effect; it's part of the JCR
spec, which says that an XPATH query is run against the virtual XML
document (6.6.4.10 and others). And the XML Document View is the
*normal* way to look at the data as XML. (Of course, System View is the one
to be used for round tripping, I know...).

At the same time, as we have seen, this special relationship that JCR has
with XML should not be used to inspire the model: because SNS are complex
to handle, nodes should be named with a parent-unique "id" and not with
their "type", and the element(*,my:type) test should be used wherever I
really intend to select by the type of the node.

Because of this, it is not unusual to have to write queries such as

//element(*,my:type)/element(*,my:other-type)[element(*,my:last-type)]

instead of

//my:type/my:other-type[my:last-type]

and this assuming that every node is strictly typed, which is not always
desirable or possible.

As another use case, in my application (yes, who cares?), XSLT stylesheets
can access the repository through a (RESTful) type of query that is
expressed in JCR/XPATH, and they can work with the resulting document using
XML/XPATH. This means that if, for instance, the URI of my node's XML
representation is

http://localhost/jcr/default/blogs/2008/myfirstpost/blog

and the resulting document is:

<blog>
  <headline>test</headline>
  <body>
   <p>first paragraph</p>
   <p>second paragraph</p>
  </body>
</blog>

The nice thing is that it's possible to use, for instance,

http://localhost/jcr/default/blogs/2008/myfirstpost/blog/headline

to get only the headline, or even:

http://localhost/jcr/default/blogs/2008/*/blog/headline

to get all the headlines in 2008.

What I could not do, if SNS were not there, is:

http://localhost/jcr/default/blogs/2008/myfirstpost/blog/body/p[1]

to get the first paragraph of my blog post, or

http://localhost/jcr/default/blogs/2008/*/blog/body/p[1]

to get the first paragraph of every post in 2008. So, even when nodes have
unique names ('myfirstpost'), at some level below them same name siblings
in the form of tags are likely to appear, and that's a nice thing, because
it allows a seamless transition from the URI of a node representation as
seen on the server to the URI of the element that is being processed. In
other words, the URI space is continuous.
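
To make the "continuous URI space" point a bit more concrete, here is a small Python sketch. It is not how the application above is implemented; it only shows that the tail of such a URI can be applied directly as a path expression to the returned document, and that a step like p[1] only makes sense because the two p elements are same name siblings. The resolve() helper is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# The document returned for .../blogs/2008/myfirstpost/blog
BLOG = """
<blog>
  <headline>test</headline>
  <body>
    <p>first paragraph</p>
    <p>second paragraph</p>
  </body>
</blog>
"""


def resolve(document_root, uri_tail):
    """Apply the part of the URI after the node's own URI as a path.

    uri_tail is e.g. "body/p[1]"; the result is the text of every match.
    """
    return [elem.text for elem in document_root.findall(uri_tail)]


blog = ET.fromstring(BLOG)
headline = resolve(blog, "headline")
first_paragraph = resolve(blog, "body/p[1]")
all_paragraphs = resolve(blog, "body/p")
```

If the paragraphs had to carry unique names, the server-side path and the in-document path could no longer share the same p[1] step.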

Still, the dilemma remains: why is it best practice in JCR modeling to name
nodes after their contents, and in XML after their types?

What I wonder is whether it would be a good idea to *introduce another type
of Document View* (let's call it Normal View for now), where *node types
are element names*, *properties are still attributes*, and *a jcr:name
pseudo-attribute is added* (instead of jcr:primaryType) to represent the
node name.

In this case, I could write my query in 'old style' XPATH 1.0 (minus, of
course, the order by), XML could still be used to inspire the model, and SNS
would be avoided. And, I believe, queries would be both simpler and would
make more sense to XML developers, to the point that it would be easier to
migrate an XML-centric application to the JCR model (with some caveats, of
course).

With this feature, the JCR structure above could be queried with XPATH
against its virtual Normal View (in addition to the Document View):

<people>
  <my:employee jcr:name="john.smith">
    <my:name first="John" last="Smith"/>
    <my:dob value="10/01/1970"/>
  </my:employee>
  <my:employee jcr:name="mary.smith">
    <my:name first="Mary" last="Smith"/>
    <my:dob value="11/07/1973"/>
  </my:employee>
</people>

so, for instance, i could write:

*//people//my:employee[2]/my:name* as an XPATH expression for the Normal
View to find my second employee,
*//people//my:employee[@id='john.smith']/my:dob* to find when the employee
(not the freelancer) with id john.smith was born
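
A minimal Python sketch of what materializing such a Normal View could look like, under several assumptions: Node is a toy stand-in for a JCR node (not the javax.jcr API), the my namespace URI is made up, and jcr:name is omitted when it would only repeat the type name (as for my:name and my:dob in the example). The queries use the jcr:name attribute from the Normal View example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# The jcr URI is the standard JCR namespace; the "my" URI is made up.
NS = {"my": "http://example.com/my", "jcr": "http://www.jcp.org/jcr/1.0"}


@dataclass
class Node:
    """Toy stand-in for a JCR node (not the javax.jcr API)."""
    name: str            # node name, e.g. "john.smith"
    primary_type: str    # node type, e.g. "my:employee"
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def clark(prefixed):
    """Turn "my:employee" into ElementTree's "{uri}employee" notation."""
    prefix, local = prefixed.split(":", 1)
    return "{%s}%s" % (NS[prefix], local)


def to_normal_view(node):
    """Element name = node type; node name goes into a jcr:name attribute."""
    elem = ET.Element(clark(node.primary_type), dict(node.properties))
    # Assumption: skip jcr:name when it would just repeat the type name.
    if node.name != node.primary_type:
        elem.set(clark("jcr:name"), node.name)
    for child in node.children:
        elem.append(to_normal_view(child))
    return elem


def employee(node_name, first, last, dob):
    """Build one employee node shaped like the tree shown earlier."""
    return Node(node_name, "my:employee", children=[
        Node("my:name", "my:name", {"first": first, "last": last}),
        Node("my:dob", "my:dob", {"value": dob}),
    ])


people = ET.Element("people")
people.append(to_normal_view(employee("john.smith", "John", "Smith", "10/01/1970")))
people.append(to_normal_view(employee("mary.smith", "Mary", "Smith", "11/07/1973")))

# Plain XPath 1.0 style steps now work, even though no node name repeats.
second_name = people.find("my:employee[2]/my:name", NS)
john_dob = people.find("my:employee[@jcr:name='john.smith']/my:dob", NS)
```

The repository itself still holds john.smith and mary.smith as uniquely named nodes; the same name siblings exist only in the generated view.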

And what if no nodetypes are defined? Then the regular Document View based
JCR/XPATH would be probably better suited, as the intent of the alternative
Normal View is to express queries using XML style XPATH for nodes that are
typed, and to disambiguate the way that XML documents are seen once imported
in the JCR.

So what about importing and exporting this view?

In the JCR paradigm, or at least in Jackrabbit, importing XML (that was not
generated by a System View export) means mapping each element to a node and
each attribute to a property. If the element does not have a jcr:primaryType
attribute, then the node is created as nt:unstructured, the attributes
become string properties, and XML text nodes are created as jcr:xmltext
children with a single property of type string (jcr:xmlcharacters). If
instead the jcr:primaryType attribute is present, then Jackrabbit tries to
map the XML to the corresponding node type, throwing an exception if it
can't (for instance because of a conflicting structure).

So, in addition to this behavior during import, another one could be
introduced:

*During import, each element that has a jcr:name attribute would be created
as a node with a name equal to the value of jcr:name and a node type
equal to the element's name. If the node type is not present, it could
either be created on the fly (inheriting from nt:unstructured), or an
exception could be thrown. Similarly, if an element does not have a
jcr:name, or has a jcr:name identical to a sibling's, an exception could be
thrown, or a new id could be assigned silently.
*
For export, a new method exportNormalView() could be added to the already
present exportDocumentView() and exportSystemView() and would export a
materialized view of the virtual Normal View.
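
The import rule proposed above could be sketched roughly as follows, in Python and in its strict variant only (exceptions instead of silently assigned ids or types created on the fly). Everything here is hypothetical: import_children() is an invented helper, the result is a plain dict rather than repository nodes, and the my namespace URI and the grade attribute are made up for the example.

```python
import xml.etree.ElementTree as ET

# Attribute and element names in ElementTree's "{uri}local" notation.
JCR_NAME = "{http://www.jcp.org/jcr/1.0}name"
MY_EMPLOYEE = "{http://example.com/my}employee"  # made-up namespace URI


def import_children(parent, known_types):
    """Map each child element to a node record keyed by its jcr:name.

    Strict variant: a missing or duplicate jcr:name, or an element name
    that is not a known node type, raises ValueError.
    """
    nodes = {}
    for elem in parent:
        name = elem.get(JCR_NAME)
        if name is None:
            raise ValueError("element without jcr:name: " + elem.tag)
        if name in nodes:
            raise ValueError("duplicate jcr:name among siblings: " + name)
        if elem.tag not in known_types:
            raise ValueError("unknown node type: " + elem.tag)
        # Every other attribute becomes a property of the node.
        properties = {k: v for k, v in elem.attrib.items() if k != JCR_NAME}
        nodes[name] = {"type": elem.tag, "properties": properties}
    return nodes


GOOD = """
<people xmlns:my="http://example.com/my"
        xmlns:jcr="http://www.jcp.org/jcr/1.0">
  <my:employee jcr:name="john.smith" grade="A"/>
  <my:employee jcr:name="mary.smith" grade="B"/>
</people>
"""

imported = import_children(ET.fromstring(GOOD), {MY_EMPLOYEE})
```

The element name ends up as the node type and the jcr:name value as the node name, so two my:employee elements no longer produce same name siblings.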

In this way, importing XML in the repository would not create SNS, the
element() function would be needed only when the type inheritance hierarchy
needs to be evaluated, and, most important, people would not be confused
anymore with modeling "the XML way" vs "the JCR way".

Finally, the technical question. Is there a simple way to extend the XPATH
parser to handle this type of queries? Or, has anybody had any experience
plugging Jaxen into the JCR? Everything else seems to be pretty
straightforward to implement, even if just to see how it behaves in
the real world.

Of course, any thought, even an utterly critical thought, is welcome.
Alessandro

Re: XML, SNS, and JCR

Posted by Alessandro Bologna <al...@gmail.com>.
Thanks David,
if you say that's an interesting idea, then I may actually start to believe
in it too ;).

Please see my answers inline below. And, while I am at it, I need to fix
a couple of typos in my original message:

Where I wrote:

> so, for instance, i could write:
>
> *//people//my:employee[2]/my:name* as an XPATH expression for the Normal
> View to find my second employee,
> *//people//my:employee[@id='john.smith']/my:dob* to find when the employee
> (not the freelancer) with id john.smith was born
>

What I *really* meant was:

> so, for instance, i could write:
>
> *//people//my:employee[2]/my:name* as an XPATH expression for the Normal
> View to find my second employee,
> *//people//my:employee[@jcr:name='john.smith']/my:dob* to find when the
> employee (not the freelancer) with jcr:name john.smith was born
>


> First of all, congratulations on the restful URLs that you mention in
> your post, which is something that i do not see very often. You
> may find that the URL mapping in Apache Sling [1] is very similar.
>

I know, I wish Sling had been announced a bit earlier... We already had our
first prototype out last summer and it was a bit too late to start with
Sling, but I will certainly look for areas of synergy.


> As you mention the DocView is for round tripping arbitrary XML while
> the SysView is for round tripping arbitrary content. If I am not mistaken
> the "Normal View" would not allow either of the two, but would add a lot
> of value for an efficient way to deal with something that I would call
> "real-life JCR aware XML".


Well, you are right, and the intent is not really to round-trip, but to
provide a way to port existing XML applications to JCR, and to leverage the
JCR as a (very powerful) way to deal with extremely large XML structures.

As I mentioned already, in such a paradigm it's possible to RESTfully
transition from one node (representation) to another, and thus effectively
navigate the repository either within its hierarchical structure, or through
references to other nodes (which can be expressed as paths or JCR
references), or with XPATH queries.

If a simple XSLT stylesheet is the user agent (but any other client
application that can use XML would do as well), it's trivial to process
virtually the entire repository, no matter how large it is.

For instance, take these few lines of a hypothetical and oversimplified XSLT
(2.0) stylesheet:

    <xsl:template match="/">
        <xsl:variable name="posts"
            select="document('http://localhost/jcr/default/blogs/2007/*/headline')"/>
        <xsl:result-document href="http://localhost/jcr/default/blogs/2007/index.html">
            <html>
                <head><title>2007 posts</title></head>
                <body>
                    <ul>
                        <!-- $posts is the variable; a bare "posts" would be a child-element test -->
                        <xsl:apply-templates select="$posts//headline"/>
                    </ul>
                </body>
            </html>
        </xsl:result-document>
    </xsl:template>

    <xsl:template match="headline">
        <li><xsl:value-of select="."/></li>
    </xsl:template>

This alone, using a bit of restful GETting and PUTting, can create the index
page for my blog, post it on the URL I want, and it can run on another
server as well.

More generally, the use case is that of an organization with hundreds of
thousands of XML documents that are organized in some sort of hierarchical
fashion, with references or hyperlinks to each other, and that together form
a super-document that is really complex to manipulate with traditional
tools. Once you load them into the JCR, you can access all of them at once,
extract and transform what you want, etc.

And, of course, the other intent is to mend the chasm between two worlds
before it becomes too large...


> I think it would be interesting to find out what the
> characteristics and limitations of such a view are, both from an XML
> (import) and from a JCR (export/query) perspective. I assume we would
> end up with the same limitations as the DocView from a JCR perspective
> and possibly with the limitation that the XML elements would have to
> match pre-registered (possibly auto-defined & registered) node types.


The main limitations I can think of, when it comes to the Document View, are
that node type information is lost, and that property arrays are a bit
"squeezed" into attributes. There may be others, of course, that I am not
aware of. Are there more?

To be honest, neither limitation is huge in my experience, even with the
traditional Document View, when you consider that the use case is to process
XML with JCR (and not vice versa).

If there are string sequences in attributes, they can very well become
multi-valued properties if the corresponding node type says so, or stay as a
single string that happens to have some innocuous white space in it.
And type information can be preserved as long as node types are defined
(maybe even by importing the result of an XML Schema to CND conversion).
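
As a small illustration of the attribute case, here is a Python sketch in which a lookup table stands in for the node type's property definitions. The table, the read_properties() helper, and the post element with its title and tags attributes are all invented for the example; a real import would consult the registered node type instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical property definitions: property name -> is it multi-valued?
MULTI_VALUED = {"tags": True, "title": False}


def read_properties(elem, multi_valued):
    """Split an attribute on white space only if its property is multi-valued."""
    props = {}
    for key, value in elem.attrib.items():
        if multi_valued.get(key, False):
            props[key] = value.split()
        else:
            # Single-valued: keep the string as is, white space included.
            props[key] = value
    return props


post = ET.fromstring('<post title="hello  world" tags="jcr xml  sns"/>')
props = read_properties(post, MULTI_VALUED)
```

Without a node type saying otherwise, tags would simply stay one string with spaces in it, which is the "innocuous" case described above.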

Now, with the Normal View, the advantage I would see is that I would not
need to rethink my document structure to avoid SNS in order to take
advantage of what the JCR offers (especially in terms of Java APIs). And my
existing XPATH queries would still work, but now they would work across
documents too.

This approach would allow both a content-first and a structure-first way of
thinking:

Content first? Just load your XMLs with the option to create empty,
unstructured nodetypes for you, and finally get that comprehensive view you
could never get before.

Structure first? Grab that schema (the one that you have not updated in
the last two years), update it, import it in the JCR and go happy.

Thanks again for the attention and your input.
Alessandro

Re: XML, SNS, and JCR

Posted by David Nuescheler <da...@day.com>.
Hi Alessandro,

thanks for the very thoughtful and inspiring post.

I think you bring up many interesting points, it may even be worth
splitting things into various different conversations.

First of all, congratulations on the restful URLs that you mention in
your post, which is something that I do not see very often. You
may find that the URL mapping in Apache Sling [1] is very similar.

I have to admit that when I crafted the "Beware of SNS" [2] rule I
thought of people modelling in the content repository who come from
a database background, and I will probably have to look at things
again from an XML perspective.

I think the approach to the "Normal View" is very intriguing, and after
thinking it through briefly I think it would not be too hard to implement,
either for import/export or for XPath query, and yet, as you point
out, it would avoid the XML vs. JCR datamodel dilemma.

As you mention the DocView is for round tripping arbitrary XML while
the SysView is for round tripping arbitrary content. If I am not mistaken
the "Normal View" would not allow either of the two, but would add a lot
of value for an efficient way to deal with something that I would call
"real-life JCR aware XML". I think it would be interesting to find out what
the characteristics and limitations of such a view are, both from an XML
(import) and from a JCR (export/query) perspective. I assume we would end up
with the same limitations as the DocView from a JCR perspective
and possibly with the limitation that the XML elements would have to
match pre-registered (possibly auto-defined & registered) node types.

Is that correct?

Very interesting idea.

regards,
david



[1] http://incubator.apache.org/sling/site/index.html
[2] http://wiki.apache.org/jackrabbit/DavidsModel#head-1df0224190c265f5156f037eb3f20e314fa6c4a7


Re: XML, SNS, and JCR

Posted by Alessandro Bologna <al...@gmail.com>.
Thanks Marcel,

sounds easy enough :) , will give it a try and let you know.

Alessandro

On Tue, Apr 1, 2008 at 5:35 AM, Marcel Reutegger <ma...@gmx.net>
wrote:

> Alessandro Bologna wrote:
> > Finally, the technical question. Is there a simple way to extend the
> > XPATH parser to handle this type of queries?
>
> you might want to try this:
>
> - create your own XPath QueryBuilder, which extends from
>   org.apache.jackrabbit.spi.commons.query.xpath.QueryBuilder
> - overwrite createQueryTree() and transform the query tree built by the
>   base class.
> - IIUC you'd have to modify every LocationStepQueryNode and do the
>   following:
>        - take nameTest and create a NodeTypeQueryNode (or rather a
>          RelationQueryNode for jcr:primaryType?) with that name
>        - add the NodeTypeQueryNode to the LocationStepQueryNode as a
>          predicate
>        - set nameTest to null in LocationStepQueryNode
>
> done ;)
>
> regards
>   marcel
>

Re: XML, SNS, and JCR

Posted by Marcel Reutegger <ma...@gmx.net>.
Alessandro Bologna wrote:
> Finally, the technical question. Is there a simple way to extend the XPATH
> parser to handle this type of queries?

you might want to try this:

- create your own XPath QueryBuilder, which extends
  org.apache.jackrabbit.spi.commons.query.xpath.QueryBuilder
- override createQueryTree() and transform the query tree built by the base class.
- IIUC you'd have to modify every LocationStepQueryNode and do the following:
    - take nameTest and create a NodeTypeQueryNode (or rather a
      RelationQueryNode for jcr:primaryType?) with that name
    - add the NodeTypeQueryNode to the LocationStepQueryNode as a predicate
    - set nameTest to null in LocationStepQueryNode

done ;)

regards
  marcel
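
The shape of the transformation described in the steps above can be shown with a toy model in Python. This is not the Jackrabbit API: LocationStep, names_to_types() and render() are invented for the illustration, predicates are kept as plain strings, and the type check is written as a jcr:primaryType comparison (closer to the RelationQueryNode variant), which ignores subtypes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LocationStep:
    """Toy stand-in for one location step of a parsed XPath query."""
    name_test: Optional[str]                  # None means "any name" (*)
    predicates: List[str] = field(default_factory=list)


def names_to_types(steps):
    """Rewrite every name test into a primary-type predicate."""
    rewritten = []
    for step in steps:
        predicates = list(step.predicates)
        if step.name_test is not None:
            # The name test becomes a predicate and the name is cleared.
            predicates.insert(0, "@jcr:primaryType='%s'" % step.name_test)
        rewritten.append(LocationStep(None, predicates))
    return rewritten


def render(steps):
    """Print the steps back as a path expression, for inspection only."""
    parts = []
    for step in steps:
        part = step.name_test or "*"
        part += "".join("[%s]" % p for p in step.predicates)
        parts.append(part)
    return "/" + "/".join(parts)


# Normal View style query: /my:employee/my:name[@last='Smith']
query = [
    LocationStep("my:employee"),
    LocationStep("my:name", ["@last='Smith'"]),
]
rewritten = render(names_to_types(query))
```

In this toy model the Normal View query is rewritten to /*[@jcr:primaryType='my:employee']/*[@jcr:primaryType='my:name'][@last='Smith'], i.e. names are ignored and types are matched instead.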