Posted to dev@jackrabbit.apache.org by Greg Kick <gk...@kickstyle.net> on 2006/04/15 05:41:50 UTC

residual definition xml implementation

hello,

i have a quick question about node type definitions.  jackrabbit  
consistently (NodeTypeWriter, custom_nodetypes.xml, etc.) uses  
name="*" for residual definitions in the xml format.  however, p. 144  
of the jsr170 spec says that * is not a valid name and the attribute  
should be dropped for residual definitions.  so the question is  
whether it was used on purpose in jackrabbit or if this is a  
mistake.  i haven't looked at the code much, but it seems like it  
would actually be harder to implement with the * than without...   
anyway, a little insight would be appreciated because googling the  
archives gave me nothing.

thanks,

greg kick
http://kickstyle.net/




Re: residual definition xml implementation

Posted by Greg Kick <gk...@kickstyle.net>.
hi,

On Apr 15, 2006, at 3:05 PM, Jukka Zitting wrote:

> Hi,
>
> On 4/15/06, Greg Kick <gk...@kickstyle.net> wrote:
>> well, i'm glad that jackrabbit behaves correctly internally.  the
>> reason i brought this up originally was because NodeTypeWriter
>> creates documents with the *.  i had wanted to create an xml schema
>> to check that a node definition was valid (i needed xml, not cnd).
>> so, i figured that i would do it by using xslt to transform the
>> definitions outputted by NodeTypeWriter into a schema, but since *
>> isn't a valid NCName it would fail. so the question is, if the *
>> isn't used internally, why is it reintroduced in the xml output?
>
> The reasons I outlined before apply to the node type xml format as
> well: 1) the JCR spec uses "*" as the name of residual definitions,
> and 2) the internal ItemDef.ANY_NAME matches "*" in string format.
>

I think that that is actually the crux right there.  When you say  
that the JCR spec uses "*" as the name, that's true, but only if it's  
the string representation in cases like ItemDefinition.getName().   
This seems to me to say: since residual definitions have no name  
property (of type NAME), return "*".  And this has obvious advantages  
over returning null or something.  So with that, we're on the same  
page.  And further, I hadn't realized at first glance that  
jackrabbit's QName wasn't the standard java implementation, so that  
allows for some latitude even though  
QName.checkFormat(ItemDef.ANY_NAME.getLocalName()) still fails.

I just think that there needs to be a clearer differentiation between  
when the name is a NAME (from the spec) and when it is the String (in  
java) representation of that NAME.  I've spent so long debating this  
mostly because it seems that hacks like the QName misnomer and your  
xsl catch will continue to propagate throughout code for both  
jackrabbit and client apps if it isn't consistent.  Even right now,  
the name shows up in the NodeTypeWriter output as a  *, but not in  
the xml document or system views.  And we've identified that behavior  
throughout this discussion, but does it make sense that the  
marshaling mechanisms should give different results???  or results of  
different types (xs:string vs. xs:QName)???

<rant>We're now trying to juggle two different representations of a  
single concept as a java String, java QName, xml string and xml QName  
in an implementation that doesn't really make a clear distinction  
when it is going to be using which!</rant>

> Can you use something like the following in your XSLT to cover case:
>
>     <xsl:template match="propertyDefinition[@name='*']">
>         <!-- handle residual definition -->
>     </xsl:template>
>     <xsl:template match="propertyDefinition">
>         <!-- handle named definition -->
>     </xsl:template>
>
> + the same for childNodeDefinition.

And that's why your stylesheet isn't an ideal solution.  Where I  
would have been able to transform the definition into a schema that  
would check that the name property is an xs:QName (after being  
escaped per 6.4.3), this defers handling that property until it is  
already part of a specific propertyDefinition or childNodeDefinition  
element.  (instead of validating a general nt:NodeType, i'm  
validating each specific type)  However, since the document view  
gives a more type-safe, intuitive version, I think that I'll opt to  
use that instead.  Actually, as an aside, why is it a separate  
representation anyway?

>
>> can this be considered a bug or just a design choice i don't agree
>> with? :-)  i'm even willing to do the work to remove it and submit a
>> diff, but i don't want to go through all of the effort if there is a
>> reason it is being used here...
>
> The main reason for keeping the current internal implementation (using
> ItemDef.ANY_NAME) over an alternative is that the current design is
> proven to work and there is no compelling enough reason to change it.
>
> As for the node type XML format, I wouldn't change that even if the
> alternative design (not having a name attribute in residual definition
> elements) was considered better or more correct. The current format
> works and changing it would break backwards compatibility of the
> JackrabbitNodeTypeManager.registerNodeTypes methods.

As a final thought, I come into this as someone who hasn't coded any  
portion of the project, but has read the spec many times.  So, while  
I have a pretty solid understanding, I lack the pragmatic  
perspective.  But, I really would hate to see it all get more  
confusing as the project progresses just because opting against some  
clarification now was protecting the backwards compatibility of a  
project just out of the incubator.

Anyway, I rest my case and thanks for the discussion regardless.

>
> BR,
>
> Jukka Zitting
>
> --
> Yukatan - http://yukatan.fi/ - info@yukatan.fi
> Software craftsmanship, JCR consulting, and Java development

greg kick
http://kickstyle.net/



Re: residual definition xml implementation

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/15/06, Greg Kick <gk...@kickstyle.net> wrote:
> well, i'm glad that jackrabbit behaves correctly internally.  the
> reason i brought this up originally was because NodeTypeWriter
> creates documents with the *.  i had wanted to create an xml schema
> to check that a node definition was valid (i needed xml, not cnd).
> so, i figured that i would do it by using xslt to transform the
> definitions outputted by NodeTypeWriter into a schema, but since *
> isn't a valid NCName it would fail. so the question is, if the *
> isn't used internally, why is it reintroduced in the xml output?

The reasons I outlined before apply to the node type xml format as
well: 1) the JCR spec uses "*" as the name of residual definitions,
and 2) the internal ItemDef.ANY_NAME matches "*" in string format.

Can you use something like the following in your XSLT to cover this case:

    <xsl:template match="propertyDefinition[@name='*']">
        <!-- handle residual definition -->
    </xsl:template>
    <xsl:template match="propertyDefinition">
        <!-- handle named definition -->
    </xsl:template>

+ the same for childNodeDefinition.
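
As a rough way to try the two templates above outside a larger stylesheet, here is a minimal Java sketch using the JDK's built-in javax.xml.transform API. The class name, the wrapper element nodeType in the sample input, and the text emitted by each template are assumptions made for illustration only; they are not taken from Jackrabbit.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ResidualTemplateDemo {
    // Stylesheet with the two match patterns from the message above.
    // The template bodies are placeholders that only emit marker text.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match=\"propertyDefinition[@name='*']\">residual;</xsl:template>"
      + "<xsl:template match='propertyDefinition'>"
      + "named:<xsl:value-of select='@name'/>;"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML string and returns the text output. */
    public static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: one residual and one named property definition.
        String xml = "<nodeType>"
                   + "<propertyDefinition name='*'/>"
                   + "<propertyDefinition name='title'/>"
                   + "</nodeType>";
        System.out.println(transform(xml));
    }
}
```

The pattern with the predicate has a higher default priority than the bare element name, so the residual template is chosen for name="*" without an explicit priority attribute.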

> can this be considered a bug or just a design choice i don't agree
> with? :-)  i'm even willing to do the work to remove it and submit a
> diff, but i don't want to go through all of the effort if there is a
> reason it is being used here...

The main reason for keeping the current internal implementation (using
ItemDef.ANY_NAME) over an alternative is that the current design is
proven to work and there is no compelling enough reason to change it.

As for the node type XML format, I wouldn't change that even if the
alternative design (not having a name attribute in residual definition
elements) was considered better or more correct. The current format
works and changing it would break backwards compatibility of the
JackrabbitNodeTypeManager.registerNodeTypes methods.

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - info@yukatan.fi
Software craftsmanship, JCR consulting, and Java development

Re: residual definition xml implementation

Posted by Greg Kick <gk...@kickstyle.net>.
well, i'm glad that jackrabbit behaves correctly internally.  the  
reason i brought this up originally was because NodeTypeWriter  
creates documents with the *.  i had wanted to create an xml schema  
to check that a node definition was valid (i needed xml, not cnd).   
so, i figured that i would do it by using xslt to transform the  
definitions outputted by NodeTypeWriter into a schema, but since *  
isn't a valid NCName it would fail. so the question is, if the *  
isn't used internally, why is it reintroduced in the xml output?

can this be considered a bug or just a design choice i don't agree  
with? :-)  i'm even willing to do the work to remove it and submit a  
diff, but i don't want to go through all of the effort if there is a  
reason it is being used here...

greg kick
http://kickstyle.net/

On Apr 15, 2006, at 2:06 AM, Jukka Zitting wrote:

> Hi,
>
> On 4/15/06, Greg Kick <gk...@kickstyle.net> wrote:
>> your response actually outlines the reason i brought it up.  although
>> the spec uses the * notation in its definitions, it pretty clearly
>> states:
>
> You are right. However, the "*" in the item definitions is rather
> treated as a special marker than a real name. There are predicates
> like def.getName().equals(ItemDef.ANY_NAME) and the more "correct"
> def.definesResidual() in Jackrabbit sources that explicitly decide
> whether a definition is residual or not, and if it is, then the name
> is not used for anything else.
>
>> further, it seems that the only reason that new QName("", "*")
>> doesn't fail with an exception is that "The local part is not
>> validated as a NCName as specified in Namespaces in XML" as stated in
>> the javadoc for QName.
>
> I kind of agree with you here; having an invalid QName instance feels
> a bit troublesome even if it is just a marker constant. Enforcing the
> use of ItemDef.definesResidual() throughout Jackrabbit sources would
> allow us to hide the QName.ANY_NAME constant and allow us to use some
> other way to identify residual definitions (a null name for example).
>
>> specifically, the fact that many node type definitions aren't valid
>> under nt:nodeType is worrisome.
>
> I just doublechecked that Jackrabbit is conformant in that there is no
> jcr:name property in residual item definitions under
> /jcr:system/jcr:nodeTypes. You can find definesResidual() calls in
> VirtualNodeTypeStateProvider guarding whether the jcr:name properties
> are exposed in item definition nodes. So this is already being taken
> care of.
>
> BR,
>
> Jukka Zitting
>
> --
> Yukatan - http://yukatan.fi/ - info@yukatan.fi
> Software craftsmanship, JCR consulting, and Java development


Re: residual definition xml implementation

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/15/06, Greg Kick <gk...@kickstyle.net> wrote:
> your response actually outlines the reason i brought it up.  although
> the spec uses the * notation in its definitions, it pretty clearly
> states:

You are right. However, the "*" in the item definitions is treated
as a special marker rather than a real name. There are predicates
like def.getName().equals(ItemDef.ANY_NAME) and the more "correct"
def.definesResidual() in Jackrabbit sources that explicitly decide
whether a definition is residual or not, and if it is, then the name
is not used for anything else.

> further, it seems that the only reason that new QName("", "*")
> doesn't fail with an exception is that "The local part is not
> validated as a NCName as specified in Namespaces in XML" as stated in
> the javadoc for QName.

I kind of agree with you here; having an invalid QName instance feels
a bit troublesome even if it is just a marker constant. Enforcing the
use of ItemDef.definesResidual() throughout Jackrabbit sources would
allow us to hide the QName.ANY_NAME constant and allow us to use some
other way to identify residual definitions (a null name for example).

> specifically, the fact that many node type definitions aren't valid
> under nt:nodeType is worrisome.

I just double-checked that Jackrabbit is conformant in that there is no
jcr:name property in residual item definitions under
/jcr:system/jcr:nodeTypes. You can find definesResidual() calls in
VirtualNodeTypeStateProvider guarding whether the jcr:name properties
are exposed in item definition nodes. So this is already being taken
care of.
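
To make the marker idea concrete, here is a small self-contained Java sketch. It is not the Jackrabbit code: ItemDefSketch and exposedJcrName() are hypothetical names, and javax.xml.namespace.QName stands in for Jackrabbit's own QName class. It only shows the pattern of keeping the residual check in one predicate and using it to decide whether a jcr:name value is exposed at all.

```java
import javax.xml.namespace.QName;

public class ItemDefSketch {
    /** Marker for residual definitions, modelled on the constant described above. */
    public static final QName ANY_NAME = new QName("", "*");

    private final QName name;

    public ItemDefSketch(QName name) {
        this.name = name;
    }

    public QName getName() {
        return name;
    }

    /** The single place that decides whether this definition is residual. */
    public boolean definesResidual() {
        return ANY_NAME.equals(name);
    }

    /**
     * Value to expose as jcr:name in content, or null when the definition
     * is residual and therefore has no jcr:name property.
     */
    public String exposedJcrName() {
        return definesResidual() ? null : name.getLocalPart();
    }

    public static void main(String[] args) {
        ItemDefSketch residual = new ItemDefSketch(ANY_NAME);
        ItemDefSketch named = new ItemDefSketch(new QName("", "title"));
        System.out.println(residual.definesResidual() + " " + residual.exposedJcrName());
        System.out.println(named.definesResidual() + " " + named.exposedJcrName());
    }
}
```

If every caller goes through definesResidual(), the marker constant could later be replaced by some other representation without touching the callers.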

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - info@yukatan.fi
Software craftsmanship, JCR consulting, and Java development

Re: residual definition xml implementation

Posted by Greg Kick <gk...@kickstyle.net>.
your response actually outlines the reason i brought it up.  although  
the spec uses the * notation in its definitions, it pretty clearly  
states:

"...to indicate that a property or child node definition is residual,  
the value returned by ItemDefinition.getName() is “*”. However, “*”  
is not a valid value for the property jcr:name in a
nt:propertyDefinition or nt:childNodeDefinition node (because  
jcr:name it is a NAME property, not a STRING).  As a result, an in- 
content definition of a residual item will simply not have a jcr:name  
property."

so, i would refute your #1 by saying that although the JCR spec is  
littered with *s for readability, it quite specifically doesn't imply  
the use of "*" as a name because it simply isn't of type name.  in  
fact, the ebnf starting on page 67 states that a onecharsimplename,  
the only applicable non-terminal for a one character name, is "(* Any  
Unicode character except: '.', '/', ':', '[', ']', '*', ''', '"', '|'  
or any whitespace character *)".  so i would say that it is pretty  
obvious that * is not used as a valid value for that attribute.
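
to illustrate, here is a minimal Java sketch of the one-character rule quoted above. the class and method names are made up for this example, and Character.isWhitespace is only an approximation of "any whitespace character"; it also looks at a single Java char, so it ignores characters outside the basic multilingual plane.

```java
import java.util.Set;

public class OneCharSimpleName {
    // Characters excluded by the production quoted above.
    private static final Set<Character> EXCLUDED =
            Set.of('.', '/', ':', '[', ']', '*', '\'', '"', '|');

    /** Returns true if s is exactly one character that the quoted rule allows. */
    public static boolean isValid(String s) {
        if (s == null || s.length() != 1) {
            return false;
        }
        char c = s.charAt(0);
        // Approximation: treat Java's notion of whitespace as the spec's.
        return !EXCLUDED.contains(c) && !Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        System.out.println("a -> " + isValid("a"));
        System.out.println("* -> " + isValid("*"));
    }
}
```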

further, it seems that the only reason that new QName("", "*")  
doesn't fail with an exception is that "The local part is not  
validated as a NCName as specified in Namespaces in XML" as stated in  
the javadoc for QName.  If it were, it would have to be either a  
"Letter" as defined by http://www.w3.org/TR/REC-xml/ or an '_'.   
again, it doesn't qualify.
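
a short Java sketch of that point, using the standard javax.xml.namespace.QName (which may not be the QName class jackrabbit itself uses). startsLikeNCName is a made-up helper, and Character.isLetter is only a rough stand-in for the "Letter" production in the XML recommendation.

```java
import javax.xml.namespace.QName;

public class StarLocalName {
    /** Rough approximation of the NCName first-character rule: a letter or '_'. */
    public static boolean startsLikeNCName(String localPart) {
        if (localPart == null || localPart.isEmpty()) {
            return false;
        }
        char first = localPart.charAt(0);
        return Character.isLetter(first) || first == '_';
    }

    public static void main(String[] args) {
        // The constructor does not validate the local part, so this does not throw.
        QName any = new QName("", "*");
        System.out.println("local part: " + any.getLocalPart());
        System.out.println("starts like an NCName: " + startsLikeNCName(any.getLocalPart()));
    }
}
```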

so i guess that my point would be that i agree that neither is  
clearer, but i definitely question the validity.  specifically, the  
fact that many node type definitions aren't valid under nt:nodeType  
is worrisome.

now, of course this isn't detrimental, but if future revisions opt to  
start type-checking in appropriate places, it could be a mess.  but  
if there is a better reason for the * than that it simply happened to  
be implemented that way, i could certainly reconsider my position.

thanks for hearing me out,

greg kick
http://kickstyle.net/



On Apr 15, 2006, at 12:38 AM, Jukka Zitting wrote:

> Hi,
>
> On 4/15/06, Greg Kick <gk...@kickstyle.net> wrote:
>> i have a quick question about node type definitions.  jackrabbit
>> consistently (NodeTypeWriter, custom_nodetypes.xml, etc.) uses
>> name="*" for residual definitions in the xml format.  however, p. 144
>> of the jsr170 spec says that * is not a valid name and the attribute
>> should be dropped for residual definitions.  so the question is
>> whether it was used on purpose in jackrabbit or if this is a
>> mistake.  i haven't looked at the code much, but it seems like it
>> would actually be harder to implement with the * than without...
>> anyway, a little insight would be appreciated because googleing the
>> archives gave me nothing.
>
> It is arguable whether name="*" or no name attribute is clearer, but
> both are valid solutions as there is no fear of collision between "*"
> and any valid property name. I suppose the main reasons for using "*"
> is that 1) the JCR specification uses "*" as the "name" of residual
> definitions, and that 2) the Jackrabbit internals use the
> ItemDef.ANY_NAME constant (defined as: new QName("", "*")) to identify
> residual definitions.
>
> BR,
>
> Jukka Zitting
>
> --
> Yukatan - http://yukatan.fi/ - info@yukatan.fi
> Software craftsmanship, JCR consulting, and Java development


Re: residual definition xml implementation

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/15/06, Greg Kick <gk...@kickstyle.net> wrote:
> i have a quick question about node type definitions.  jackrabbit
> consistently (NodeTypeWriter, custom_nodetypes.xml, etc.) uses
> name="*" for residual definitions in the xml format.  however, p. 144
> of the jsr170 spec says that * is not a valid name and the attribute
> should be dropped for residual definitions.  so the question is
> whether it was used on purpose in jackrabbit or if this is a
> mistake.  i haven't looked at the code much, but it seems like it
> would actually be harder to implement with the * than without...
> anyway, a little insight would be appreciated because googleing the
> archives gave me nothing.

It is arguable whether name="*" or no name attribute is clearer, but
both are valid solutions as there is no fear of collision between "*"
and any valid property name. I suppose the main reasons for using "*"
are that 1) the JCR specification uses "*" as the "name" of residual
definitions, and that 2) the Jackrabbit internals use the
ItemDef.ANY_NAME constant (defined as: new QName("", "*")) to identify
residual definitions.

BR,

Jukka Zitting

--
Yukatan - http://yukatan.fi/ - info@yukatan.fi
Software craftsmanship, JCR consulting, and Java development