Posted to derby-dev@db.apache.org by Daniel John Debrunner <dj...@apache.org> on 2006/08/29 05:33:28 UTC

XML type description - possible changes?

The XML type has this description in the reference manual:

http://db.apache.org/derby/docs/dev/ref/rrefsqljtypexml.html

"An XML column is used to store Unicode character-based data, such as
large documents in any character set, that conform to the SQL/XML
definition of a well-formed XML(DOCUMENT(ANY)) value.

An XML value can be up to 2,147,483,647 characters long"

That seems wrong to me. The phrase "store Unicode character-based"
describes an implementation detail, and implementation details are not
important here. Also, the use of "large" is not relevant; what about
small XML documents?

Looking at the other type definitions it seems something like this would
match, though it seems unwieldy.

"XML provides for storage of Extensible Markup Language (XML) documents,
that conform to the SQL/XML definition of a well-formed
XML(DOCUMENT(ANY)) value."

Also, I'm not sure about the character limit on the XML value. Is that
the limit on the input string for the document, or on its final
storage? I'm not sure that describing the final stored limit in terms
of characters makes sense here.

Dan.

Re: XML type description - possible changes?

Posted by "Jean T. Anderson" <jt...@bristowhill.com>.

This sounds like a larger style issue that should be tackled post 10.2.
In the meantime, let's be consistent with current usage.

 -jean


Laura Stewart wrote:
> On 8/29/06, Jean T. Anderson <jt...@bristowhill.com> wrote:
> 
>> Laura Stewart wrote:
>> ...
>> > FYI - I would prefer to keep the phrase "data type" in the title and
>> > first paragraph of this file as a qualifier.  In the near future (post
>> > 10.2) I intend to update all of the other files that describe data
>> > types to add this qualifier.  There are many files in Derby that don't
>> > have qualifiers in the titles and it makes locating the correct info
>> > difficult for Derby users.
>>
>> I don't agree with this change. I wouldn't like to see everything
>> expanded -- "BLOB data type", "Date data type", etc.
>>
>> The XML data type appears in the "Data Types" chapter -- I haven't heard
>> any users complain that this isn't clear. But I may have missed some
>> posts. Could you include a link to the posts that showed the difficulty?
>>
>> thanks,
>>
>>  -jean
> 
> 
> One of the benefits of using DITA is that the documentation is
> "componentized" into "topics".
> There are several benefits to this:
> -- The information can be combined with information from other products.
> -- The information can be viewed in alternative ways.
> 
> COMBINED INFO
> When other products are combined with Derby, the information for
> those products can be combined with the information for Derby. It is
> important that the titles of the topics state clearly what the topic
> is about.  When users look at a Table of Contents or Navigation Tree
> to find information, the qualifiers help users find the information
> that they want more quickly.  Adding the qualifiers also helps people
> for whom English is not their native language. In this case, we are
> describing different data types and that qualifier should be applied
> to the title.  What if there is a topic entitled XML which discusses
> the data type, another topic entitled XML which describes an
> overview of the Extensible Markup Language, and a third topic entitled
> XML which describes what "well-formed" XML documents are?  In this
> example, having 3 topics with the same title is confusing. And while
> the topics appear in different sections in a book (which might help
> distinguish the topics), that implied qualifier doesn't help when the
> information is viewed in a different way (see below).
> 
> METHODS OF VIEWING INFO
> Another advantage of DITA is the ability to view the documentation in
> different ways.  It can be viewed in the traditional PDF/Book format
> or it can be viewed in non-traditional ways, such as in categories of
> information or by subject matter (such as all troubleshooting info).
> Viewing information in categories is often the way information is
> organized in Information Centers. In Info Centers, the organization of
> the information is often not in the sequential book format.  In these
> situations, the implied qualifier of the chapter or section title is
> not always present for a topic.
> 
> Bottom line: topics need to be able to stand alone. Their titles
> should be clear and concise and they should have links to the
> appropriate information.
> 
> So I don't have any links to specific Derby posts about this. I am
> just explaining what the trend is in technical writing and
> information.
>

Re: XML type description - possible changes?

Posted by Laura Stewart <sc...@gmail.com>.

On 8/29/06, Jean T. Anderson <jt...@bristowhill.com> wrote:
> Laura Stewart wrote:
> ...
> > FYI - I would prefer to keep the phrase "data type" in the title and
> > first paragraph of this file as a qualifier.  In the near future (post
> > 10.2) I intend to update all of the other files that describe data
> > types to add this qualifier.  There are many files in Derby that don't
> > have qualifiers in the titles and it makes locating the correct info
> > difficult for Derby users.
>
> I don't agree with this change. I wouldn't like to see everything
> expanded -- "BLOB data type", "Date data type", etc.
>
> The XML data type appears in the "Data Types" chapter -- I haven't heard
> any users complain that this isn't clear. But I may have missed some
> posts. Could you include a link to the posts that showed the difficulty?
>
> thanks,
>
>  -jean

One of the benefits of using DITA is that the documentation is
"componentized" into "topics".
There are several benefits to this:
-- The information can be combined with information from other products.
-- The information can be viewed in alternative ways.

COMBINED INFO
When other products are combined with Derby, the information for
those products can be combined with the information for Derby. It is
important that the titles of the topics state clearly what the topic
is about.  When users look at a Table of Contents or Navigation Tree
to find information, the qualifiers help users find the information
that they want more quickly.  Adding the qualifiers also helps people
for whom English is not their native language. In this case, we are
describing different data types and that qualifier should be applied
to the title.  What if there is a topic entitled XML which discusses
the data type, another topic entitled XML which describes an
overview of the Extensible Markup Language, and a third topic entitled
XML which describes what "well-formed" XML documents are?  In this
example, having 3 topics with the same title is confusing. And while
the topics appear in different sections in a book (which might help
distinguish the topics), that implied qualifier doesn't help when the
information is viewed in a different way (see below).

METHODS OF VIEWING INFO
Another advantage of DITA is the ability to view the documentation in
different ways.  It can be viewed in the traditional PDF/Book format
or it can be viewed in non-traditional ways, such as in categories of
information or by subject matter (such as all troubleshooting info).
Viewing information in categories is often the way information is
organized in Information Centers. In Info Centers, the organization of
the information is often not in the sequential book format.  In these
situations, the implied qualifier of the chapter or section title is
not always present for a topic.

Bottom line: topics need to be able to stand alone. Their titles
should be clear and concise and they should have links to the
appropriate information.

So I don't have any links to specific Derby posts about this. I am
just explaining what the trend is in technical writing and
information.

-- 
Laura Stewart

Re: XML type description - possible changes?

Posted by "Jean T. Anderson" <jt...@bristowhill.com>.

Laura Stewart wrote:
...
> FYI - I would prefer to keep the phrase "data type" in the title and
> first paragraph of this file as a qualifier.  In the near future (post
> 10.2) I intend to update all of the other files that describe data
> types to add this qualifier.  There are many files in Derby that don't
> have qualifiers in the titles and it makes locating the correct info
> difficult for Derby users.

I don't agree with this change. I wouldn't like to see everything
expanded -- "BLOB data type", "Date data type", etc.

The XML data type appears in the "Data Types" chapter -- I haven't heard
any users complain that this isn't clear. But I may have missed some
posts. Could you include a link to the posts that showed the difficulty?

thanks,

 -jean

Re: XML type description - possible changes?

Posted by Laura Stewart <sc...@gmail.com>.

On 8/29/06, Daniel John Debrunner <dj...@apache.org> wrote:
> Army wrote:
> > Daniel John Debrunner wrote:
> >
> > <snip old XML type text>
> >
> >>
> >> That seems wrong to me. The phrase "store Unicode character-based"
> >> describes an implementation detail, and implementation details are
> >> not important here. Also, the use of "large" is not relevant; what
> >> about small XML documents?
> >
> >
> > I have to admit, I'm guilty of copy-paste here.  The current
> > documentation for the CLOB type has the following sentence, which is the
> > root of the questions raised in this email:
> >
> > <begin quote>
> >
> > A CLOB (character large object) value can be up to 2,147,483,647
> > characters long. A CLOB is used to store unicode character-based data,
> > such as large documents in any character set.
> >
> > <end quote>
> >
> > I copied that as my starting point and failed to clean it up...sorry.
> >
> >> Looking at the other type definitions it seems something like this would
> >> match, though it seems unwieldy.
> >>
> >> "XML provides for storage of Extensible Markup Language (XML) documents,
> >> that conform to the SQL/XML definition of a well-formed
> >> XML(DOCUMENT(ANY)) value."
> >
> >
> > I agree, I think this is a better wording.  My one reservation is that
> > the XML type can also be used transiently for XML values that are not
> > well-formed documents.  In particular, the XMLQUERY operator returns a
> > value of type XML that is not guaranteed to be XML(DOCUMENT(ANY)).
> >
> > So while I agree with the new text proposal, I think it'd be good (or at
> > least, more accurate) to mention that XML is not restricted to "storage
> > of ... well-formed XML(DOCUMENT(ANY)) values"; it also provides for
> > transient use of XML(SEQUENCE) values, which may or may not be
> > well-formed XML(DOCUMENT(ANY)) values.
>
> I agree with your reservation. It can also be applied to most, if not
> all, of the data type descriptions. Many talk about "providing storage"
> but all of the types can also be used for transient values.
>
> Dan.
>
>
>

Another way to word the description is:

The XML data type is used to store an internal representation of
Extensible Markup Language (XML) documents. The documents in an XML
column must conform to the SQL/XML definition of well-formed XML
documents.

FYI - I would prefer to keep the phrase "data type" in the title and
first paragraph of this file as a qualifier.  In the near future (post
10.2) I intend to update all of the other files that describe data
types to add this qualifier.  There are many files in Derby that don't
have qualifiers in the titles and it makes locating the correct info
difficult for Derby users.

-- 
Laura Stewart

Re: XML type description - possible changes?

Posted by Daniel John Debrunner <dj...@apache.org>.

Army wrote:
> Daniel John Debrunner wrote:
> 
> <snip old XML type text>
> 
>>
>> That seems wrong to me. The phrase "store Unicode character-based"
>> describes an implementation detail, and implementation details are
>> not important here. Also, the use of "large" is not relevant; what
>> about small XML documents?
> 
> 
> I have to admit, I'm guilty of copy-paste here.  The current
> documentation for the CLOB type has the following sentence, which is the
> root of the questions raised in this email:
> 
> <begin quote>
> 
> A CLOB (character large object) value can be up to 2,147,483,647
> characters long. A CLOB is used to store unicode character-based data,
> such as large documents in any character set.
> 
> <end quote>
> 
> I copied that as my starting point and failed to clean it up...sorry.
> 
>> Looking at the other type definitions it seems something like this would
>> match, though it seems unwieldy.
>>
>> "XML provides for storage of Extensible Markup Language (XML) documents,
>> that conform to the SQL/XML definition of a well-formed
>> XML(DOCUMENT(ANY)) value."
> 
> 
> I agree, I think this is a better wording.  My one reservation is that
> the XML type can also be used transiently for XML values that are not
> well-formed documents.  In particular, the XMLQUERY operator returns a
> value of type XML that is not guaranteed to be XML(DOCUMENT(ANY)).
> 
> So while I agree with the new text proposal, I think it'd be good (or at
> least, more accurate) to mention that XML is not restricted to "storage
> of ... well-formed XML(DOCUMENT(ANY)) values"; it also provides for
> transient use of XML(SEQUENCE) values, which may or may not be
> well-formed XML(DOCUMENT(ANY)) values.

I agree with your reservation. It can also be applied to most, if not
all, of the data type descriptions. Many talk about "providing storage"
but all of the types can also be used for transient values.

Dan.

Re: XML type description - possible changes?

Posted by Army <qo...@gmail.com>.

Daniel John Debrunner wrote:

<snip old XML type text>
> 
> That seems wrong to me. The phrase "store Unicode character-based"
> describes an implementation detail, and implementation details are not
> important here. Also, the use of "large" is not relevant; what about
> small XML documents?

I have to admit, I'm guilty of copy-paste here.  The current documentation for 
the CLOB type has the following sentence, which is the root of the questions 
raised in this email:

<begin quote>

A CLOB (character large object) value can be up to 2,147,483,647 characters 
long. A CLOB is used to store unicode character-based data, such as large 
documents in any character set.

<end quote>

I copied that as my starting point and failed to clean it up...sorry.

> Looking at the other type definitions it seems something like this would
> match, though it seems unwieldy.
> 
> "XML provides for storage of Extensible Markup Language (XML) documents,
> that conform to the SQL/XML definition of a well-formed
> XML(DOCUMENT(ANY)) value."

I agree, I think this is a better wording.  My one reservation is that the XML 
type can also be used transiently for XML values that are not well-formed 
documents.  In particular, the XMLQUERY operator returns a value of type XML 
that is not guaranteed to be XML(DOCUMENT(ANY)).

So while I agree with the new text proposal, I think it'd be good (or at least, 
more accurate) to mention that XML is not restricted to "storage of ... 
well-formed XML(DOCUMENT(ANY)) values"; it also provides for transient use of 
XML(SEQUENCE) values, which may or may not be well-formed XML(DOCUMENT(ANY)) values.
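
As a rough illustration of the distinction (not Derby's own check, which is 
not shown in this thread), the sketch below uses the JDK's standard 
javax.xml.parsers API to test whether a string parses as a single well-formed 
document.  The class and method names (WellFormedCheck, isWellFormedDocument) 
are made up for the example.  A sequence of two top-level elements, which is 
the kind of value a query can plausibly return, fails the check even though 
each item on its own is fine.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; this is not part of Derby.
public class WellFormedCheck {

    /** Returns true if the text parses as one well-formed XML document. */
    public static boolean isWellFormedDocument(String text) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal (well-formedness) errors.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input: not a well-formed document.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for an in-memory string with default settings.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A single root element: a well-formed document.
        System.out.println(isWellFormedDocument("<doc><item>1</item></doc>"));
        // Two top-level elements: a sequence, not a document.
        System.out.println(isWellFormedDocument("<item>1</item><item>2</item>"));
    }
}
```

Under these assumptions, the first call reports true and the second reports 
false, which is the gap between "well-formed document" and "sequence" that the 
description would need to acknowledge.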

> Also, I'm not sure about the character limit on the XML value. Is that
> the limit on the input string for the document, or on its final
> storage?

My impression is that this is the limit on the final storage--and that limit is, 
so far as I know, a Derby limitation, not a SQL/XML one.  At least, that's what 
I assumed based on the fact that CLOBs in Derby have the same hard limit.  I 
admit I could be wrong, here...please correct me if needed.
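
One small observation that is at least consistent with this being an 
implementation limit: the documented figure is exactly 2^31 - 1, the largest 
value of a Java int.  The sketch below only checks that arithmetic; why Derby 
uses that bound is not established in this thread, and the class name 
(XmlLimitCheck) is made up for the example.

```java
// Hypothetical helper for illustration only; this is not part of Derby.
public class XmlLimitCheck {

    /** The limit as written in the reference manual text quoted above. */
    public static final long DOCUMENTED_LIMIT = 2_147_483_647L;

    /** True if the documented limit equals the largest Java int (2^31 - 1). */
    public static boolean matchesIntMax() {
        return DOCUMENTED_LIMIT == Integer.MAX_VALUE
            && DOCUMENTED_LIMIT == (1L << 31) - 1;
    }

    public static void main(String[] args) {
        System.out.println(matchesIntMax());
    }
}
```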

> I'm not sure that describing the final stored limit in terms of
> characters makes sense here.

Agreed.  Especially since it's possible that the way in which Derby stores XML 
could change in the future for better performance and/or extended functionality 
(XML indexing comes to mind).  So no, we probably should not be describing the 
stored limit in terms of characters.  I think that line could be removed entirely.

Should I add this info to the Reference Guide documentation wiki?

http://wiki.apache.org/db-derby/ReferenceManualTenTwo#sqlreference

Or would you like to? :)

Thanks for taking the time to buddy test the XML doc and features; I appreciate 
the feedback...

Army