Posted to derby-dev@db.apache.org by Mike Matrigali <mi...@sbcglobal.net> on 2007/04/03 08:22:56 UTC

some comments on collation wiki page

I have included some comments based on the following wiki page:

 > 6)Store needs a way to determine the collation type for a given DVD.
 > This collation type will then be saved in the column metadata. Provide
 > the api on DVD to return the correct collation type.
     Just an addition here.  The way I expect this info to get down to store
     is that the template passed in when creating a conglomerate will have
     DVD's that have the correct collation id associated with them.
     This should be true for both btree and heap conglomerate creates, you
     should update iapi interface doc, but I don't think the actual args
     to the interface change.

     I assume the api is something like int getCollateId() on a DVD.

 >
 > 7)BasicDatabase needs to set the Locale value in the DVF after DVF
 > has booted. Probably with an api like DVF.setLocale(Locale). DVF will
 > use this Locale object to construct the correct Collator in case user
 > has requested territory based collation through jdbc url at database
 > create time.
 >
Need to somehow test a store btree recovery to make sure this works in the
redo recovery case.

 > Store changes
 >
 > 1)Store column level metadata for collate in Store. Store keeps a
 > version number that describes the structure of column level metadata.
 > For existing pre-10.3 databases which get upgraded to 10.3 and for new
 > 10.3 databases with default collation(UCS_BASIC), the structure of
 > column level metadata will remain same as 10.2 structure of column level
 > metadata, ie they will not include collate information in their store
 > metadata. A new version would be used in Store for structure of column
 > level metadata if the newly created 10.3 database has asked for
 > territory based collation. In other words, information about collate
 > will be kept in Store column level metadata only if we are working with
 > a 10.3 newly created database with territory based collation. This
 > approach will make sure that we do not have to do an on-disk store
 > metadata upgrade when upgrading a pre-10.3 database to 10.3 version.
 >
This is not exactly what I expected.  I think there should be a single
10.3 metadata format that includes collation information (whether it is
default or not).  The actual format of the data can optimize as
appropriate for non-default values if we think that is worth it.  I
think the right thing to do is to manage only two versions of store
metadata, 10.3 and pre-10.3, and not base the version on whether there
is a non-default collate or not.  The actual written format of the 10.3
metadata may optimize the case where there is no non-default collate,
but it is still a 10.3 version.

So the world breaks down into:
pre-10.3 db's, which includes dbs that are running against 10.3 but only
     soft upgrade.
     o These db's never get the 10.3 metadata format

hard upgrade 10.3 db's
     o Code will read both pre and post 10.3 metadata.  All new
       conglomerates always write the 10.3 version metadata.  The code
       has to read both old and new formats.

new 10.3 db's
     o code will only write 10.3 metadata.

So what I was expecting was:

1)Store column level metadata for collate in Store. Store keeps a 
version number that describes the structure of column level metadata. 
For existing pre-10.3 databases which get soft upgraded to 10.3, the 
structure of column level metadata will remain same as 10.2 structure of 
column level metadata, ie they will not include collate information in 
their store metadata.
For any conglomerate created in a 10.3 new database or a 10.3 hard 
upgraded database a new version would be used in Store to include 
information about the collation for each column's metadata stored.
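The versioning scheme described above could be sketched roughly as follows. This is only a toy model, not Derby's on-disk format: the class name, the version tags, the byte layout and the id value 0 for UCS_BASIC are all assumptions made for illustration. It shows the one property that matters here: new-format metadata always carries collation ids, and the reader accepts both formats.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Toy model of versioned column metadata; not Derby's real store format. */
class ColumnMetadataFormat {
    static final int FORMAT_PRE_10_3 = 1;      // hypothetical version tag
    static final int FORMAT_10_3 = 2;          // hypothetical version tag
    static final int COLLATION_UCS_BASIC = 0;  // hypothetical default collation id

    /** Writes version, column count and format ids; 10.3 also writes collation ids. */
    static byte[] write(int version, int[] formatIds, int[] collationIds) {
        int ints = 2 + formatIds.length
                + (version == FORMAT_10_3 ? collationIds.length : 0);
        ByteBuffer buf = ByteBuffer.allocate(ints * Integer.BYTES);
        buf.putInt(version).putInt(formatIds.length);
        for (int id : formatIds) {
            buf.putInt(id);
        }
        if (version == FORMAT_10_3) {
            for (int id : collationIds) {
                buf.putInt(id);
            }
        }
        return buf.array();
    }

    /** Reads either format; columns in the old format default to UCS_BASIC. */
    static int[] readCollationIds(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int version = buf.getInt();
        int columns = buf.getInt();
        for (int i = 0; i < columns; i++) {
            buf.getInt(); // skip the format ids
        }
        int[] collationIds = new int[columns];
        Arrays.fill(collationIds, COLLATION_UCS_BASIC);
        if (version == FORMAT_10_3) {
            for (int i = 0; i < columns; i++) {
                collationIds[i] = buf.getInt();
            }
        }
        return collationIds;
    }
}
```

A soft-upgraded database would only ever call write with FORMAT_PRE_10_3; a new or hard-upgraded 10.3 database would always call it with FORMAT_10_3, even when every column uses the default collation.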



 > 2)Currently, store uses Monitor to create DVD template rows. The
 > logic of creating DVDs using formatids should be factored out from
 > Monitor into DataValueFactory. Talking in terms of code,
 > RowUtil.newClassInfoTemplate should call DVF.classFromIdentifier rather
 > than Monitor.classFromIdentifier.
     I believe there is more than one of these Monitor interfaces that will
     have to be moved into DVF.
 >
 > 3)This item is related to item 2. With Derby 10.3, collation type
 > will be the additional metadata in store for each column. When store
 > will call DVF to create DVD template row, it will pass the formatids and
 > the collation types. DVF will need to be able to associate the correct
 > Collator with the DVD for Char datatypes depending on the collation
 > type. And in order to find the correct Collator, DVF needs to know the
 > locale of the database. This locale information will be set on DVF using
 > a new method on DVF called void setLocale(Locale). This call will be
 > made by BasicDatabase after DVF has finished booting and before store
 > starts booting.
 >
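As a rough illustration of item 3 above, the following sketch shows a factory that is told the database locale once and then hands out a Collator based on a collation type. The class name CollatorSource and the two id constants are hypothetical; only setLocale(Locale) comes from the wiki text, and only java.text.Collator is a real API here.

```java
import java.text.Collator;
import java.util.Locale;

/** Toy stand-in for the value factory; names are hypothetical, not Derby's DVF. */
class CollatorSource {
    static final int COLLATION_UCS_BASIC = 0;        // hypothetical id
    static final int COLLATION_TERRITORY_BASED = 1;  // hypothetical id

    private Locale databaseLocale;

    /** Meant to be called once, after the factory boots and before store boots. */
    void setLocale(Locale locale) {
        this.databaseLocale = locale;
    }

    /** Returns null for UCS_BASIC, where plain String comparison is used instead. */
    Collator collatorFor(int collationType) {
        if (collationType == COLLATION_UCS_BASIC) {
            return null;
        }
        if (databaseLocale == null) {
            throw new IllegalStateException("setLocale has not been called");
        }
        return Collator.getInstance(databaseLocale);
    }
}
```

The ordering of the two calls is the point of the sketch: a request for a territory based Collator before the locale is set fails loudly instead of silently using the JVM default locale.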

I think you should add an item to track the work in the sorter.  Maybe
you are tracking this as part of the aggregate items?  I know there is
some template stuff specific to the sorter.  You should at least make
sure to test both a sort that is in memory and a sort that is too big
to fit into memory that includes special collation stuff.
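One way to picture the sorter test suggested above: sort the same rows once entirely in memory and once as small sorted runs that are merged afterwards, both with the same territory based comparator, and check that the results agree. The sketch below is not Derby's sorter; the class and method names are hypothetical and the "spill" is simulated with in-memory runs.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.PriorityQueue;

/** Toy check that a run-merge sort agrees with an in-memory sort. */
class CollatedSortCheck {
    /** Comparator backed by the JDK Collator for the given locale. */
    static Comparator<String> territoryOrder(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return collator::compare;
    }

    /** Sorts a copy of the rows in a single in-memory pass. */
    static List<String> sortInMemory(List<String> rows, Comparator<String> order) {
        List<String> copy = new ArrayList<>(rows);
        copy.sort(order);
        return copy;
    }

    /** Simulates a spilled sort: sort runs of runSize rows, then k-way merge. */
    static List<String> sortWithRuns(List<String> rows, int runSize,
                                     Comparator<String> order) {
        List<List<String>> runs = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += runSize) {
            int end = Math.min(i + runSize, rows.size());
            runs.add(sortInMemory(rows.subList(i, end), order));
        }
        // Queue entries are {runIndex, offset}, ordered by the row they point at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                (a, b) -> order.compare(runs.get(a[0]).get(a[1]),
                                        runs.get(b[0]).get(b[1])));
        for (int r = 0; r < runs.size(); r++) {
            heads.add(new int[] {r, 0});
        }
        List<String> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            List<String> run = runs.get(head[0]);
            merged.add(run.get(head[1]));
            if (head[1] + 1 < run.size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }
}
```

If the merge step accidentally fell back to code point comparison while the runs were built with the Collator, the two results would differ on mixed-case data, which is the kind of bug such a test is meant to catch.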

Re: some comments on collation wiki page

Posted by Rick Hillegas <Ri...@Sun.COM>.

Army wrote:
> Army wrote:
>>
>> Since a parameter marker does not have a "defined" schema, does
>> "current schema" mean the schema when the statement is prepared, or 
>> the schema when it is executed?
>>
>> For example I can do the following in JDBC:
>>
>> // Default schema ("APP").
>>
>> PreparedStatement ps = conn.prepareStatement(
>>   "select tablename, tabletype from sys.systables where tablename = ?");
>
> <snip>
>
>> If "current schema" means the "schema when the statement is 
>> *prepared*" then both of the above statements would fail (because 
>> there's no CAST on the syscol). That consistency would probably be a 
>> good thing (less confusing for users).
>
> On the other hand, if the statement is changed to:
>
>    // Default schema APP.
>
>    PreparedStatement ps = conn.prepareStatement(
>        "select tablename, tabletype from sys.systables where " +
>        "CAST (tablename as varchar(128)) = ?");
>
> then it might be better to take "current schema" to mean the time of 
> *execution.*  That way the above statement will run correctly
> regardless of the current schema.  If "current schema" was determined 
> at compile time, the above statement would only work if "current 
> schema" was a non-system schema.
>
> Since I think we are going to encourage users to CAST system columns 
> when doing comparisons (at least that's what I gathered from the 
> various discussion threads), maybe it would be better to take the 
> "current schema" for a parameter marker at execution time, after all...?
>
> Army
>
I think that all bind-time decisions need to be resolved the same way. 
Otherwise the behavior of our sql interpreter will be very hard to 
explain. A similar issue comes up with the following statement:

  PreparedStatement ps = conn.prepareStatement( "select * from T" );

T could resolve to a different table depending on what your current 
schema is. What happens if this statement is prepared in one schema, 
then the user changes schema, then the user executes the prepared statement?
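The two possible answers to that question can be modelled with a small toy, shown below. It is not Derby code and does not say which behaviour Derby has; ToySession, ToyStatement and the table descriptions are all made up for illustration. It only contrasts a name resolved once at prepare (bind) time with a name resolved again at each execution.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of name resolution; not how Derby implements it. */
class ToySession {
    private final Map<String, String> tables = new HashMap<>(); // "SCHEMA.NAME" -> description
    private String currentSchema = "APP";

    void addTable(String schema, String name, String description) {
        tables.put(schema + "." + name, description);
    }

    void setSchema(String schema) {
        currentSchema = schema;
    }

    /** Bind-time resolution: the current schema is captured once, at prepare. */
    ToyStatement prepare(String tableName) {
        return new ToyStatement(this, currentSchema + "." + tableName);
    }

    /** Execute-time resolution, shown only for contrast. */
    String executeUnbound(String tableName) {
        return tables.get(currentSchema + "." + tableName);
    }

    /** A statement that remembers the fully qualified name it was bound to. */
    static final class ToyStatement {
        private final ToySession session;
        private final String qualifiedName;

        ToyStatement(ToySession session, String qualifiedName) {
            this.session = session;
            this.qualifiedName = qualifiedName;
        }

        String execute() {
            return session.tables.get(qualifiedName);
        }
    }
}
```

With bind-time resolution, a statement prepared in APP keeps reading APP.T after the session switches schema; whatever rule is chosen for tables would, by the argument above, also be the rule for the collation of a parameter marker.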

Regards,
-Rick

Re: some comments on collation wiki page

Posted by Army <qo...@gmail.com>.

Army wrote:

<snip>

Oops, clicked send too quickly on that last one.  Will try this again once I 
clarify the terminology in my head, then will post.

> Daniel John Debrunner wrote:
> 
>>
>> I'm a little lost by this. What do these two terms mean to you?
>>
>>  "run correctly"
> 
> 
> run without error
> 
>>  "would only work if"
> 
> 
> run without error
> 
> Sorry for the confusion.
> 
> Army
> 
>

Re: some comments on collation wiki page

Posted by Mamta Satoor <ms...@gmail.com>.

No worries, Army. In fact your mail thread brought up the hole in the wiki
page where I was vague about current schema when talking about parameters.

Mamta


On 4/3/07, Army <qo...@gmail.com> wrote:
>
> Daniel John Debrunner wrote:
> >
> > I don't see how when the current schema is defined makes a difference as
> > to if they run without error or not? How does the current schema being a
> > user schema or a system schema make this statement compile to an error?
> >
> >>    // Default schema APP.
> >>
> >>    PreparedStatement ps = conn.prepareStatement(
> >>        "select tablename, tabletype from sys.systables where " +
> >>        "CAST (tablename as varchar(128)) = ?");
> >
>
> Okay, you are of course correct.  I somehow got it in my head that the
> collation
> of the CAST was determined at execution time, but that's not at all true.
> Nothing on the wiki indicates that, it was just a weird conclusion I
> jumped to
> for some reason.
>
> Apologies for the noise.  Mamta, please feel free to disregard my comment
> and go
> ahead with your plans to set JDBC param character set at prepare time
> (which is
> the way that makes sense).
>
> Sorry again,
> Army
>
>

Re: some comments on collation wiki page

Posted by Army <qo...@gmail.com>.

Daniel John Debrunner wrote:
> 
> I don't see how when the current schema is defined makes a difference as 
> to if they run without error or not? How does the current schema being a 
> user schema or a system schema make this statement compile to an error?
> 
>>    // Default schema APP.
>>
>>    PreparedStatement ps = conn.prepareStatement(
>>        "select tablename, tabletype from sys.systables where " +
>>        "CAST (tablename as varchar(128)) = ?"); 
> 

Okay, you are of course correct.  I somehow got it in my head that the collation 
of the CAST was determined at execution time, but that's not at all true. 
Nothing on the wiki indicates that, it was just a weird conclusion I jumped to 
for some reason.

Apologies for the noise.  Mamta, please feel free to disregard my comment and go 
ahead with your plans to set JDBC param character set at prepare time (which is 
the way that makes sense).

Sorry again,
Army

Re: some comments on collation wiki page

Posted by Daniel John Debrunner <dj...@apache.org>.

Army wrote:
> Daniel John Debrunner wrote:
>>
>> I'm a little lost by this. What do these two terms mean to you?
>>
>>  "run correctly"
> 
> run without error
> 
>>  "would only work if"
> 
> run without error
> 
> Sorry for the confusion.

Still confused. :-)

I don't see how when the current schema is defined makes a difference as 
to if they run without error or not? How does the current schema being a 
user schema or a system schema make this statement compile to an error?

>    // Default schema APP.
> 
>    PreparedStatement ps = conn.prepareStatement(
>        "select tablename, tabletype from sys.systables where " +
>        "CAST (tablename as varchar(128)) = ?"); 

Thanks,
Dan.

Re: some comments on collation wiki page

Posted by Army <qo...@gmail.com>.

Daniel John Debrunner wrote:
> 
> I'm a little lost by this. What do these two terms mean to you?
> 
>  "run correctly"

run without error

>  "would only work if"

run without error

Sorry for the confusion.

Army

Re: some comments on collation wiki page

Posted by Daniel John Debrunner <dj...@apache.org>.

Army wrote:

> On the other hand, if the statement is changed to:
> 
>    // Default schema APP.
> 
>    PreparedStatement ps = conn.prepareStatement(
>        "select tablename, tabletype from sys.systables where " +
>        "CAST (tablename as varchar(128)) = ?");
> 
> then it might be better to take "current schema" to mean the time of 
> *execution.*  That way the above statement will run correctly
> regardless of the current schema.  If "current schema" was determined at 
> compile time, the above statement would only work if "current schema" 
> was a non-system schema.

I'm a little lost by this. What do these two terms mean to you?

  "run correctly"
  "would only work if"

Thanks,
Dan.

Re: some comments on collation wiki page

Posted by Mamta Satoor <ms...@gmail.com>.

Army, riding on the CAST wagon a little bit more, a user can rewrite the
query as follows and then it won't matter in what schema the query is
getting run.
   PreparedStatement ps = conn.prepareStatement(
       "select tablename, tabletype from sys.systables where " +
       "CAST (tablename as varchar(128)) = CAST(? as varchar(128))");

So, we can define the ? to take the character set of the schema where the
statement is getting prepared. And user can use CAST to make the collation
work the way they desire.

Does that sound good?
Mamta


On 4/3/07, Mamta Satoor <ms...@gmail.com> wrote:
>
> I was leaning towards using the current schema at the prepared time
> because I am hoping to do all the collation validation at the compile phase
> rather than the execute phase. Picking up the schema at the prepared time
> will help me enable that.
>
> Let me look at your following mail a little bit more to see if I need to
> change my mind :)
>
> Mamta
>
>
> On 4/3/07, Army <qo...@gmail.com> wrote:
> >
> > Army wrote:
> > >
> > > Since a parameter marker does not have a "defined" schema, does
> > "current
> > > schema" mean the schema when the statement is prepared, or the schema
> > > when it is executed?
> > >
> > > For example I can do the following in JDBC:
> > >
> > > // Default schema ("APP").
> > >
> > > PreparedStatement ps = conn.prepareStatement(
> > >   "select tablename, tabletype from sys.systables where tablename =
> > ?");
> >
> > <snip>
> >
> > > If "current schema" means the "schema when the statement is
> > *prepared*"
> > > then both of the above statements would fail (because there's no CAST
> > on
> > > the syscol). That consistency would probably be a good thing (less
> > > confusing for users).
> >
> > On the other hand, if the statement is changed to:
> >
> >    // Default schema APP.
> >
> >    PreparedStatement ps = conn.prepareStatement(
> >        "select tablename, tabletype from sys.systables where " +
> >        "CAST (tablename as varchar(128)) = ?");
> >
> > then it might be better to take "current schema" to mean the time of
> > *execution.*  That way the above statement will run correctly
> > regardless of
> > the current schema.  If "current schema" was determined at compile time,
> > the
> > above statement would only work if "current schema" was a non-system
> > schema.
> >
> > Since I think we are going to encourage users to CAST system columns
> > when doing
> > comparisons (at least that's what I gathered from the various discussion
> >
> > threads), maybe it would be better to take the "current schema" for a
> > parameter
> > marker at execution time, after all...?
> >
> > Army
> >
> >
>

Re: some comments on collation wiki page

Posted by Mamta Satoor <ms...@gmail.com>.

I was leaning towards using the current schema at the prepared time because
I am hoping to do all the collation validation at the compile phase rather
than the execute phase. Picking up the schema at the prepared time will help
me enable that.

Let me look at your following mail a little bit more to see if I need to
change my mind :)

Mamta


On 4/3/07, Army <qo...@gmail.com> wrote:
>
> Army wrote:
> >
> > Since a parameter marker does not have a "defined" schema, does "current
> > schema" mean the schema when the statement is prepared, or the schema
> > when it is executed?
> >
> > For example I can do the following in JDBC:
> >
> > // Default schema ("APP").
> >
> > PreparedStatement ps = conn.prepareStatement(
> >   "select tablename, tabletype from sys.systables where tablename = ?");
>
> <snip>
>
> > If "current schema" means the "schema when the statement is *prepared*"
> > then both of the above statements would fail (because there's no CAST on
> > the syscol). That consistency would probably be a good thing (less
> > confusing for users).
>
> On the other hand, if the statement is changed to:
>
>    // Default schema APP.
>
>    PreparedStatement ps = conn.prepareStatement(
>        "select tablename, tabletype from sys.systables where " +
>        "CAST (tablename as varchar(128)) = ?");
>
> then it might be better to take "current schema" to mean the time of
> *execution.*  That way the above statement will run correctly
> regardless of
> the current schema.  If "current schema" was determined at compile time,
> the
> above statement would only work if "current schema" was a non-system
> schema.
>
> Since I think we are going to encourage users to CAST system columns when
> doing
> comparisons (at least that's what I gathered from the various discussion
> threads), maybe it would be better to take the "current schema" for a
> parameter
> marker at execution time, after all...?
>
> Army
>
>

Re: some comments on collation wiki page

Posted by Army <qo...@gmail.com>.

Army wrote:
> 
> Since a parameter marker does not have a "defined" schema, does "current
> schema" mean the schema when the statement is prepared, or the schema 
> when it is executed?
> 
> For example I can do the following in JDBC:
> 
> // Default schema ("APP").
> 
> PreparedStatement ps = conn.prepareStatement(
>   "select tablename, tabletype from sys.systables where tablename = ?");

<snip>

> If "current schema" means the "schema when the statement is *prepared*" 
> then both of the above statements would fail (because there's no CAST on 
> the syscol). That consistency would probably be a good thing (less 
> confusing for users).

On the other hand, if the statement is changed to:

    // Default schema APP.

    PreparedStatement ps = conn.prepareStatement(
        "select tablename, tabletype from sys.systables where " +
        "CAST (tablename as varchar(128)) = ?");

then it might be better to take "current schema" to mean the time of 
*execution.*  That way the above statement will run correctly regardless of
the current schema.  If "current schema" was determined at compile time, the 
above statement would only work if "current schema" was a non-system schema.

Since I think we are going to encourage users to CAST system columns when doing 
comparisons (at least that's what I gathered from the various discussion 
threads), maybe it would be better to take the "current schema" for a parameter 
marker at execution time, after all...?

Army

Re: some comments on collation wiki page

Posted by Mamta Satoor <ms...@gmail.com>.

Thanks Army, for going through the wiki page and reviewing it.

I was thinking of using the character set of the schema where the statement
is prepared. Your example demonstrates that from the user's point of view,
this will be easier to understand and easier for us to document.

I will add your comment to the wiki page and address it at the end of the
day along with other comments if no one has an objection to this approach.

Mamta


On 4/3/07, Army <qo...@gmail.com> wrote:
>
> Thank you very much for the wiki page, Mamta.  It is indeed very helpful!
>
> Mamta Satoor wrote:
> > The character set of the schema the function is defined in. That keeps
> it
> > least confusing and easier for the users to understand.
>
> One question similar to Dan's, except mine is w.r.t:
>
> > 8)JDBC parameters (ie. ?) where the type of the parameter is a character
> > type will have the same collation as current schema's character set,
> based
> > on 3) above.
>
> Since a parameter marker does not have a "defined" schema, does "current
> schema"
> mean the schema when the statement is prepared, or the schema when it is
> executed?
>
> For example I can do the following in JDBC:
>
>    // Default schema ("APP").
>
>    PreparedStatement ps = conn.prepareStatement(
>        "select tablename, tabletype from sys.systables where tablename =
> ?");
>
>    ps.setString(1, "Acorn");
>
>    ResultSet rs = ps.executeQuery();
>    while (rs.next())
>        System.out.println("  -=> " + rs.getString(1) + " - " +
> rs.getString(2));
>
>    // Change current schema.
>    s.execute("set schema sys");
>
>    rs = ps.executeQuery();
>    while (rs.next())
>        System.out.println("  -=> " + rs.getString(1) + " - " +
> rs.getString(2));
>
> If we take "current schema" to mean the "schema when the statement is
> executed"
> then the first statement will fail (comparison of different collations)
> whereas
> the second one will succeed.
>
> If "current schema" means the "schema when the statement is prepared" then
> both
> of the above statements would fail (because there's no CAST on the
> syscol). That
> consistency would probably be a good thing (less confusing for users).
>
> Army
>
>

Re: some comments on collation wiki page

Posted by Army <qo...@gmail.com>.

Thank you very much for the wiki page, Mamta.  It is indeed very helpful!

Mamta Satoor wrote:
> The character set of the schema the function is defined in. That keeps it
> least confusing and easier for the users to understand.

One question similar to Dan's, except mine is w.r.t:

 > 8)JDBC parameters (ie. ?) where the type of the parameter is a character
 > type will have the same collation as current schema's character set, based
 > on 3) above.

Since a parameter marker does not have a "defined" schema, does "current schema"
mean the schema when the statement is prepared, or the schema when it is executed?

For example I can do the following in JDBC:

    // Default schema ("APP").

    PreparedStatement ps = conn.prepareStatement(
        "select tablename, tabletype from sys.systables where tablename = ?");

    ps.setString(1, "Acorn");

    ResultSet rs = ps.executeQuery();
    while (rs.next())
        System.out.println("  -=> " + rs.getString(1) + " - " + rs.getString(2));

    // Change current schema.
    s.execute("set schema sys");

    rs = ps.executeQuery();
    while (rs.next())
        System.out.println("  -=> " + rs.getString(1) + " - " + rs.getString(2));

If we take "current schema" to mean the "schema when the statement is executed" 
then the first statement will fail (comparison of different collations) whereas
the second one will succeed.

If "current schema" means the "schema when the statement is prepared" then both 
of the above statements would fail (because there's no CAST on the syscol). That 
consistency would probably be a good thing (less confusing for users).

Army

Re: some comments on collation wiki page

Posted by Mamta Satoor <ms...@gmail.com>.

If there are no objections by the end of the day, then I will go ahead and
update the wiki page for the character set of the user defined functions.

thanks,
Mamta


On 4/3/07, Mamta Satoor <ms...@gmail.com> wrote:
>
> The character set of the schema the function is defined in. That keeps it
> least confusing and easier for the users to understand.
>
> Mamta
>
>
>  On 4/3/07, Daniel John Debrunner <dj...@apache.org> wrote:
> >
> >
> > http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478
> >
> > > 7)For user defined functions' that return character string type, the
> > return type's collation
> > > will have the same collation as current schema's character set.
> >
> > The "current schema's" character set, or the character set of the schema
> >
> > the function is defined in? And if it's the current schema, is it
> > current at function definition time, or current when used?
> >
> > Dan.
> >
> >
> >
> >
> >
> >
>

Re: some comments on collation wiki page

Posted by Mamta Satoor <ms...@gmail.com>.

The character set of the schema the function is defined in. That keeps it
least confusing and easier for the users to understand.

Mamta


On 4/3/07, Daniel John Debrunner <dj...@apache.org> wrote:
>
>
> http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478
>
> > 7)For user defined functions' that return character string type, the
> return type's collation
> > will have the same collation as current schema's character set.
>
> The "current schema's" character set, or the character set of the schema
> the function is defined in? And if it's the current schema, is it
> current at function definition time, or current when used?
>
> Dan.
>
>
>
>
>
>

Re: some comments on collation wiki page

Posted by Daniel John Debrunner <dj...@apache.org>.

http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478

> 7)For user defined functions' that return character string type, the return type's collation 
 > will have the same collation as current schema's character set.

The "current schema's" character set, or the character set of the schema 
the function is defined in? And if it's the current schema, is it 
current at function definition time, or current when used?

Dan.

Re: some comments on collation wiki page

Posted by Mike Matrigali <mi...@sbcglobal.net>.


Mamta Satoor wrote:
> Mike, thanks for going through the wiki page. Especially, the store 
> section since I am not too familiar with store code.
>  
> Following are my responses
> 1)Mike:Obtain collation id(type) from DataValueDescriptor(DVD) through 
> DVD.getCollateId method.
> The way I had collation in mind, collation type would be stored only in 
> DataTypeDescriptor(DTD) and DVD would only store Collator object in DVD, 
> ie the DVD will be unaware of how the Collator object was created using 
> a specific collation type. The DVD for non-default-collation subclass of 
> SQLChar is CollatorSQLChar and currently, its constructor looks as follows
>      public CollatorSQLChar(String val, RuleBasedCollator collatorForCharacterDatatypes)
> ie, no collation type info is passed to the CollatorSQLChar. Maybe to
> support the store requirement, we should add another parameter to the 
> constructor above, called collationId. This collationId can later be 
> retrieved by store using DVF.getCollateId. Any comments from anyone?

Yes, it looks like you need to pass collateId in the constructor and keep
it around just for this call; I don't think it should be used for anything
else.  I don't see another way unless it is possible to write a function
that returns a collate id based on an input RuleBasedCollator.
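To make the constructor change concrete, here is a minimal sketch of a value class that compares with its Collator and carries the collation id purely so that it can be reported back. CollatedChar is a hypothetical stand-in, not Derby's CollatorSQLChar, and the id value used in the usage note is arbitrary.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

/** Hypothetical stand-in for a collation-aware char value; not the Derby class. */
class CollatedChar {
    private final String value;
    private final RuleBasedCollator collator;
    private final int collationId; // stored only so it can be handed back later

    CollatedChar(String value, RuleBasedCollator collator, int collationId) {
        this.value = value;
        this.collator = collator;
        this.collationId = collationId;
    }

    /** Reports the id this value was created with; comparisons never use it. */
    int getCollateId() {
        return collationId;
    }

    /** Comparison goes through the Collator, not through the id. */
    int compareTo(CollatedChar other) {
        return collator.compare(value, other.value);
    }

    /**
     * Convenience for the example. Assumes the JDK hands back a
     * RuleBasedCollator, which is typical but not promised by the API.
     */
    static RuleBasedCollator collatorFor(Locale locale) {
        return (RuleBasedCollator) Collator.getInstance(locale);
    }
}
```

For example, new CollatedChar("a", CollatedChar.collatorFor(Locale.ENGLISH), 1) compares before a CollatedChar holding "B" with the same collator, and getCollateId() simply returns the 1 that was passed in.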

Re: some comments on collation wiki page

Posted by Mike Matrigali <mi...@sbcglobal.net>.


Suresh Thalamati wrote:
> Mike Matrigali wrote:
> 
>>
>> Mike Matrigali wrote:
>>
> <snip>
> 
>>>
>>> Ok, didn't realize this broke the model.  As long as the info gets down
>>> to store I don't really care how.  So if you can't get the info from
>>> the template we pass down, then we should just add another array 
>>> argument to createConglomerate and createAndLoadConglomerate which would
>>> make it look like (this was the approach taken to pass down the 
>>> columnOrdering which is basically ascend/descend info for indexes):
>>>
>>> long createConglomerate(
>>> String                  implementation,
>>> DataValueDescriptor[]   template,
>>> ColumnOrdering[]        columnOrder,
>>> CollationIds[]        collationIds,
>>> Properties              properties,
>>> int                     temporaryFlag)
>>>     throws StandardException;
>>
>>
>>
>>
>> I didn't mean to create a new datatype for the collation id's,
>> I think int or long is fine.
>>
>> long createConglomerate(
>> String                  implementation,
>> DataValueDescriptor[]   template,
>> ColumnOrdering[]        columnOrder,
>> int[]                   collationIds,
>> Properties              properties,
>> int                     temporaryFlag)
>>     throws StandardException;
>>
> 
> Mike,
> 
> Is there any particular reason why you don't want to add collationIds to
> the already existing ColumnOrdering information, instead of
> passing it as a separate int[] array?
That looks like a good idea.  It will take generating a ColumnOrdering 
when we create a heap, where we didn't before but that should not be
too hard.  We use this for sorting also so it does seem like a natural
place for it.
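A sketch of what that could look like is below. The interface shape is an assumption for illustration and may not match Derby's real ColumnOrdering; getCollationId() is the proposed addition, and HeapColumnOrdering with its forHeap helper is a made-up name for the "generate a ColumnOrdering when we create a heap" step.

```java
/** Hypothetical shape; Derby's real ColumnOrdering interface may differ. */
interface CollatedColumnOrdering {
    int getColumnId();
    boolean getIsAscending();
    int getCollationId(); // the proposed addition
}

/** Simple immutable implementation used when creating a heap. */
class HeapColumnOrdering implements CollatedColumnOrdering {
    private final int columnId;
    private final boolean ascending;
    private final int collationId;

    HeapColumnOrdering(int columnId, boolean ascending, int collationId) {
        this.columnId = columnId;
        this.ascending = ascending;
        this.collationId = collationId;
    }

    public int getColumnId() {
        return columnId;
    }

    public boolean getIsAscending() {
        return ascending;
    }

    public int getCollationId() {
        return collationId;
    }

    /**
     * Builds one entry per heap column. A heap has no sort direction,
     * so ascending is set to true here only as a placeholder.
     */
    static CollatedColumnOrdering[] forHeap(int[] collationIds) {
        CollatedColumnOrdering[] ordering =
                new CollatedColumnOrdering[collationIds.length];
        for (int i = 0; i < collationIds.length; i++) {
            ordering[i] = new HeapColumnOrdering(i, true, collationIds[i]);
        }
        return ordering;
    }
}
```

Compared with a separate int[] argument, this keeps the per-column information in one array, at the cost of building that array for heaps, which did not need one before.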
> 
> 
> Thanks
> -suresh
> 
>

Re: some comments on collation wiki page

Posted by Suresh Thalamati <su...@gmail.com>.

Mike Matrigali wrote:
> 
> Mike Matrigali wrote:
> 
<snip>
>>
>> Ok, didn't realize this broke the model.  As long as the info gets down
>> to store I don't really care how.  So if you can't get the info from
>> the template we pass down, then we should just add another array 
>> argument to createConglomerate and createAndLoadConglomerate which would
>> make it look like (this was the approach taken to pass down the 
>> columnOrdering which is basically ascend/descend info for indexes):
>>
>> long createConglomerate(
>> String                  implementation,
>> DataValueDescriptor[]   template,
>> ColumnOrdering[]        columnOrder,
>> CollationIds[]        collationIds,
>> Properties              properties,
>> int                     temporaryFlag)
>>     throws StandardException;
> 
> 
> 
> I didn't mean to create a new datatype for the collation id's,
> I think int or long is fine.
> 
> long createConglomerate(
> String                  implementation,
> DataValueDescriptor[]   template,
> ColumnOrdering[]        columnOrder,
> int[]                   collationIds,
> Properties              properties,
> int                     temporaryFlag)
>     throws StandardException;
> 

Mike,

Is there any particular reason why you don't want to add collationIds to
the already existing ColumnOrdering information, instead of
passing it as a separate int[] array?


Thanks
-suresh

Re: some comments on collation wiki page

Posted by Mike Matrigali <mi...@sbcglobal.net>.


Mike Matrigali wrote:
> 
> 
> Daniel John Debrunner wrote:
> 
>> Mamta Satoor wrote:
>>
>>> Mike, thanks for going through the wiki page. Especially, the store 
>>> section since I am not too familiar with store code.
>>>  
>>> Following are my responses
>>> 1)Mike:Obtain collation id(type) from DataValueDescriptor(DVD) 
>>> through DVD.getCollateId method.
>>> The way I had collation in mind, collation type would be stored only 
>>> in DataTypeDescriptor(DTD) and DVD would only store Collator object 
>>> in DVD, ie the DVD will be unaware of how the Collator object was 
>>> created using a specific collation type. 
>>
>>
>>
>> I think separating meta-data from values is the correct approach. 
>> Adding a collator id method to the DVD approach seems incorrect.
>>
>> Store already accepts meta-data for conglomerate creation, can this 
>> just be additional meta-data?
> 
> 
> Ok, didn't realize this broke the model.  As long as the info gets down
> to store I don't really care how.  So if you can't get the info from
> the template we pass down, then we should just add another array 
> argument to createConglomerate and createAndLoadConglomerate which would
> make it look like (this was the approach taken to pass down the 
> columnOrdering which is basically ascend/descend info for indexes):
> 
> long createConglomerate(
> String                  implementation,
> DataValueDescriptor[]   template,
> ColumnOrdering[]        columnOrder,
> CollationIds[]        collationIds,
> Properties              properties,
> int                     temporaryFlag)
>     throws StandardException;


I didn't mean to create a new datatype for the collation id's,
I think int or long is fine.

long createConglomerate(
String                  implementation,
DataValueDescriptor[]   template,
ColumnOrdering[]        columnOrder,
int[]                   collationIds,
Properties              properties,
int                     temporaryFlag)
     throws StandardException;

> 
> Seems like create table, alter table add column, and system catalog 
> creation will have the necessary info to fill it in.
> 
> If this seems reasonable I would be willing to make the store related
> changes for the collation work.  I would probably leave the calls
> setting basic collation until mamta checks in support to set the
> right collation.
> 
>>
>> Dan.
>>
>>
>>
>>
>>
> 
> 
>

Re: some comments on collation wiki page

Posted by Mike Matrigali <mi...@sbcglobal.net>.


Daniel John Debrunner wrote:
> Mamta Satoor wrote:
> 
>> Mike, thanks for going through the wiki page. Especially, the store 
>> section since I am not too familiar with store code.
>>  
>> Following are my responses
>> 1)Mike:Obtain collation id(type) from DataValueDescriptor(DVD) through 
>> DVD.getCollateId method.
>> The way I had collation in mind, collation type would be stored only 
>> in DataTypeDescriptor(DTD) and DVD would only store Collator object in 
>> DVD, ie the DVD will be unaware of how the Collator object was created 
>> using a specific collation type. 
> 
> 
> I think separating meta-data from values is the correct approach. Adding 
> a collator id method to the DVD approach seems incorrect.
> 
> Store already accepts meta-data for conglomerate creation, can this just 
> be additional meta-data?

Ok, didn't realize this broke the model.  As long as the info gets down
to store I don't really care how.  So if you can't get the info from
the template we pass down, then we should just add another array 
argument to createConglomerate and createAndLoadConglomerate which would
make it look like (this was the approach taken to pass down the 
columnOrdering which is basically ascend/descend info for indexes):

long createConglomerate(
String                  implementation,
DataValueDescriptor[]   template,
ColumnOrdering[]        columnOrder,
CollationIds[]          collationIds,
Properties              properties,
int                     temporaryFlag)
     throws StandardException;

Seems like create table, alter table add column, and system catalog 
creation will have the necessary info to fill it in.

If this seems reasonable I would be willing to make the store related
changes for the collation work.  I would probably leave the calls
setting basic collation until mamta checks in support to set the
right collation.
> 
> Dan.
> 
> 
> 
> 
>

Re: some comments on collation wiki page

Posted by Daniel John Debrunner <dj...@apache.org>.

Mamta Satoor wrote:
> Mike, thanks for going through the wiki page. Especially, the store 
> section since I am not too familiar with store code.
>  
> Following are my responses
> 1)Mike:Obtain collation id(type) from DataValueDescriptor(DVD) through 
> DVD.getCollateId method.
> The way I had collation in mind, collation type would be stored only in 
> DataTypeDescriptor(DTD) and DVD would only store Collator object in DVD, 
> ie the DVD will be unaware of how the Collator object was created using 
> a specific collation type. 

I think separating meta-data from values is the correct approach. Adding 
a collator id method to the DVD approach seems incorrect.

Store already accepts meta-data for conglomerate creation, can this just 
be additional meta-data?

Dan.

Re: some comments on collation wiki page

Posted by Mamta Satoor <ms...@gmail.com>.

Mike, thanks for going through the wiki page. Especially, the store section
since I am not too familiar with store code.

Following are my responses
1)Mike:Obtain collation id(type) from DataValueDescriptor(DVD) through
DVD.getCollateId method.
The way I had collation in mind, collation type would be stored only in
DataTypeDescriptor(DTD) and DVD would only store Collator object in DVD, ie
the DVD will be unaware of how the Collator object was created using a
specific collation type. The DVD subclass of SQLChar for non-default
collation is CollatorSQLChar, and currently its constructor looks as follows:
     public CollatorSQLChar(String val,
                            RuleBasedCollator collatorForCharacterDatatypes)
ie, no collation type info is passed to the CollatorSQLChar. Maybe, to
support the store requirement, we should add another parameter to the
constructor above, called collationId. This collationId can later be
retrieved by store using DVF.getCollateId. Any comments from anyone?
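
To make the proposal concrete, a rough sketch of the extended constructor
follows. CollatorCharSketch is a hypothetical stand-in for CollatorSQLChar,
the id value 1 is made up, and the cast of Collator.getInstance to
RuleBasedCollator relies on the JDK returning that subclass, which is an
assumption of the sketch.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical stand-in for CollatorSQLChar; not Derby code.
final class CollatorCharSketch {
    private final String value;
    private final RuleBasedCollator collator;
    private final int collationId; // the proposed extra parameter

    CollatorCharSketch(String val, RuleBasedCollator collator, int collationId) {
        this.value = val;
        this.collator = collator;
        this.collationId = collationId;
    }

    // What store would call to learn which collation to record.
    int getCollateId() {
        return collationId;
    }

    // Comparison goes through the Collator rather than String.compareTo.
    int compare(CollatorCharSketch other) {
        return collator.compare(value, other.value);
    }

    public static void main(String[] args) {
        // Assumes the JDK hands back a RuleBasedCollator for this locale.
        RuleBasedCollator c =
            (RuleBasedCollator) Collator.getInstance(Locale.US);
        CollatorCharSketch a = new CollatorCharSketch("apple", c, 1);
        CollatorCharSketch b = new CollatorCharSketch("Banana", c, 1);
        System.out.println(a.getCollateId() + " " + (a.compare(b) < 0));
    }
}
```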

2)Mike:Need to somehow test a store btree recovery to make sure this works
in the redo recovery case.
I will add this test case requirement under Testing section on the wiki
page.

3)Mike:Version number of store level column metadata
I will replace 1st item under Store section with what you have proposed.
Just to be clear, here is what I think Store will be doing. In order to
support soft-upgrade mode, Derby 10.3 will have to recognize and support the
pre-10.3 store level column metadata which will not have collation info in
it. In addition, Derby 10.3 will have the new store level column metadata
which includes collation info for all 10.3 databases (ie upgraded
pre-10.3 database, newly created 10.3 database with default collation and
newly created 10.3 database with territory based collation).
I will also add tests for soft-upgrade and hard-upgrade mode under the
Testing section of the wiki page.
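
As a sketch of what "recognize both formats" could mean in code, the
reader below branches on a version number and falls back to the default
collation when the metadata is in the old format. The version constants,
the field order and the use of 0 for the default collation are all
assumptions for illustration; the real store format is not shown in this
thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical layout; version numbers and field order are made up.
final class ColumnMetadataSketch {
    static final int VERSION_PRE_10_3 = 1;
    static final int VERSION_10_3 = 2;

    static byte[] write(int version, int[] formatIds, int[] collationIds) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(version);
            out.writeInt(formatIds.length);
            for (int i = 0; i < formatIds.length; i++) {
                out.writeInt(formatIds[i]);
                if (version >= VERSION_10_3) {
                    // Only the new format carries a collation id.
                    out.writeInt(collationIds[i]);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int[] readCollationIds(byte[] data) {
        try {
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(data));
            int version = in.readInt();
            // Zero-filled, ie default collation unless the data says otherwise.
            int[] ids = new int[in.readInt()];
            for (int i = 0; i < ids.length; i++) {
                in.readInt(); // skip the format id
                if (version >= VERSION_10_3) {
                    ids[i] = in.readInt();
                }
            }
            return ids;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Old-format metadata never has to be rewritten with this approach; the
reader simply reports default collation for every column.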

4)Mike:I believe there are more than one of these Monitor interfaces that
will have to be moved in DVF.
That is very possible. I didn't look through the store code in detail to see
what other apis should be moved to DVF from Monitor interface.

5)Mike:impact on sorter
I will add a line item for sorter and will also add test requirement for
sorter that will fit in memory and the sorter that will spill over to the
disk.
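
A small sketch of what such a sorter test would be checking: the same
rows come out in a different order under a Collator than under plain
code point comparison. SortOrderSketch and its method names are
hypothetical, and this only uses the JDK's own sort, not the Derby
sorter.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class; it does not touch the Derby sorter.
final class SortOrderSketch {
    // Plain String ordering, ie comparison by UTF-16 code unit.
    static String[] sortBinary(String[] in) {
        String[] copy = in.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Ordering defined by the Collator for the given locale.
    static String[] sortWithCollator(String[] in, Locale locale) {
        String[] copy = in.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] rows = { "b", "A", "a", "B" };
        System.out.println(Arrays.toString(sortBinary(rows)));
        System.out.println(Arrays.toString(sortWithCollator(rows, Locale.US)));
    }
}
```

In the binary order all upper case letters sort ahead of lower case ones,
while the collated order groups each letter with its other case. A sorter
test would want to see the collated order both when the rows fit in
memory and when they spill to disk.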

Please let me know if I missed anything.
Mamta

On 4/2/07, Mike Matrigali <mi...@sbcglobal.net> wrote:
>
> I have included some comments based on the following wiki page:
>
> > 6)Store needs a way to determine the collation type for a given DVD.
> This collation type will then be saved in the column metadata. Provide
> the api on DVD to return the correct collation type.
>     Just an addition here.  The way I expect this info to get down to store
>     is that the template passed in when creating a conglomerate will have
>     DVD's that have the correct collation id associated with them.
>     This should be true for both btree and heap conglomerate creates, you
>     should update iapi interface doc, but I don't think the actual args
>     to the interface change.
>
>     I assume the api is something like int getCollateId() on a DVD.
>
> >
> > 7)BasicDatabase needs to set the Locale value in the DVF after DVF
> has booted. Probably with an api like DVF.setLocale(Locale). DVF will
> use this Locale object to construct the correct Collator in case user
> has requested territory based collation through jdbc url at database
> create time.
> >
> Need to somehow test a store btree recovery to make sure this works in the
> redo recovery case.
>
> > Store changes
> >
> > 1)Store column level metadata for collate in Store. Store keeps a
> version number that describes the structure of column level metadata.
> For existing pre-10.3 databases which get upgraded to 10.3 and for new
> 10.3 databases with default collation (UCS_BASIC), the structure of
> column level metadata will remain same as 10.2 structure of column level
> metadata, ie they will not include collate information in their store
> metadata. A new version would be used in Store for structure of column
> level metadata if the newly created 10.3 database has asked for
> territory based collation. In other words, information about collate
> will be kept in Store column level metadata only if we are working with
> a 10.3 newly created database with territory based collation. This
> approach will make sure that we do not have to do an on-disk store
> metadata upgrade when upgrading a pre-10.3 database to 10.3 version.
> >
> This is not exactly what I expected.  I think there should be a single
> 10.3 metadata format that includes collation information (whether it is
> default or not).  The actual format of the data can optimize as
> appropriate for non-default values if we think that is worth it.  I
> think the right thing to do is to manage only either 10.3 version store
> metadata or pre-10.3, and not base it on whether there is a non-default
> collate or not.  The actual written format of the 10.3 metadata may
> optimize the case where there is no non-default collate, but it is
> still a 10.3 version.
>
> So the world breaks down into:
> pre-10.3 db's, which includes dbs that are running against 10.3 but only
>     soft upgrade.
>     o These db's never get the 10.3 metadata format
>
> hard upgrade 10.3 db's
>     o Code will read both pre and post 10.3 metadata.  All new
>       conglomerates always write the 10.3 version metadata.  The code
>       has to read both old and new formats.
>
> new 10.3 db's
>     o code will only write 10.3 metadata.
>
> So what I was expecting was:
>
> 1)Store column level metadata for collate in Store. Store keeps a
> version number that describes the structure of column level metadata.
> For existing pre-10.3 databases which get soft upgraded to 10.3, the
> structure of column level metadata will remain same as 10.2 structure of
> column level metadata, ie they will not include collate information in
> their store metadata.
> For any conglomerate created in a 10.3 new database or a 10.3 hard
> upgraded database a new version would be used in Store to include
> information about the collation for each column's metadata stored.
>
>
>
> > 2)Currently, store uses Monitor to create DVD template rows. The
> logic of creating DVDs using formatids should be factored out from
> Monitor into DataValueFactory. Talking in terms of code,
> RowUtil.newClassInfoTemplate should call DVF.classFromIdentifier rather
> than Monitor.classFromIdentifier.
>     I believe there are more than one of these Monitor interfaces that
>     will have to be moved in DVF.
> >
> > 3)This item is related to item 2. With Derby 10.3, collation type
> will be the additional metadata in store for each column. When store
> will call DVF to create DVD template row, it will pass the formatids and
> the collation types. DVF will need to be able to associate the correct
> Collator with the DVD for Char datatypes depending on the collation
> type. And in order to find the correct Collator, DVF needs to know the
> locale of the database. This locale information will be set on DVF using
> a new method on DVF called void setLocale(Locale). This call will be
> made by BasicDatabase after DVF has finished booting and before store
> starts booting.
> >
>
> I think you should add an item to track the work in the sorter.  Maybe
> you are tracking this as part of the aggregate items?  I know there is
> some template stuff specific to the sorter.  You should at least make
> sure to test both a sort that is in memory and a sort that is too big
> to fit into memory that includes special collation stuff.
>
>
>