You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@metamodel.apache.org by Kasper Sørensen <i....@gmail.com> on 2014/03/25 15:54:40 UTC

[DISCUSS] Concern regarding ColumnType.getJavaEquivalentClass()

Hi all,

I just came across a potential issue in MetaModel's design and want to
share it and maybe start thinking of ways to fix or work around it...

In our ColumnType enum we have the method getJavaEquivalentClass() which is
used to tell which java type to expect when querying a particular column.
For instance, of I query a VARCHAR column, I can expect a java.lang.String
value.

Now the trouble is that in our JDBC module we have a system property which
allows for eager loading of BLOBs and CLOBs so that they are automatically
read into byte-arrays and Strings respectively. This is a great utility
because otherwise the user has to do a lot of tedious work with
inputstreams etc which in most cases isn't particularly useful - in most
cases you just want the byte[ ] or String.

Now the trouble is that if you turn that system property on, you get
Strings or byte-arrays but the column type is still CLOB/BLOB and that
means that the "expected equivalent Java class" is still java.sql.Clob or
java.sql.Blob! If you build your code to expect that, then it will
eventually break because you get a String or a byte-array instead.

How to make this consistent? I can think of a few ways, but none that I
really love:

1) Probably the easiest way to do it is to let the JDBC datacontext give
the columns other ColumnTypes. But that sorta doesn't feel good now that
the "real" datatype is CLOB, that we will then call it e.g. VARCHAR.

2) We can remove the getEquivalentJavaClass() from ColumnType and instead
make it a direct member of the Column class. This will break backwards
compatibility of the API.

3) We can make ColumnType an interface or class instead of a enum. Then the
behaviour can simply be plugged in by the JDBC DataContext. I do like this
approach the best for many reasons, but it has the downside that it also
breaks backwards compatibility of the API and that there will no longer be
an enumerable and finite list of ColumnTypes. Maybe we could alleviate that
problem by ALSO having an enum with the typical implementations or
something like that.

Maybe there are other solutions that I didn't think of.

Regards,
Kasper

Re: [DISCUSS] Concern regarding ColumnType.getJavaEquivalentClass()

Posted by Kasper Sørensen <i....@gmail.com>.
Hi all,

I wish to bump this thread because I see that it is one of the few great
API breakages that we should make a decision on. If we do this it will
break backwards compatibility, and I am usually very cautious to avoid
that. But in this case I personally think it is a very important change
since it make the API more flexibility and also removes an abstraction leak
of datatypes from JDBC into the core of MetaModel.

Should we do a vote on the change, or does anyone (please) want to share
their view on the matter?

Kasper


2014-04-06 11:35 GMT+02:00 Kasper Sørensen <i....@gmail.com>:

> I was going through our issue list, and realized that changing ColumnType
> to an interface instead of a enum would also have other benefits. For
> example, it would make solving this issue a lot easier:
>
> https://issues.apache.org/jira/browse/METAMODEL-8 -  Further details
> added to ColumnType
>
>
>
> 2014-04-04 7:52 GMT+02:00 Kasper Sørensen <i.am.kasper.sorensen@gmail.com
> >:
>
> >
> > I did post my patch just now, which converts ColumnType to an interface:
> https://reviews.apache.org/r/20028/
> >
> > Would be interesting if someone can check it out and give it some
> thought. Not a trivial change for backwards compatibility I think, but
> might be the best from an API standpoint.
> >
> >
> > 2014-04-01 20:15 GMT+02:00 Kasper Sørensen <
> i.am.kasper.sorensen@gmail.com>:
> >
> >> Hi Ankit,
> >>
> >> I think it is fair to assume that a CLOB will always be convertible to
> a String - if memory on the client permits at least. Normally a
> java.sql.Clob provides a getCharacterStream() method which returns a
> Reader. By setting the system property 'metamodel.jdbc.convert.lobs' to
> "true" we offer the service of eagerly reading that Reader to the end and
> returning a String. The good thing about that is that the client avoids
> issues of the DataSet is closed or even sometimes just moving the cursor to
> the next record may invalidate the Reader. In other words: With the LOB
> conversion we offer a way to make the MetaModel Row objects useable even
> though the connection to the statement/resultset/connection is closed. It's
> of course optional.
> >>
> >> The same idea is applied to BLOBs. We offer to automatically open the
> java.sql.Blob InputStream and load it into a byte-array.
> >>
> >> In my opinion the fact that something is a CLOB or a VARCHAR is a
> different type of information than whether it's values are represented as
> java.sql.Clob or String objects. The information on the column type is
> valuable in it's own right and I would not want us to "make believe" that a
> column which is really a CLOB is shown as a VARCHAR.
> >>
> >> Now that we're on the topic, we might even also consider if using the
> JDBC data type names is even appropriate as a default behaviour. I mean,
> we're acting in the Java world and a type like VARCHAR or TINYINT etc. is
> only used in SQL, not in native Java apps. Other types of datastores have
> different data types and I guess many of the datastores only have simple
> data types like String, Integer, Date etc. We could choose to go for a
> non-enum solution for the ColumnType and thereby also provide more
> flexibility. In the CSV module we could thus provide just a single
> ColumnType, e.g.:
> >>
> >>   public static final ColumnType CSV_COLUMN_TYPE = new
> ColumnTypeImpl("String", String.class);
> >>
> >> Read the example constructor arguments as this: Name would be "String"
> (or maybe "Text" is more appropriate for "text files"?) and expected Java
> class is String.class
> >>
> >> In the JDBC module we could have other column types available, e.g.:
> >>
> >>   public static final ColumnType VARCHAR = new
> ColumnTypeImpl("VARCHAR", String.class);
> >>   public static final ColumnType CLOB = new ColumnTypeImpl("CLOB",
> java.sql.Clob.class);
> >>   public static final ColumnType CLOB_CONVERTED = new
> ColumnTypeImpl("CLOB", String.class);
> >>   public static final ColumnType BLOB = new ColumnTypeImpl("BLOB",
> java.sql.Blob.class);
> >>   public static final ColumnType BLOB_CONVERTED = new
> ColumnTypeImpl("BLOB", byte[].class);
> >>
> >> (and there you then have my suggested solution for solving the Clob
> issue - to use a String for the name and a Class for the expected Java type)
> >>
> >> I can try to make a patch for this solution. That would sort of
> validate if it will work or not. But first ... what do you guys think?
> >>
> >> Only pitfall I see is that it breaks backwards compatibility. I think
> we can retain most of the compile-time compatibility, but not runtime
> compatibility (as in "drop in a new jar without recompile"). Also, old
> objects serialized with the old enums would not be properly deserialized
> into using these constants. Maybe that can be solved in the
> LegacyDeserializationObjectInputStream [1] somehow.
> >>
> >> Best regards,
> >> Kasper
> >>
> >>
> >> [1] Described here:
> http://wiki.apache.org/metamodel/MigratingFromEobjectsMetaModel
> >>
> >>
> >> 2014-04-01 10:28 GMT+02:00 Ankit Kumar <ak...@gmail.com>:
> >>
> >>> Hi Kasper,
> >>>
> >>> Thanks for initiating this discussion as we are hit by this in MM
> right now.
> >>>
> >>> Thinking just a bit from a client of MM perspective, I guess the CLOB
> could
> >>> contain a Object/File/Image/???/Large String text, so while getting
> CLOB's
> >>> from the real database we still can check the content and if it smells
> more
> >>> like a String then we return the VARCHAR otherwise an Object. May be
> this
> >>> is not the neatest solution but I guess for a client this provides the
> >>> flexibility needed.
> >>>
> >>> An additional point worth mentioning with respect to the above remark,
> >>> playing with different databases we see some databases allow large
> text to
> >>> be stored in VARCHAR kind of fields whereas some are still having
> limits of
> >>> 4000 characters like Oracle. So when using MM as the abstraction layer
> >>> above the databases this feature might be quite handy for a client.
> >>>
> >>> Excuse me if I sound a bit too biased here.
> >>>
> >>> Regards
> >>> Ankit
> >>>
> >>>
> >>> On Tue, Mar 25, 2014 at 3:54 PM, Kasper Sørensen <
> >>> i.am.kasper.sorensen@gmail.com> wrote:
> >>>
> >>> > Hi all,
> >>> >
> >>> > I just came across a potential issue in MetaModel's design and want
> to
> >>> > share it and maybe start thinking of ways to fix or work around it...
> >>> >
> >>> > In our ColumnType enum we have the method getJavaEquivalentClass()
> which is
> >>> > used to tell which java type to expect when querying a particular
> column.
> >>> > For instance, of I query a VARCHAR column, I can expect a
> java.lang.String
> >>> > value.
> >>> >
> >>> > Now the trouble is that in our JDBC module we have a system property
> which
> >>> > allows for eager loading of BLOBs and CLOBs so that they are
> automatically
> >>> > read into byte-arrays and Strings respectively. This is a great
> utility
> >>> > because otherwise the user has to do a lot of tedious work with
> >>> > inputstreams etc which in most cases isn't particularly useful - in
> most
> >>> > cases you just want the byte[ ] or String.
> >>> >
> >>> > Now the trouble is that if you turn that system property on, you get
> >>> > Strings or byte-arrays but the column type is still CLOB/BLOB and
> that
> >>> > means that the "expected equivalent Java class" is still
> java.sql.Clob or
> >>> > java.sql.Blob! If you build your code to expect that, then it will
> >>> > eventually break because you get a String or a byte-array instead.
> >>> >
> >>> > How to make this consistent? I can think of a few ways, but none
> that I
> >>> > really love:
> >>> >
> >>> > 1) Probably the easiest way to do it is to let the JDBC datacontext
> give
> >>> > the columns other ColumnTypes. But that sorta doesn't feel good now
> that
> >>> > the "real" datatype is CLOB, that we will then call it e.g. VARCHAR.
> >>> >
> >>> > 2) We can remove the getEquivalentJavaClass() from ColumnType and
> instead
> >>> > make it a direct member of the Column class. This will break
> backwards
> >>> > compatibility of the API.
> >>> >
> >>> > 3) We can make ColumnType an interface or class instead of a enum.
> Then the
> >>> > behaviour can simply be plugged in by the JDBC DataContext. I do
> like this
> >>> > approach the best for many reasons, but it has the downside that it
> also
> >>> > breaks backwards compatibility of the API and that there will no
> longer be
> >>> > an enumerable and finite list of ColumnTypes. Maybe we could
> alleviate that
> >>> > problem by ALSO having an enum with the typical implementations or
> >>> > something like that.
> >>> >
> >>> > Maybe there are other solutions that I didn't think of.
> >>> >
> >>> > Regards,
> >>> > Kasper
> >>> >
> >>
> >>
> >
>

Re: [DISCUSS] Concern regarding ColumnType.getJavaEquivalentClass()

Posted by Kasper Sørensen <i....@gmail.com>.
I was going through our issue list, and realized that changing ColumnType
to an interface instead of a enum would also have other benefits. For
example, it would make solving this issue a lot easier:

https://issues.apache.org/jira/browse/METAMODEL-8 -  Further details added
to ColumnType



2014-04-04 7:52 GMT+02:00 Kasper Sørensen <i....@gmail.com>:
>
> I did post my patch just now, which converts ColumnType to an interface:
https://reviews.apache.org/r/20028/
>
> Would be interesting if someone can check it out and give it some
thought. Not a trivial change for backwards compatibility I think, but
might be the best from an API standpoint.
>
>
> 2014-04-01 20:15 GMT+02:00 Kasper Sørensen <i.am.kasper.sorensen@gmail.com
>:
>
>> Hi Ankit,
>>
>> I think it is fair to assume that a CLOB will always be convertible to a
String - if memory on the client permits at least. Normally a java.sql.Clob
provides a getCharacterStream() method which returns a Reader. By setting
the system property 'metamodel.jdbc.convert.lobs' to "true" we offer the
service of eagerly reading that Reader to the end and returning a String.
The good thing about that is that the client avoids issues of the DataSet
is closed or even sometimes just moving the cursor to the next record may
invalidate the Reader. In other words: With the LOB conversion we offer a
way to make the MetaModel Row objects useable even though the connection to
the statement/resultset/connection is closed. It's of course optional.
>>
>> The same idea is applied to BLOBs. We offer to automatically open the
java.sql.Blob InputStream and load it into a byte-array.
>>
>> In my opinion the fact that something is a CLOB or a VARCHAR is a
different type of information than whether it's values are represented as
java.sql.Clob or String objects. The information on the column type is
valuable in it's own right and I would not want us to "make believe" that a
column which is really a CLOB is shown as a VARCHAR.
>>
>> Now that we're on the topic, we might even also consider if using the
JDBC data type names is even appropriate as a default behaviour. I mean,
we're acting in the Java world and a type like VARCHAR or TINYINT etc. is
only used in SQL, not in native Java apps. Other types of datastores have
different data types and I guess many of the datastores only have simple
data types like String, Integer, Date etc. We could choose to go for a
non-enum solution for the ColumnType and thereby also provide more
flexibility. In the CSV module we could thus provide just a single
ColumnType, e.g.:
>>
>>   public static final ColumnType CSV_COLUMN_TYPE = new
ColumnTypeImpl("String", String.class);
>>
>> Read the example constructor arguments as this: Name would be "String"
(or maybe "Text" is more appropriate for "text files"?) and expected Java
class is String.class
>>
>> In the JDBC module we could have other column types available, e.g.:
>>
>>   public static final ColumnType VARCHAR = new ColumnTypeImpl("VARCHAR",
String.class);
>>   public static final ColumnType CLOB = new ColumnTypeImpl("CLOB",
java.sql.Clob.class);
>>   public static final ColumnType CLOB_CONVERTED = new
ColumnTypeImpl("CLOB", String.class);
>>   public static final ColumnType BLOB = new ColumnTypeImpl("BLOB",
java.sql.Blob.class);
>>   public static final ColumnType BLOB_CONVERTED = new
ColumnTypeImpl("BLOB", byte[].class);
>>
>> (and there you then have my suggested solution for solving the Clob
issue - to use a String for the name and a Class for the expected Java type)
>>
>> I can try to make a patch for this solution. That would sort of validate
if it will work or not. But first ... what do you guys think?
>>
>> Only pitfall I see is that it breaks backwards compatibility. I think we
can retain most of the compile-time compatibility, but not runtime
compatibility (as in "drop in a new jar without recompile"). Also, old
objects serialized with the old enums would not be properly deserialized
into using these constants. Maybe that can be solved in the
LegacyDeserializationObjectInputStream [1] somehow.
>>
>> Best regards,
>> Kasper
>>
>>
>> [1] Described here:
http://wiki.apache.org/metamodel/MigratingFromEobjectsMetaModel
>>
>>
>> 2014-04-01 10:28 GMT+02:00 Ankit Kumar <ak...@gmail.com>:
>>
>>> Hi Kasper,
>>>
>>> Thanks for initiating this discussion as we are hit by this in MM right
now.
>>>
>>> Thinking just a bit from a client of MM perspective, I guess the CLOB
could
>>> contain a Object/File/Image/???/Large String text, so while getting
CLOB's
>>> from the real database we still can check the content and if it smells
more
>>> like a String then we return the VARCHAR otherwise an Object. May be
this
>>> is not the neatest solution but I guess for a client this provides the
>>> flexibility needed.
>>>
>>> An additional point worth mentioning with respect to the above remark,
>>> playing with different databases we see some databases allow large text
to
>>> be stored in VARCHAR kind of fields whereas some are still having
limits of
>>> 4000 characters like Oracle. So when using MM as the abstraction layer
>>> above the databases this feature might be quite handy for a client.
>>>
>>> Excuse me if I sound a bit too biased here.
>>>
>>> Regards
>>> Ankit
>>>
>>>
>>> On Tue, Mar 25, 2014 at 3:54 PM, Kasper Sørensen <
>>> i.am.kasper.sorensen@gmail.com> wrote:
>>>
>>> > Hi all,
>>> >
>>> > I just came across a potential issue in MetaModel's design and want to
>>> > share it and maybe start thinking of ways to fix or work around it...
>>> >
>>> > In our ColumnType enum we have the method getJavaEquivalentClass()
which is
>>> > used to tell which java type to expect when querying a particular
column.
>>> > For instance, of I query a VARCHAR column, I can expect a
java.lang.String
>>> > value.
>>> >
>>> > Now the trouble is that in our JDBC module we have a system property
which
>>> > allows for eager loading of BLOBs and CLOBs so that they are
automatically
>>> > read into byte-arrays and Strings respectively. This is a great
utility
>>> > because otherwise the user has to do a lot of tedious work with
>>> > inputstreams etc which in most cases isn't particularly useful - in
most
>>> > cases you just want the byte[ ] or String.
>>> >
>>> > Now the trouble is that if you turn that system property on, you get
>>> > Strings or byte-arrays but the column type is still CLOB/BLOB and that
>>> > means that the "expected equivalent Java class" is still
java.sql.Clob or
>>> > java.sql.Blob! If you build your code to expect that, then it will
>>> > eventually break because you get a String or a byte-array instead.
>>> >
>>> > How to make this consistent? I can think of a few ways, but none that
I
>>> > really love:
>>> >
>>> > 1) Probably the easiest way to do it is to let the JDBC datacontext
give
>>> > the columns other ColumnTypes. But that sorta doesn't feel good now
that
>>> > the "real" datatype is CLOB, that we will then call it e.g. VARCHAR.
>>> >
>>> > 2) We can remove the getEquivalentJavaClass() from ColumnType and
instead
>>> > make it a direct member of the Column class. This will break backwards
>>> > compatibility of the API.
>>> >
>>> > 3) We can make ColumnType an interface or class instead of a enum.
Then the
>>> > behaviour can simply be plugged in by the JDBC DataContext. I do like
this
>>> > approach the best for many reasons, but it has the downside that it
also
>>> > breaks backwards compatibility of the API and that there will no
longer be
>>> > an enumerable and finite list of ColumnTypes. Maybe we could
alleviate that
>>> > problem by ALSO having an enum with the typical implementations or
>>> > something like that.
>>> >
>>> > Maybe there are other solutions that I didn't think of.
>>> >
>>> > Regards,
>>> > Kasper
>>> >
>>
>>
>

Re: [DISCUSS] Concern regarding ColumnType.getJavaEquivalentClass()

Posted by Kasper Sørensen <i....@gmail.com>.
I did post my patch just now, which converts ColumnType to an interface:
https://reviews.apache.org/r/20028/

Would be interesting if someone can check it out and give it some thought.
Not a trivial change for backwards compatibility I think, but might be the
best from an API standpoint.


2014-04-01 20:15 GMT+02:00 Kasper Sørensen <i....@gmail.com>:

> Hi Ankit,
>
> I think it is fair to assume that a CLOB will always be convertible to a
> String - if memory on the client permits at least. Normally a java.sql.Clob
> provides a getCharacterStream() method which returns a Reader. By setting
> the system property 'metamodel.jdbc.convert.lobs' to "true" we offer the
> service of eagerly reading that Reader to the end and returning a String.
> The good thing about that is that the client avoids issues of the DataSet
> is closed or even sometimes just moving the cursor to the next record may
> invalidate the Reader. In other words: With the LOB conversion we offer a
> way to make the MetaModel Row objects useable even though the connection to
> the statement/resultset/connection is closed. It's of course optional.
>
> The same idea is applied to BLOBs. We offer to automatically open the
> java.sql.Blob InputStream and load it into a byte-array.
>
> In my opinion the fact that something is a CLOB or a VARCHAR is a
> different type of information than whether it's values are represented as
> java.sql.Clob or String objects. The information on the column type is
> valuable in it's own right and I would not want us to "make believe" that a
> column which is *really *a CLOB is shown as a VARCHAR.
>
> Now that we're on the topic, we might even also consider if using the JDBC
> data type names is even appropriate as a default behaviour. I mean, we're
> acting in the Java world and a type like VARCHAR or TINYINT etc. is only
> used in SQL, not in native Java apps. Other types of datastores have
> different data types and I guess many of the datastores only have simple
> data types like String, Integer, Date etc. We could choose to go for a
> non-enum solution for the ColumnType and thereby also provide more
> flexibility. In the CSV module we could thus provide just a single
> ColumnType, e.g.:
>
>   public static final ColumnType CSV_COLUMN_TYPE = new
> ColumnTypeImpl("String", String.class);
>
> Read the example constructor arguments as this: Name would be "String" (or
> maybe "Text" is more appropriate for "text files"?) and expected Java class
> is String.class
>
> In the JDBC module we could have other column types available, e.g.:
>
>   public static final ColumnType VARCHAR = new ColumnTypeImpl("VARCHAR",
> String.class);
>   public static final ColumnType CLOB = new ColumnTypeImpl("CLOB",
> java.sql.Clob.class);
>   public static final ColumnType CLOB_CONVERTED = new
> ColumnTypeImpl("CLOB", String.class);
>   public static final ColumnType BLOB = new ColumnTypeImpl("BLOB",
> java.sql.Blob.class);
>   public static final ColumnType BLOB_CONVERTED = new
> ColumnTypeImpl("BLOB", byte[].class);
>
> (and there you then have my suggested solution for solving the Clob issue
> - to use a String for the name and a Class for the expected Java type)
>
> I can try to make a patch for this solution. That would sort of validate
> if it will work or not. But first ... what do you guys think?
>
> Only pitfall I see is that it breaks backwards compatibility. I think we
> can retain most of the compile-time compatibility, but not runtime
> compatibility (as in "drop in a new jar without recompile"). Also, old
> objects serialized with the old enums would not be properly deserialized
> into using these constants. Maybe that can be solved in
> the LegacyDeserializationObjectInputStream [1] somehow.
>
> Best regards,
> Kasper
>
>
> [1] Described here:
> http://wiki.apache.org/metamodel/MigratingFromEobjectsMetaModel
>
>
> 2014-04-01 10:28 GMT+02:00 Ankit Kumar <ak...@gmail.com>:
>
> Hi Kasper,
>>
>> Thanks for initiating this discussion as we are hit by this in MM right
>> now.
>>
>> Thinking just a bit from a client of MM perspective, I guess the CLOB
>> could
>> contain a Object/File/Image/???/Large String text, so while getting CLOB's
>> from the real database we still can check the content and if it smells
>> more
>> like a String then we return the VARCHAR otherwise an Object. May be this
>> is not the neatest solution but I guess for a client this provides the
>> flexibility needed.
>>
>> An additional point worth mentioning with respect to the above remark,
>> playing with different databases we see some databases allow large text to
>> be stored in VARCHAR kind of fields whereas some are still having limits
>> of
>> 4000 characters like Oracle. So when using MM as the abstraction layer
>> above the databases this feature might be quite handy for a client.
>>
>> Excuse me if I sound a bit too biased here.
>>
>> Regards
>> Ankit
>>
>>
>> On Tue, Mar 25, 2014 at 3:54 PM, Kasper Sørensen <
>> i.am.kasper.sorensen@gmail.com> wrote:
>>
>> > Hi all,
>> >
>> > I just came across a potential issue in MetaModel's design and want to
>> > share it and maybe start thinking of ways to fix or work around it...
>> >
>> > In our ColumnType enum we have the method getJavaEquivalentClass()
>> which is
>> > used to tell which java type to expect when querying a particular
>> column.
>> > For instance, of I query a VARCHAR column, I can expect a
>> java.lang.String
>> > value.
>> >
>> > Now the trouble is that in our JDBC module we have a system property
>> which
>> > allows for eager loading of BLOBs and CLOBs so that they are
>> automatically
>> > read into byte-arrays and Strings respectively. This is a great utility
>> > because otherwise the user has to do a lot of tedious work with
>> > inputstreams etc which in most cases isn't particularly useful - in most
>> > cases you just want the byte[ ] or String.
>> >
>> > Now the trouble is that if you turn that system property on, you get
>> > Strings or byte-arrays but the column type is still CLOB/BLOB and that
>> > means that the "expected equivalent Java class" is still java.sql.Clob
>> or
>> > java.sql.Blob! If you build your code to expect that, then it will
>> > eventually break because you get a String or a byte-array instead.
>> >
>> > How to make this consistent? I can think of a few ways, but none that I
>> > really love:
>> >
>> > 1) Probably the easiest way to do it is to let the JDBC datacontext give
>> > the columns other ColumnTypes. But that sorta doesn't feel good now that
>> > the "real" datatype is CLOB, that we will then call it e.g. VARCHAR.
>> >
>> > 2) We can remove the getEquivalentJavaClass() from ColumnType and
>> instead
>> > make it a direct member of the Column class. This will break backwards
>> > compatibility of the API.
>> >
>> > 3) We can make ColumnType an interface or class instead of a enum. Then
>> the
>> > behaviour can simply be plugged in by the JDBC DataContext. I do like
>> this
>> > approach the best for many reasons, but it has the downside that it also
>> > breaks backwards compatibility of the API and that there will no longer
>> be
>> > an enumerable and finite list of ColumnTypes. Maybe we could alleviate
>> that
>> > problem by ALSO having an enum with the typical implementations or
>> > something like that.
>> >
>> > Maybe there are other solutions that I didn't think of.
>> >
>> > Regards,
>> > Kasper
>> >
>>
>
>

Re: [DISCUSS] Concern regarding ColumnType.getJavaEquivalentClass()

Posted by Kasper Sørensen <i....@gmail.com>.
Hi Ankit,

I think it is fair to assume that a CLOB will always be convertible to a
String - if memory on the client permits at least. Normally a java.sql.Clob
provides a getCharacterStream() method which returns a Reader. By setting
the system property 'metamodel.jdbc.convert.lobs' to "true" we offer the
service of eagerly reading that Reader to the end and returning a String.
The good thing about that is that the client avoids issues of the DataSet
is closed or even sometimes just moving the cursor to the next record may
invalidate the Reader. In other words: With the LOB conversion we offer a
way to make the MetaModel Row objects useable even though the connection to
the statement/resultset/connection is closed. It's of course optional.

The same idea is applied to BLOBs. We offer to automatically open the
java.sql.Blob InputStream and load it into a byte-array.

In my opinion the fact that something is a CLOB or a VARCHAR is a different
type of information than whether it's values are represented as
java.sql.Clob or String objects. The information on the column type is
valuable in it's own right and I would not want us to "make believe" that a
column which is *really *a CLOB is shown as a VARCHAR.

Now that we're on the topic, we might even also consider if using the JDBC
data type names is even appropriate as a default behaviour. I mean, we're
acting in the Java world and a type like VARCHAR or TINYINT etc. is only
used in SQL, not in native Java apps. Other types of datastores have
different data types and I guess many of the datastores only have simple
data types like String, Integer, Date etc. We could choose to go for a
non-enum solution for the ColumnType and thereby also provide more
flexibility. In the CSV module we could thus provide just a single
ColumnType, e.g.:

  public static final ColumnType CSV_COLUMN_TYPE = new
ColumnTypeImpl("String", String.class);

Read the example constructor arguments as this: Name would be "String" (or
maybe "Text" is more appropriate for "text files"?) and expected Java class
is String.class

In the JDBC module we could have other column types available, e.g.:

  public static final ColumnType VARCHAR = new ColumnTypeImpl("VARCHAR",
String.class);
  public static final ColumnType CLOB = new ColumnTypeImpl("CLOB",
java.sql.Clob.class);
  public static final ColumnType CLOB_CONVERTED = new
ColumnTypeImpl("CLOB", String.class);
  public static final ColumnType BLOB = new ColumnTypeImpl("BLOB",
java.sql.Blob.class);
  public static final ColumnType BLOB_CONVERTED = new
ColumnTypeImpl("BLOB", byte[].class);

(and there you then have my suggested solution for solving the Clob issue -
to use a String for the name and a Class for the expected Java type)

I can try to make a patch for this solution. That would sort of validate if
it will work or not. But first ... what do you guys think?

Only pitfall I see is that it breaks backwards compatibility. I think we
can retain most of the compile-time compatibility, but not runtime
compatibility (as in "drop in a new jar without recompile"). Also, old
objects serialized with the old enums would not be properly deserialized
into using these constants. Maybe that can be solved in
the LegacyDeserializationObjectInputStream [1] somehow.

Best regards,
Kasper


[1] Described here:
http://wiki.apache.org/metamodel/MigratingFromEobjectsMetaModel


2014-04-01 10:28 GMT+02:00 Ankit Kumar <ak...@gmail.com>:

> Hi Kasper,
>
> Thanks for initiating this discussion as we are hit by this in MM right
> now.
>
> Thinking just a bit from a client of MM perspective, I guess the CLOB could
> contain a Object/File/Image/???/Large String text, so while getting CLOB's
> from the real database we still can check the content and if it smells more
> like a String then we return the VARCHAR otherwise an Object. May be this
> is not the neatest solution but I guess for a client this provides the
> flexibility needed.
>
> An additional point worth mentioning with respect to the above remark,
> playing with different databases we see some databases allow large text to
> be stored in VARCHAR kind of fields whereas some are still having limits of
> 4000 characters like Oracle. So when using MM as the abstraction layer
> above the databases this feature might be quite handy for a client.
>
> Excuse me if I sound a bit too biased here.
>
> Regards
> Ankit
>
>
> On Tue, Mar 25, 2014 at 3:54 PM, Kasper Sørensen <
> i.am.kasper.sorensen@gmail.com> wrote:
>
> > Hi all,
> >
> > I just came across a potential issue in MetaModel's design and want to
> > share it and maybe start thinking of ways to fix or work around it...
> >
> > In our ColumnType enum we have the method getJavaEquivalentClass() which
> is
> > used to tell which java type to expect when querying a particular column.
> > For instance, of I query a VARCHAR column, I can expect a
> java.lang.String
> > value.
> >
> > Now the trouble is that in our JDBC module we have a system property
> which
> > allows for eager loading of BLOBs and CLOBs so that they are
> automatically
> > read into byte-arrays and Strings respectively. This is a great utility
> > because otherwise the user has to do a lot of tedious work with
> > inputstreams etc which in most cases isn't particularly useful - in most
> > cases you just want the byte[ ] or String.
> >
> > Now the trouble is that if you turn that system property on, you get
> > Strings or byte-arrays but the column type is still CLOB/BLOB and that
> > means that the "expected equivalent Java class" is still java.sql.Clob or
> > java.sql.Blob! If you build your code to expect that, then it will
> > eventually break because you get a String or a byte-array instead.
> >
> > How to make this consistent? I can think of a few ways, but none that I
> > really love:
> >
> > 1) Probably the easiest way to do it is to let the JDBC datacontext give
> > the columns other ColumnTypes. But that sorta doesn't feel good now that
> > the "real" datatype is CLOB, that we will then call it e.g. VARCHAR.
> >
> > 2) We can remove the getEquivalentJavaClass() from ColumnType and instead
> > make it a direct member of the Column class. This will break backwards
> > compatibility of the API.
> >
> > 3) We can make ColumnType an interface or class instead of a enum. Then
> the
> > behaviour can simply be plugged in by the JDBC DataContext. I do like
> this
> > approach the best for many reasons, but it has the downside that it also
> > breaks backwards compatibility of the API and that there will no longer
> be
> > an enumerable and finite list of ColumnTypes. Maybe we could alleviate
> that
> > problem by ALSO having an enum with the typical implementations or
> > something like that.
> >
> > Maybe there are other solutions that I didn't think of.
> >
> > Regards,
> > Kasper
> >
>

Re: [DISCUSS] Concern regarding ColumnType.getJavaEquivalentClass()

Posted by Ankit Kumar <ak...@gmail.com>.
Hi Kasper,

Thanks for initiating this discussion as we are hit by this in MM right now.

Thinking just a bit from a client of MM perspective, I guess the CLOB could
contain a Object/File/Image/???/Large String text, so while getting CLOB's
from the real database we still can check the content and if it smells more
like a String then we return the VARCHAR otherwise an Object. May be this
is not the neatest solution but I guess for a client this provides the
flexibility needed.

An additional point worth mentioning with respect to the above remark,
playing with different databases we see some databases allow large text to
be stored in VARCHAR kind of fields whereas some are still having limits of
4000 characters like Oracle. So when using MM as the abstraction layer
above the databases this feature might be quite handy for a client.

Excuse me if I sound a bit too biased here.

Regards
Ankit


On Tue, Mar 25, 2014 at 3:54 PM, Kasper Sørensen <
i.am.kasper.sorensen@gmail.com> wrote:

> Hi all,
>
> I just came across a potential issue in MetaModel's design and want to
> share it and maybe start thinking of ways to fix or work around it...
>
> In our ColumnType enum we have the method getJavaEquivalentClass() which is
> used to tell which java type to expect when querying a particular column.
> For instance, of I query a VARCHAR column, I can expect a java.lang.String
> value.
>
> Now the trouble is that in our JDBC module we have a system property which
> allows for eager loading of BLOBs and CLOBs so that they are automatically
> read into byte-arrays and Strings respectively. This is a great utility
> because otherwise the user has to do a lot of tedious work with
> inputstreams etc which in most cases isn't particularly useful - in most
> cases you just want the byte[ ] or String.
>
> Now the trouble is that if you turn that system property on, you get
> Strings or byte-arrays but the column type is still CLOB/BLOB and that
> means that the "expected equivalent Java class" is still java.sql.Clob or
> java.sql.Blob! If you build your code to expect that, then it will
> eventually break because you get a String or a byte-array instead.
>
> How to make this consistent? I can think of a few ways, but none that I
> really love:
>
> 1) Probably the easiest way to do it is to let the JDBC datacontext give
> the columns other ColumnTypes. But that sorta doesn't feel good now that
> the "real" datatype is CLOB, that we will then call it e.g. VARCHAR.
>
> 2) We can remove the getEquivalentJavaClass() from ColumnType and instead
> make it a direct member of the Column class. This will break backwards
> compatibility of the API.
>
> 3) We can make ColumnType an interface or class instead of a enum. Then the
> behaviour can simply be plugged in by the JDBC DataContext. I do like this
> approach the best for many reasons, but it has the downside that it also
> breaks backwards compatibility of the API and that there will no longer be
> an enumerable and finite list of ColumnTypes. Maybe we could alleviate that
> problem by ALSO having an enum with the typical implementations or
> something like that.
>
> Maybe there are other solutions that I didn't think of.
>
> Regards,
> Kasper
>

Re: [DISCUSS] Concern regarding ColumnType.getJavaEquivalentClass()

Posted by Kasper Sørensen <i....@gmail.com>.
OK, I'll arrange a vote around suggestion #3, seeing that this is so far
the thing that most people seem to like.


2014-05-01 1:32 GMT+02:00 Henry Saputra <he...@gmail.com>:

> My take is since we are in incubator let us take advantage of making
> breaking API changes. Clients already have to change code to cope with
> the new package naming based on Apache.
>
> I like approach #3, looks cleaner and support pluggability to the
> context for different implementation of ColumnType interface.
>
> - Henry
>
> On Tue, Mar 25, 2014 at 7:54 AM, Kasper Sørensen
> <i....@gmail.com> wrote:
> > Hi all,
> >
> > I just came across a potential issue in MetaModel's design and want to
> > share it and maybe start thinking of ways to fix or work around it...
> >
> > In our ColumnType enum we have the method getJavaEquivalentClass() which
> is
> > used to tell which java type to expect when querying a particular column.
> > For instance, of I query a VARCHAR column, I can expect a
> java.lang.String
> > value.
> >
> > Now the trouble is that in our JDBC module we have a system property
> which
> > allows for eager loading of BLOBs and CLOBs so that they are
> automatically
> > read into byte-arrays and Strings respectively. This is a great utility
> > because otherwise the user has to do a lot of tedious work with
> > inputstreams etc which in most cases isn't particularly useful - in most
> > cases you just want the byte[ ] or String.
> >
> > Now the trouble is that if you turn that system property on, you get
> > Strings or byte-arrays but the column type is still CLOB/BLOB and that
> > means that the "expected equivalent Java class" is still java.sql.Clob or
> > java.sql.Blob! If you build your code to expect that, then it will
> > eventually break because you get a String or a byte-array instead.
> >
> > How to make this consistent? I can think of a few ways, but none that I
> > really love:
> >
> > 1) Probably the easiest way to do it is to let the JDBC datacontext give
> > the columns other ColumnTypes. But that sorta doesn't feel good now that
> > the "real" datatype is CLOB, that we will then call it e.g. VARCHAR.
> >
> > 2) We can remove the getEquivalentJavaClass() from ColumnType and instead
> > make it a direct member of the Column class. This will break backwards
> > compatibility of the API.
> >
> > 3) We can make ColumnType an interface or class instead of a enum. Then
> the
> > behaviour can simply be plugged in by the JDBC DataContext. I do like
> this
> > approach the best for many reasons, but it has the downside that it also
> > breaks backwards compatibility of the API and that there will no longer
> be
> > an enumerable and finite list of ColumnTypes. Maybe we could alleviate
> that
> > problem by ALSO having an enum with the typical implementations or
> > something like that.
> >
> > Maybe there are other solutions that I didn't think of.
> >
> > Regards,
> > Kasper
>

Re: [DISCUSS] Concern regarding ColumnType.getJavaEquivalentClass()

Posted by Henry Saputra <he...@gmail.com>.
My take is since we are in incubator let us take advantage of making
breaking API changes. Clients already have to change code to cope with
the new package naming based on Apache.

I like approach #3, looks cleaner and support pluggability to the
context for different implementation of ColumnType interface.

- Henry

On Tue, Mar 25, 2014 at 7:54 AM, Kasper Sørensen
<i....@gmail.com> wrote:
> Hi all,
>
> I just came across a potential issue in MetaModel's design and want to
> share it and maybe start thinking of ways to fix or work around it...
>
> In our ColumnType enum we have the method getJavaEquivalentClass() which is
> used to tell which java type to expect when querying a particular column.
> For instance, of I query a VARCHAR column, I can expect a java.lang.String
> value.
>
> Now the trouble is that in our JDBC module we have a system property which
> allows for eager loading of BLOBs and CLOBs so that they are automatically
> read into byte-arrays and Strings respectively. This is a great utility
> because otherwise the user has to do a lot of tedious work with
> inputstreams etc which in most cases isn't particularly useful - in most
> cases you just want the byte[ ] or String.
>
> Now the trouble is that if you turn that system property on, you get
> Strings or byte-arrays but the column type is still CLOB/BLOB and that
> means that the "expected equivalent Java class" is still java.sql.Clob or
> java.sql.Blob! If you build your code to expect that, then it will
> eventually break because you get a String or a byte-array instead.
>
> How to make this consistent? I can think of a few ways, but none that I
> really love:
>
> 1) Probably the easiest way to do it is to let the JDBC datacontext give
> the columns other ColumnTypes. But that sorta doesn't feel good now that
> the "real" datatype is CLOB, that we will then call it e.g. VARCHAR.
>
> 2) We can remove the getEquivalentJavaClass() from ColumnType and instead
> make it a direct member of the Column class. This will break backwards
> compatibility of the API.
>
> 3) We can make ColumnType an interface or class instead of a enum. Then the
> behaviour can simply be plugged in by the JDBC DataContext. I do like this
> approach the best for many reasons, but it has the downside that it also
> breaks backwards compatibility of the API and that there will no longer be
> an enumerable and finite list of ColumnTypes. Maybe we could alleviate that
> problem by ALSO having an enum with the typical implementations or
> something like that.
>
> Maybe there are other solutions that I didn't think of.
>
> Regards,
> Kasper