You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Stefano Bortoli <s....@gmail.com> on 2016/03/21 14:16:34 UTC

Oracle 11g number serialization: classcast problem

Hi squirrels,

I working on a flink job connecting to a Oracle DB. I started from the JDBC
example for Derby, and used the TupleTypeInfo to configure the fields of
the tuple as it is read.

The record of the example has 2 INT, 1 FLOAT and 2 VARCHAR. Apparently,
using Oracle, all the numbers are read as Double, causing a ClassCast
exception. Of course I can fix it by changing the TupleTypeInfo, but I
wonder whether there is some assumption for Oracle and Numbers.

Thanks a lot for your support!

saluti,
Stefano

Re: Oracle 11g number serialization: classcast problem

Posted by Stefano Bortoli <s....@gmail.com>.
Thanks for the clarification.

case java.sql.Types.DECIMAL:
                    reuse.setField(resultSet.getBigDecimal(pos +
1).doubleValue(), pos);
                    break;

this causes both a nullpointer on null values as well as a double class
cast exception when serializing the tuple.

For the moment, because we have mostly a 'reading problem', we modified the
inputformat to get strings and we output them as CSV.

case java.sql.Types.NUMERIC:
                    if(resultSet.getBigDecimal(pos + 1)==null)
reuse.setField("", pos);
                    else reuse.setField(resultSet.getBigDecimal(pos +
1).toPlainString(), pos);
                    break;

Another problem is that the reading is sequential and does not allow for
splits. When we get a working version that is satisfying, we'll share the
contribution. Our idea is to enable the execution of Sqoop scripts using
Flink. We are testing it on a Oracle table of 11 billion records, but we
did not get through a complete run. We are just at first prototype level,
so there is surely some work to do. :-)

saluti,
Stefano



2016-03-23 10:38 GMT+01:00 Chesnay Schepler <ch...@apache.org>:

> On 23.03.2016 10:04, Stefano Bortoli wrote:
>
> I had a look at the JDBC input format, and it does indeed interpret
> BIGDECIMAL and NUMERIC values as double.
>
> This sounds more like a bug actually. Feel free to open a JIRA for this.
>
> The status of the JDBCInputFormat is not adequate for real world use case,
> as for example does not deal with NULL values.
>
> This was already reported in FLINK-3471. To clarify, for NULL fields the
> format fails only if the type is either DECIMAL, NUMERIC, DATE, TIME,
> TIMESTAMP, or SQLXML. Other types should default to 0, empty string or
> false; which actually isn't intended behavior, but caused by JDBC itself.
>
> Defaulting to some value seems the only way to deal with this issue, since
> we can't store null in a Tuple.
>
> I wasn't sure what value DATE, TIME, TIMESTAMP and SQLXML should default
> to, as such i didn't change them yet. I also just dislike the fact that a
> straight copy from A to B will not produce the same table.
>
> However, with little effort we fixed few stuff and now we are getting to
> something usable. We are actually trying to do something a-la sqoop,
> therefore given a boundary query, we create the splits, and then assign it
> to the input format to read the database with configurable parallelism. We
> are still working on it. If we get to something stable and working, we'll
> gladly share it.
>
> saluti,
> Stefano
>
> 2016-03-22 15:46 GMT+01:00 Chesnay Schepler <ch...@apache.org>:
>
>> The JDBC formats don't make any assumption as to what DB backend is used.
>>
>> A JDBC float in general is returned as a double, since that was the
>> recommended mapping i found when i wrote the formats.
>>
>> Is the INT returned as a double as well?
>>
>> Note: The (runtime) output type is in no way connected to the TypeInfo
>> you pass when constructing the format.
>>
>>
>> On 21.03.2016 14:16, Stefano Bortoli wrote:
>>
>>> Hi squirrels,
>>>
>>> I working on a flink job connecting to a Oracle DB. I started from the
>>> JDBC example for Derby, and used the TupleTypeInfo to configure the fields
>>> of the tuple as it is read.
>>>
>>> The record of the example has 2 INT, 1 FLOAT and 2 VARCHAR. Apparently,
>>> using Oracle, all the numbers are read as Double, causing a ClassCast
>>> exception. Of course I can fix it by changing the TupleTypeInfo, but I
>>> wonder whether there is some assumption for Oracle and Numbers.
>>>
>>> Thanks a lot for your support!
>>>
>>> saluti,
>>> Stefano
>>>
>>
>>
>
>

Re: Oracle 11g number serialization: classcast problem

Posted by Chesnay Schepler <ch...@apache.org>.
On 23.03.2016 10:38, Chesnay Schepler wrote:
> On 23.03.2016 10:04, Stefano Bortoli wrote:
>> I had a look at the JDBC input format, and it does indeed interpret 
>> BIGDECIMAL and NUMERIC values as double.
> This sounds more like a bug actually. Feel free to open a JIRA for this.
Actually, this was done so that every field is a primitive or String.
>> The status of the JDBCInputFormat is not adequate for real world use 
>> case, as for example does not deal with NULL values.
> This was already reported in FLINK-3471. To clarify, for NULL fields 
> the format fails only if the type is either DECIMAL, NUMERIC, DATE, 
> TIME, TIMESTAMP, or SQLXML. Other types should default to 0, empty 
> string or false; which actually isn't intended behavior, but caused by 
> JDBC itself.
>
> Defaulting to some value seems the only way to deal with this issue, 
> since we can't store null in a Tuple.
>
> I wasn't sure what value DATE, TIME, TIMESTAMP and SQLXML should 
> default to, as such i didn't change them yet. I also just dislike the 
> fact that a straight copy from A to B will not produce the same table.
>> However, with little effort we fixed few stuff and now we are getting 
>> to something usable. We are actually trying to do something a-la 
>> sqoop, therefore given a boundary query, we create the splits, and 
>> then assign it to the input format to read the database with 
>> configurable parallelism. We are still working on it. If we get to 
>> something stable and working, we'll gladly share it.
>>
>> saluti,
>> Stefano
>>
>> 2016-03-22 15:46 GMT+01:00 Chesnay Schepler <chesnay@apache.org 
>> <ma...@apache.org>>:
>>
>>     The JDBC formats don't make any assumption as to what DB backend
>>     is used.
>>
>>     A JDBC float in general is returned as a double, since that was
>>     the recommended mapping i found when i wrote the formats.
>>
>>     Is the INT returned as a double as well?
>>
>>     Note: The (runtime) output type is in no way connected to the
>>     TypeInfo you pass when constructing the format.
>>
>>
>>     On 21.03.2016 14 <tel:21.03.2016%2014>:16, Stefano Bortoli wrote:
>>
>>         Hi squirrels,
>>
>>         I working on a flink job connecting to a Oracle DB. I started
>>         from the JDBC example for Derby, and used the TupleTypeInfo
>>         to configure the fields of the tuple as it is read.
>>
>>         The record of the example has 2 INT, 1 FLOAT and 2 VARCHAR.
>>         Apparently, using Oracle, all the numbers are read as Double,
>>         causing a ClassCast exception. Of course I can fix it by
>>         changing the TupleTypeInfo, but I wonder whether there is
>>         some assumption for Oracle and Numbers.
>>
>>         Thanks a lot for your support!
>>
>>         saluti,
>>         Stefano
>>
>>
>>
>


Re: Oracle 11g number serialization: classcast problem

Posted by Chesnay Schepler <ch...@apache.org>.
On 23.03.2016 10:04, Stefano Bortoli wrote:
> I had a look at the JDBC input format, and it does indeed interpret 
> BIGDECIMAL and NUMERIC values as double.
This sounds more like a bug actually. Feel free to open a JIRA for this.
> The status of the JDBCInputFormat is not adequate for real world use 
> case, as for example does not deal with NULL values.
This was already reported in FLINK-3471. To clarify, for NULL fields the 
format fails only if the type is either DECIMAL, NUMERIC, DATE, TIME, 
TIMESTAMP, or SQLXML. Other types should default to 0, empty string or 
false; which actually isn't intended behavior, but caused by JDBC itself.

Defaulting to some value seems the only way to deal with this issue, 
since we can't store null in a Tuple.

I wasn't sure what value DATE, TIME, TIMESTAMP and SQLXML should default 
to, as such i didn't change them yet. I also just dislike the fact that 
a straight copy from A to B will not produce the same table.
> However, with little effort we fixed few stuff and now we are getting 
> to something usable. We are actually trying to do something a-la 
> sqoop, therefore given a boundary query, we create the splits, and 
> then assign it to the input format to read the database with 
> configurable parallelism. We are still working on it. If we get to 
> something stable and working, we'll gladly share it.
>
> saluti,
> Stefano
>
> 2016-03-22 15:46 GMT+01:00 Chesnay Schepler <chesnay@apache.org 
> <ma...@apache.org>>:
>
>     The JDBC formats don't make any assumption as to what DB backend
>     is used.
>
>     A JDBC float in general is returned as a double, since that was
>     the recommended mapping i found when i wrote the formats.
>
>     Is the INT returned as a double as well?
>
>     Note: The (runtime) output type is in no way connected to the
>     TypeInfo you pass when constructing the format.
>
>
>     On 21.03.2016 14 <tel:21.03.2016%2014>:16, Stefano Bortoli wrote:
>
>         Hi squirrels,
>
>         I working on a flink job connecting to a Oracle DB. I started
>         from the JDBC example for Derby, and used the TupleTypeInfo to
>         configure the fields of the tuple as it is read.
>
>         The record of the example has 2 INT, 1 FLOAT and 2 VARCHAR.
>         Apparently, using Oracle, all the numbers are read as Double,
>         causing a ClassCast exception. Of course I can fix it by
>         changing the TupleTypeInfo, but I wonder whether there is some
>         assumption for Oracle and Numbers.
>
>         Thanks a lot for your support!
>
>         saluti,
>         Stefano
>
>
>


Re: Oracle 11g number serialization: classcast problem

Posted by Stefano Bortoli <s....@gmail.com>.
I had a look at the JDBC input format, and it does indeed interpret
BIGDECIMAL and NUMERIC values as double. The status of the JDBCInputFormat
is not adequate for real world use case, as for example does not deal with
NULL values.

However, with little effort we fixed few stuff and now we are getting to
something usable. We are actually trying to do something a-la sqoop,
therefore given a boundary query, we create the splits, and then assign it
to the input format to read the database with configurable parallelism. We
are still working on it. If we get to something stable and working, we'll
gladly share it.

saluti,
Stefano

2016-03-22 15:46 GMT+01:00 Chesnay Schepler <ch...@apache.org>:

> The JDBC formats don't make any assumption as to what DB backend is used.
>
> A JDBC float in general is returned as a double, since that was the
> recommended mapping i found when i wrote the formats.
>
> Is the INT returned as a double as well?
>
> Note: The (runtime) output type is in no way connected to the TypeInfo you
> pass when constructing the format.
>
>
> On 21.03.2016 14:16, Stefano Bortoli wrote:
>
>> Hi squirrels,
>>
>> I working on a flink job connecting to a Oracle DB. I started from the
>> JDBC example for Derby, and used the TupleTypeInfo to configure the fields
>> of the tuple as it is read.
>>
>> The record of the example has 2 INT, 1 FLOAT and 2 VARCHAR. Apparently,
>> using Oracle, all the numbers are read as Double, causing a ClassCast
>> exception. Of course I can fix it by changing the TupleTypeInfo, but I
>> wonder whether there is some assumption for Oracle and Numbers.
>>
>> Thanks a lot for your support!
>>
>> saluti,
>> Stefano
>>
>
>

Re: Oracle 11g number serialization: classcast problem

Posted by Chesnay Schepler <ch...@apache.org>.
The JDBC formats don't make any assumption as to what DB backend is used.

A JDBC float in general is returned as a double, since that was the 
recommended mapping i found when i wrote the formats.

Is the INT returned as a double as well?

Note: The (runtime) output type is in no way connected to the TypeInfo 
you pass when constructing the format.

On 21.03.2016 14:16, Stefano Bortoli wrote:
> Hi squirrels,
>
> I working on a flink job connecting to a Oracle DB. I started from the 
> JDBC example for Derby, and used the TupleTypeInfo to configure the 
> fields of the tuple as it is read.
>
> The record of the example has 2 INT, 1 FLOAT and 2 VARCHAR. 
> Apparently, using Oracle, all the numbers are read as Double, causing 
> a ClassCast exception. Of course I can fix it by changing the 
> TupleTypeInfo, but I wonder whether there is some assumption for 
> Oracle and Numbers.
>
> Thanks a lot for your support!
>
> saluti,
> Stefano