You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@manifoldcf.apache.org by Shigeki Kobayashi <sh...@g.softbank.co.jp> on 2012/08/20 10:21:24 UTC

Crawling MySQL with latest MySQL connector fails

Hi guys.


I am not sure if everyone has already noticed this, but this is to share an
experimental fact of using MySQL connectors to crawl MySQL data.

Using "AS" in Select queries in SeedQuery and DataQuery causes error
depending on versions of MySQL connectors.

Env:
- ManifoldCF0.5
- Solr3.6
- MySQL5.5

Example:

 SeedQuery:SELECT idfield AS $(IDCOLUMN) FROM documenttable

Error Message:
  Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes
around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".

Cause of Error:
 MySQL connecors of over version 5.1 seem to have a bug that causes error
when you use "AS" in Select to put an alias for a column.

Versions of MySQL Connector:
 mysql-connector-java-5.0.8.jar  -> OK
 mysql-connector-java-5.1.18.jar -> No Good
 mysql-connector-java-5.1.21.jar -> No Good

Exception:
Using function  (e.g. sysdate() as) or fixed strings  (e.g. " fixed string"
as) followed by "as" does not cause error.

Regards,

Shigeki

Re: Crawling MySQL with latest MySQL connector fails

Posted by Karl Wright <da...@gmail.com>.
I checked in a fix for this ticket - please let me know if it allows
you to successfully crawl MySQL database tables.

Karl

On Mon, Aug 20, 2012 at 8:10 AM, Karl Wright <da...@gmail.com> wrote:
> There's some online chatter about this.  Apparently the JDBC 4.0
> specification was clarified in this regard, and MySQL's implementation
> follows the clarified meaning.  The recommended approach during this
> transition period is to allow the user to select which method they
> want to use to get the column name.  See CONNECTORS-509.
>
> Karl
>
> On Mon, Aug 20, 2012 at 8:00 AM, Karl Wright <da...@gmail.com> wrote:
>> Here's some additional info.
>>
>> The JDBC class "ResultSetMetaData" has two methods: getColumnName(),
>> and getColumnLabel().  For all supported databases, getColumnName()
>> returns the right thing, EXCEPT for MySQL, where you have to use
>> getColumnLabel() instead.
>>
>> I've abstracted the logic that does this in the main database classes
>> that underlie the framework, but for the JDBC Connector I tried to
>> make the connector be independent of any special logic, and instead
>> make it the responsibility of the query writer to know how to work
>> with their database.  Unfortunately, that strategy is failing when it
>> comes to MySQL because the JDBC driver is implemented in a way that is
>> inconsistent with the specification.
>>
>> So, we have two ways forward:
>>
>> (1) Change the logic to use getColumnLabel() always.  If we do this,
>> it will be necessary to test the JDBC connector against a PostgreSQL,
>> MySQL, and MSSQL database before we know what the effects are.  It is
>> possible everything will just work, but it is also possible that such
>> a change would break other people's jobs, and that would be no good.
>>
>> (2) Try to conditionalize the logic so that only for MySQL is
>> getColumnLabel() used.  This is less risky but results in messy code
>> that sooner or later would become unmaintainable.
>>
>> Karl
>>
>>
>> On Mon, Aug 20, 2012 at 6:22 AM, Karl Wright <da...@gmail.com> wrote:
>>> Hi Shigeki,
>>>
>>> This is critical functionality for ManifoldCF.  Quite a lot of
>>> ManifoldCF stuff won't work on MySQL if this is broken - not just
>>> crawling using the JDBC connector.  Are you successfully crawling with
>>> MySQL as the back-end?  If you are, that means that there is a way to
>>> do this right but the JDBC connector is not using it.
>>>
>>> I am testing with MySQL JDBC connector 5.1.18 here, which would
>>> indicate that that is the case.
>>>
>>> Could you open a ticket describing the problem, and I will look into
>>> this in some detail tonight?  Thanks,
>>> Karl
>>>
>>>
>>> On Mon, Aug 20, 2012 at 4:21 AM, Shigeki Kobayashi
>>> <sh...@g.softbank.co.jp> wrote:
>>>> Hi guys.
>>>>
>>>>
>>>> I am not sure if everyone has already noticed this, but this is to share an
>>>> experimental fact of using MySQL connectors to crawl MySQL data.
>>>>
>>>> Using "AS" in Select queries in SeedQuery and DataQuery causes error
>>>> depending on versions of MySQL connectors.
>>>>
>>>> Env:
>>>> - ManifoldCF0.5
>>>> - Solr3.6
>>>> - MySQL5.5
>>>>
>>>> Example:
>>>>
>>>>  SeedQuery:SELECT idfield AS $(IDCOLUMN) FROM documenttable
>>>>
>>>> Error Message:
>>>>   Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes around
>>>> $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".
>>>>
>>>> Cause of Error:
>>>>  MySQL connecors of over version 5.1 seem to have a bug that causes error
>>>> when you use "AS" in Select to put an alias for a column.
>>>>
>>>> Versions of MySQL Connector:
>>>>  mysql-connector-java-5.0.8.jar  -> OK
>>>>  mysql-connector-java-5.1.18.jar -> No Good
>>>>  mysql-connector-java-5.1.21.jar -> No Good
>>>>
>>>> Exception:
>>>> Using function  (e.g. sysdate() as) or fixed strings  (e.g. " fixed string"
>>>> as) followed by "as" does not cause error.
>>>>
>>>> Regards,
>>>>
>>>> Shigeki

Re: Crawling MySQL with latest MySQL connector fails

Posted by Karl Wright <da...@gmail.com>.
There's some online chatter about this.  Apparently the JDBC 4.0
specification was clarified in this regard, and MySQL's implementation
follows the clarified meaning.  The recommended approach during this
transition period is to allow the user to select which method they
want to use to get the column name.  See CONNECTORS-509.

Karl

On Mon, Aug 20, 2012 at 8:00 AM, Karl Wright <da...@gmail.com> wrote:
> Here's some additional info.
>
> The JDBC class "ResultSetMetaData" has two methods: getColumnName(),
> and getColumnLabel().  For all supported databases, getColumnName()
> returns the right thing, EXCEPT for MySQL, where you have to use
> getColumnLabel() instead.
>
> I've abstracted the logic that does this in the main database classes
> that underlie the framework, but for the JDBC Connector I tried to
> make the connector be independent of any special logic, and instead
> make it the responsibility of the query writer to know how to work
> with their database.  Unfortunately, that strategy is failing when it
> comes to MySQL because the JDBC driver is implemented in a way that is
> inconsistent with the specification.
>
> So, we have two ways forward:
>
> (1) Change the logic to use getColumnLabel() always.  If we do this,
> it will be necessary to test the JDBC connector against a PostgreSQL,
> MySQL, and MSSQL database before we know what the effects are.  It is
> possible everything will just work, but it is also possible that such
> a change would break other people's jobs, and that would be no good.
>
> (2) Try to conditionalize the logic so that only for MySQL is
> getColumnLabel() used.  This is less risky but results in messy code
> that sooner or later would become unmaintainable.
>
> Karl
>
>
> On Mon, Aug 20, 2012 at 6:22 AM, Karl Wright <da...@gmail.com> wrote:
>> Hi Shigeki,
>>
>> This is critical functionality for ManifoldCF.  Quite a lot of
>> ManifoldCF stuff won't work on MySQL if this is broken - not just
>> crawling using the JDBC connector.  Are you successfully crawling with
>> MySQL as the back-end?  If you are, that means that there is a way to
>> do this right but the JDBC connector is not using it.
>>
>> I am testing with MySQL JDBC connector 5.1.18 here, which would
>> indicate that that is the case.
>>
>> Could you open a ticket describing the problem, and I will look into
>> this in some detail tonight?  Thanks,
>> Karl
>>
>>
>> On Mon, Aug 20, 2012 at 4:21 AM, Shigeki Kobayashi
>> <sh...@g.softbank.co.jp> wrote:
>>> Hi guys.
>>>
>>>
>>> I am not sure if everyone has already noticed this, but this is to share an
>>> experimental fact of using MySQL connectors to crawl MySQL data.
>>>
>>> Using "AS" in Select queries in SeedQuery and DataQuery causes error
>>> depending on versions of MySQL connectors.
>>>
>>> Env:
>>> - ManifoldCF0.5
>>> - Solr3.6
>>> - MySQL5.5
>>>
>>> Example:
>>>
>>>  SeedQuery:SELECT idfield AS $(IDCOLUMN) FROM documenttable
>>>
>>> Error Message:
>>>   Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes around
>>> $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".
>>>
>>> Cause of Error:
>>>  MySQL connecors of over version 5.1 seem to have a bug that causes error
>>> when you use "AS" in Select to put an alias for a column.
>>>
>>> Versions of MySQL Connector:
>>>  mysql-connector-java-5.0.8.jar  -> OK
>>>  mysql-connector-java-5.1.18.jar -> No Good
>>>  mysql-connector-java-5.1.21.jar -> No Good
>>>
>>> Exception:
>>> Using function  (e.g. sysdate() as) or fixed strings  (e.g. " fixed string"
>>> as) followed by "as" does not cause error.
>>>
>>> Regards,
>>>
>>> Shigeki

Re: Crawling MySQL with latest MySQL connector fails

Posted by Karl Wright <da...@gmail.com>.
Here's some additional info.

The JDBC class "ResultSetMetaData" has two methods: getColumnName(),
and getColumnLabel().  For all supported databases, getColumnName()
returns the right thing, EXCEPT for MySQL, where you have to use
getColumnLabel() instead.

I've abstracted the logic that does this in the main database classes
that underlie the framework, but for the JDBC Connector I tried to
make the connector be independent of any special logic, and instead
make it the responsibility of the query writer to know how to work
with their database.  Unfortunately, that strategy is failing when it
comes to MySQL because the JDBC driver is implemented in a way that is
inconsistent with the specification.

So, we have two ways forward:

(1) Change the logic to use getColumnLabel() always.  If we do this,
it will be necessary to test the JDBC connector against a PostgreSQL,
MySQL, and MSSQL database before we know what the effects are.  It is
possible everything will just work, but it is also possible that such
a change would break other people's jobs, and that would be no good.

(2) Try to conditionalize the logic so that only for MySQL is
getColumnLabel() used.  This is less risky but results in messy code
that sooner or later would become unmaintainable.

Karl


On Mon, Aug 20, 2012 at 6:22 AM, Karl Wright <da...@gmail.com> wrote:
> Hi Shigeki,
>
> This is critical functionality for ManifoldCF.  Quite a lot of
> ManifoldCF stuff won't work on MySQL if this is broken - not just
> crawling using the JDBC connector.  Are you successfully crawling with
> MySQL as the back-end?  If you are, that means that there is a way to
> do this right but the JDBC connector is not using it.
>
> I am testing with MySQL JDBC connector 5.1.18 here, which would
> indicate that that is the case.
>
> Could you open a ticket describing the problem, and I will look into
> this in some detail tonight?  Thanks,
> Karl
>
>
> On Mon, Aug 20, 2012 at 4:21 AM, Shigeki Kobayashi
> <sh...@g.softbank.co.jp> wrote:
>> Hi guys.
>>
>>
>> I am not sure if everyone has already noticed this, but this is to share an
>> experimental fact of using MySQL connectors to crawl MySQL data.
>>
>> Using "AS" in Select queries in SeedQuery and DataQuery causes error
>> depending on versions of MySQL connectors.
>>
>> Env:
>> - ManifoldCF0.5
>> - Solr3.6
>> - MySQL5.5
>>
>> Example:
>>
>>  SeedQuery:SELECT idfield AS $(IDCOLUMN) FROM documenttable
>>
>> Error Message:
>>   Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes around
>> $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".
>>
>> Cause of Error:
>>  MySQL connecors of over version 5.1 seem to have a bug that causes error
>> when you use "AS" in Select to put an alias for a column.
>>
>> Versions of MySQL Connector:
>>  mysql-connector-java-5.0.8.jar  -> OK
>>  mysql-connector-java-5.1.18.jar -> No Good
>>  mysql-connector-java-5.1.21.jar -> No Good
>>
>> Exception:
>> Using function  (e.g. sysdate() as) or fixed strings  (e.g. " fixed string"
>> as) followed by "as" does not cause error.
>>
>> Regards,
>>
>> Shigeki

Re: Crawling MySQL with latest MySQL connector fails

Posted by Karl Wright <da...@gmail.com>.
Hi Shigeki,

This is critical functionality for ManifoldCF.  Quite a lot of
ManifoldCF stuff won't work on MySQL if this is broken - not just
crawling using the JDBC connector.  Are you successfully crawling with
MySQL as the back-end?  If you are, that means that there is a way to
do this right but the JDBC connector is not using it.

I am testing with MySQL JDBC connector 5.1.18 here, which would
indicate that that is the case.

Could you open a ticket describing the problem, and I will look into
this in some detail tonight?  Thanks,
Karl


On Mon, Aug 20, 2012 at 4:21 AM, Shigeki Kobayashi
<sh...@g.softbank.co.jp> wrote:
> Hi guys.
>
>
> I am not sure if everyone has already noticed this, but this is to share an
> experimental fact of using MySQL connectors to crawl MySQL data.
>
> Using "AS" in Select queries in SeedQuery and DataQuery causes error
> depending on versions of MySQL connectors.
>
> Env:
> - ManifoldCF0.5
> - Solr3.6
> - MySQL5.5
>
> Example:
>
>  SeedQuery:SELECT idfield AS $(IDCOLUMN) FROM documenttable
>
> Error Message:
>   Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes around
> $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".
>
> Cause of Error:
>  MySQL connecors of over version 5.1 seem to have a bug that causes error
> when you use "AS" in Select to put an alias for a column.
>
> Versions of MySQL Connector:
>  mysql-connector-java-5.0.8.jar  -> OK
>  mysql-connector-java-5.1.18.jar -> No Good
>  mysql-connector-java-5.1.21.jar -> No Good
>
> Exception:
> Using function  (e.g. sysdate() as) or fixed strings  (e.g. " fixed string"
> as) followed by "as" does not cause error.
>
> Regards,
>
> Shigeki