Posted to solr-user@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2015/05/20 03:21:53 UTC

Problem with numeric "math" types and the dataimport handler

An unusual problem is happening with the DIH on a field that is an
unsigned BIGINT in the MySQL database.  This is Solr 4.9.1 without
SolrCloud, running on OpenJDK 7u79.

During actual import, everything is fine.  The problem comes when I
restart Solr and the transaction logs are replayed.  I get the following
exception for every document replayed:

WARN  - 2015-05-19 18:52:44.461;
org.apache.solr.update.UpdateLog$LogReplayer; REYPLAY_ERR: IOException
reading log
org.apache.solr.common.SolrException: ERROR: [doc=getty26025060] Error
adding field 'file_size'='java.math.BigInteger:5934053' msg=For input
string: "java.math.BigInteger:5934053"

I believe I need one of two things to solve this problem:

1) A connection parameter for the MySQL JDBC driver that will force the
use of java.lang.* objects and exclude the java.math.* classes.

2) Write the actual imported value into the transaction log rather than
include the class name in the string representation.  Testing shows that
the toString() method on BigInteger does *NOT* include the class name,
so I am confused about why the class name is being recorded in the
transaction log.
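
The symptom can be reproduced outside Solr. The following is only a sketch, not Solr code: the class and method names are hypothetical, and it assumes the target field is parsed as a long. It shows that BigInteger.toString() yields plain digits, that a string of the form "class name, colon, toString()" matches what appears in the log, and that parsing such a string as a long fails with the same "For input string" message.

```java
import java.math.BigInteger;

// Hypothetical demo class; not part of Solr or the DIH.
class BigIntegerReplayDemo {

    // ASSUMPTION: the logged value is built as className + ":" + toString().
    // This only mirrors the shape seen in the error message; the thread does
    // not confirm where in Solr the string is produced.
    static String loggedForm(Object value) {
        return value.getClass().getName() + ":" + value.toString();
    }

    // Returns the NumberFormatException message, or null if the value parses.
    static String parseError(String value) {
        try {
            Long.parseLong(value);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        BigInteger fileSize = new BigInteger("5934053");

        // toString() alone contains no class name and parses cleanly.
        System.out.println(fileSize.toString());
        System.out.println(parseError(fileSize.toString()));

        // The prefixed form cannot be parsed as a number.
        String logged = loggedForm(fileSize);
        System.out.println(logged);
        System.out.println(parseError(logged));
    }
}
```

If the prefixed string is what ends up in the transaction log, the import itself can succeed (the original object is still a number at that point) while the replay fails, which would match the behavior described above.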

For the first solution, I've been looking for a MySQL connection
parameter to change the Java object types that get used, but so far I
haven't found one.  For the second, I should probably open an issue in
Jira, but I wanted to run it by everyone before taking that step.

I have another index (building from a different database) where this
isn't happening, because the MySQL column is *NOT* unsigned, which
causes the JDBC driver to use java.lang.Long instead of
java.math.BigInteger.
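
A possible workaround on the import side, sketched under assumptions: convert BigInteger values to Long before the document is handed to Solr, for example from a custom DIH transformer. The class below is hypothetical and deliberately uses no DIH API, so the wiring into a transformer is left out. Values that do not fit in a signed long fall back to their plain decimal string, which is an illustrative choice rather than anything from this thread.

```java
import java.math.BigInteger;

// Hypothetical helper; not part of Solr, the DIH, or the MySQL JDBC driver.
class NumericNormalizer {

    // Converts a BigInteger to a Long when it fits in a signed 64-bit value.
    // Any other object is returned unchanged.
    static Object normalize(Object value) {
        if (value instanceof BigInteger) {
            BigInteger big = (BigInteger) value;
            // bitLength() excludes the sign bit, so 63 bits or fewer fits in a long.
            if (big.bitLength() <= 63) {
                return Long.valueOf(big.longValue());
            }
            // Too large for a long: keep the digits only, without a class name.
            return big.toString();
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(normalize(new BigInteger("5934053")).getClass().getName());
        System.out.println(normalize(new BigInteger("18446744073709551615")));
    }
}
```

The sketch sticks to Java 7 methods because this installation runs on OpenJDK 7u79.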

Thanks,
Shawn


Re: Problem with numeric "math" types and the dataimport handler

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/20/2015 12:06 AM, Shalin Shekhar Mangar wrote:
> Sounds similar to https://issues.apache.org/jira/browse/SOLR-6165 which I
> fixed in 4.10. Can you try a newer release?

I can't upgrade yet.  I am using a plugin that hasn't been verified
against anything newer than 4.9.  When a new version becomes available,
I will begin testing 5.x.

The patch does look like it will fix the issue perfectly ... so I am
very likely to patch 4.9.1 and build a custom war.

Thanks,
Shawn


Re: Problem with numeric "math" types and the dataimport handler

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/26/2015 2:37 PM, Shawn Heisey wrote:
> On 5/20/2015 12:06 AM, Shalin Shekhar Mangar wrote:
>> Sounds similar to https://issues.apache.org/jira/browse/SOLR-6165 which I
>> fixed in 4.10. Can you try a newer release?
> Looks like that didn't fix it.
>
> I applied the patch on SOLR-6165 to the lucene_solr_4_9_1 tag, built a
> new war, and when it was done, restarted Solr with that war.  The
> solr-impl version in the dashboard is now
>
>     4.9-SNAPSHOT 1680667 - solr - 2015-05-20 14:23:11
>
> After some importing with DIH and a Solr restart, this is the most
> recent error in the log:
>
> WARN  - 2015-05-26 14:28:09.289;
> org.apache.solr.update.UpdateLog$LogReplayer; REYPLAY_ERR: IOException
> reading log org.apache.solr.common.SolrException: ERROR:
> [doc=usatphotos084190] Error adding field
> 'did'='java.math.BigInteger:1214221' msg=For input string:
> "java.math.BigInteger:1214221"
>
> Looks like we'll need a new issue.  I'm not in a position right now to
> try a newer Solr version than 4.9.1.

Given the way that I use Solr, this is honestly not really a major
problem for me.  Within five minutes or so after DIH is done, my
transaction logs will only contain data indexed via SolrJ, so this
problem will be gone.

The reason I think it's worth fixing, assuming it's still a problem in
5.2: There are people who use DIH *exclusively* for indexing, and for
those people, this could become a real problem, because tlog replay
won't work.

Thanks,
Shawn


Re: Problem with numeric "math" types and the dataimport handler

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/20/2015 12:06 AM, Shalin Shekhar Mangar wrote:
> Sounds similar to https://issues.apache.org/jira/browse/SOLR-6165 which I
> fixed in 4.10. Can you try a newer release?

Looks like that didn't fix it.

I applied the patch on SOLR-6165 to the lucene_solr_4_9_1 tag, built a
new war, and when it was done, restarted Solr with that war.  The
solr-impl version in the dashboard is now


    4.9-SNAPSHOT 1680667 - solr - 2015-05-20 14:23:11

After some importing with DIH and a Solr restart, this is the most
recent error in the log:

WARN  - 2015-05-26 14:28:09.289;
org.apache.solr.update.UpdateLog$LogReplayer; REYPLAY_ERR: IOException
reading log org.apache.solr.common.SolrException: ERROR:
[doc=usatphotos084190] Error adding field
'did'='java.math.BigInteger:1214221' msg=For input string:
"java.math.BigInteger:1214221"

Looks like we'll need a new issue.  I'm not in a position right now to
try a newer Solr version than 4.9.1.

Thanks,
Shawn


Re: Problem with numeric "math" types and the dataimport handler

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Sounds similar to https://issues.apache.org/jira/browse/SOLR-6165 which I
fixed in 4.10. Can you try a newer release?


-- 
Regards,
Shalin Shekhar Mangar.