Posted to solr-user@lucene.apache.org by Tom Evans <te...@googlemail.com> on 2014/10/03 13:41:07 UTC

Determining which field caused a document to not be imported

Hi all

I recently rewrote our SOLR 4.8 dataimport to read from a set of
denormalised DB tables, in an attempt to increase full indexing speed.
When I tried it out, however, indexing broke, telling me that
"java.lang.Long cannot be cast to java.lang.Integer" (full stack
below, with the document elided). From googling, this tends to be some
field that is being selected out as a long, where it should probably
be cast as a string.

Unfortunately, our documents have some 400+ fields and over 100
entities; is there another way to determine which field could not be
cast from Long to Integer other than disabling each integer field in
turn?

Cheers

Tom


Exception while processing: variant document :
SolrInputDocument(fields: [(removed)]):
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.ClassCastException: java.lang.Long cannot be cast to
java.lang.Integer
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:63)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:246)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:477)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:331)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:239)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:464)
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast
to java.lang.Integer
at java.lang.Integer.compareTo(Integer.java:52)
at java.util.TreeMap.getEntry(TreeMap.java:346)
at java.util.TreeMap.get(TreeMap.java:273)
at org.apache.solr.handler.dataimport.SortedMapBackedCache.iterator(SortedMapBackedCache.java:147)
at org.apache.solr.handler.dataimport.DIHCacheSupport.getIdCacheData(DIHCacheSupport.java:179)
at org.apache.solr.handler.dataimport.DIHCacheSupport.getCacheData(DIHCacheSupport.java:145)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:129)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
... 10 more

Re: Determining which field caused a document to not be imported

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/3/2014 8:24 AM, Tom Evans wrote:
> On Fri, Oct 3, 2014 at 3:13 PM, Tom Evans <te...@googlemail.com> wrote:
>> I tried converting the selected data to SIGNED INTEGER, eg
>> "CONVERT(country_id, SIGNED INTEGER) AS country_id", but this did not
>> have the desired effect.
> 
> However, changing them to be cast to CHAR changed the error message -
> "java.lang.Integer cannot be cast to java.lang.String".
> 
> I guess this is saying that the type of the map key must match the
> type of the key coming from the parent entity (which is logical), so I
> guess my question is - what SQL type do I need to select out to get
> a java.lang.Integer, to match what the map is expecting?

I still need to digest the stacktrace.  What database software are you
connecting to, what version of their JDBC driver do you use, and what
are your typical column types in the DB?

I'm not very familiar with DIH code, and when I've looked in the past,
I've found it very hard to follow ... but later tonight I will check the
code locations mentioned in your stacktrace to see whether it's possible
to log which field is producing the message.  Hopefully we can get you
something.  Ideally it will log all available information, which means
hopefully it can see definitions in the DIH config file like entity,
dataSource, table, etc.

Thanks,
Shawn


Re: Determining which field caused a document to not be imported

Posted by Tom Evans <te...@googlemail.com>.
On Fri, Oct 3, 2014 at 3:24 PM, Tom Evans <te...@googlemail.com> wrote:
> On Fri, Oct 3, 2014 at 3:13 PM, Tom Evans <te...@googlemail.com> wrote:
>> I tried converting the selected data to SIGNED INTEGER, eg
>> "CONVERT(country_id, SIGNED INTEGER) AS country_id", but this did not
>> have the desired effect.
>
> However, changing them to be cast to CHAR changed the error message -
> "java.lang.Integer cannot be cast to java.lang.String".
>
> I guess this is saying that the type of the map key must match the
> type of the key coming from the parent entity (which is logical), so I
> guess my question is - what SQL type do I need to select out to get
> a java.lang.Integer, to match what the map is expecting?
>

I rewrote the query for the map, which was doing strange casts itself
(integer to integer casts). This then meant that the values from the
parent query were the same type as those in the map query, and no
funky casts are required anywhere.

However, I still don't have a way to determine which field is failing
when indexing fails like this, and it would be neat if I could
determine a way to do so for future debugging.
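
One way to narrow this down without disabling fields one at a time might be
to look at the Java class of each value in a row, since the exception comes
from comparing a lookup value with a cache key of a different class. The
sketch below is plain Java over a Map<String, Object>, which is the shape in
which DIH passes rows to transformers. RowTypeReport, columnTypes and
sameKeyType are hypothetical names, the sample rows are invented, and wiring
this into an import (for example from a custom transformer) is assumed
rather than shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Solr: reports the runtime class of each
// column value so that lookup columns and cache key columns can be compared.
class RowTypeReport {

    // Maps each column name to the class name of its value ("null" for a null value).
    static Map<String, String> columnTypes(Map<String, Object> row) {
        Map<String, String> types = new TreeMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            Object value = e.getValue();
            types.put(e.getKey(), value == null ? "null" : value.getClass().getName());
        }
        return types;
    }

    // True only when both values are non-null and of exactly the same class,
    // which is what a sorted-map lookup with natural ordering needs.
    static boolean sameKeyType(Object lookupValue, Object cacheKeyValue) {
        return lookupValue != null
                && cacheKeyValue != null
                && lookupValue.getClass() == cacheKeyValue.getClass();
    }

    public static void main(String[] args) {
        // Invented sample rows standing in for a parent row and a cached lookup row.
        Map<String, Object> parentRow = new LinkedHashMap<>();
        parentRow.put("country_id", 44L);
        Map<String, Object> lookupRow = new LinkedHashMap<>();
        lookupRow.put("xid", 44);
        lookupRow.put("country", "United Kingdom");

        System.out.println(columnTypes(parentRow));
        System.out.println(columnTypes(lookupRow));
        System.out.println(sameKeyType(parentRow.get("country_id"), lookupRow.get("xid")));
    }
}
```

Run against one row from each entity, a report like this would show which
cacheLookup/cacheKey pair disagrees on type, instead of guessing among
several analogous sub-entities.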

Cheers

Tom

Re: Determining which field caused a document to not be imported

Posted by Tom Evans <te...@googlemail.com>.
On Fri, Oct 3, 2014 at 3:13 PM, Tom Evans <te...@googlemail.com> wrote:
> I tried converting the selected data to SIGNED INTEGER, eg
> "CONVERT(country_id, SIGNED INTEGER) AS country_id", but this did not
> have the desired effect.

However, changing them to be cast to CHAR changed the error message -
"java.lang.Integer cannot be cast to java.lang.String".

I guess this is saying that the type of the map key must match the
type of the key coming from the parent entity (which is logical), so I
guess my question is - what SQL type do I need to select out to get
a java.lang.Integer, to match what the map is expecting?
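
The stack trace goes through TreeMap.get inside SortedMapBackedCache, so the
behaviour can be reproduced without Solr. The sketch below assumes only a
plain java.util.TreeMap with natural ordering. The class and method names
(KeyTypeMismatchDemo, lookupFails) and the sample values are made up for
illustration. A sorted map calls compareTo on the lookup key and passes it
the stored key, so the lookup fails as soon as the two are different
classes, even when they hold the same number.

```java
import java.util.TreeMap;

// Minimal reproduction, independent of Solr: a sorted map holding one key,
// probed with a key that may be of a different class.
class KeyTypeMismatchDemo {

    // Returns true if looking up `probe` in a TreeMap that contains
    // `storedKey` throws ClassCastException.
    static boolean lookupFails(Object storedKey, Object probe) {
        TreeMap<Object, String> cache = new TreeMap<>();
        cache.put(storedKey, "United Kingdom");
        try {
            cache.get(probe);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupFails(44L, 44));   // true: Long key, Integer probe
        System.out.println(lookupFails(44, "44"));  // true: Integer key, String probe
        System.out.println(lookupFails(44, 44));    // false: both Integer
    }
}
```

Read this way, and at least for the OpenJDK TreeMap that appears in the
trace, the class named after "cannot be cast to" is the type of the lookup
value (cacheLookup), and the other class is the type of the key held in the
cache (cacheKey).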

Cheers

Tom

Re: Determining which field caused a document to not be imported

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/3/2014 8:13 AM, Tom Evans wrote:
> Caused by: java.lang.ClassCastException: java.lang.Integer cannot be
> cast to java.lang.Long
> at java.lang.Long.compareTo(Long.java:50)
> at java.util.TreeMap.getEntry(TreeMap.java:346)
> at java.util.TreeMap.get(TreeMap.java:273)
> at org.apache.solr.handler.dataimport.SortedMapBackedCache.iterator(SortedMapBackedCache.java:147)
> at org.apache.solr.handler.dataimport.DIHCacheSupport.getIdCacheData(DIHCacheSupport.java:179)
> at org.apache.solr.handler.dataimport.DIHCacheSupport.getCacheData(DIHCacheSupport.java:145)
> at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:129)
> at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
> at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
> ... 10 more

Is it possible to temporarily remove the caching from the entity?  I
know that this will make performance suck, but I'm suggesting it only as
a troubleshooting step.  I'm wondering if maybe it's a problem in the
caching implementation and not the main DIH jdbc code.

Thanks,
Shawn


Re: Determining which field caused a document to not be imported

Posted by Tom Evans <te...@googlemail.com>.
On Fri, Oct 3, 2014 at 2:24 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> Can you give us the entire stacktrace, with complete details from any
> "caused by" sections?  Also, is this 4.8.0 or 4.8.1?
>

Thanks Shawn, this is SOLR 4.8.1 and here is the full traceback from the log:

95191 [Thread-21] INFO
org.apache.solr.update.processor.LogUpdateProcessor  – [products]
webapp=/products path=/dataimport-from-denorm
params={id=2148732&optimize=false&clean=false&indent=true&commit=true&verbose=false&command=full-import&debug=false&wt=json}
status=0 QTime=32 {} 0 32
95199 [Thread-21] ERROR
org.apache.solr.handler.dataimport.DataImporter  – Full Import
failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.ClassCastException: java.lang.Integer cannot be cast to
java.lang.Long
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:278)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:464)
Caused by: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.ClassCastException: java.lang.Integer cannot be cast to
java.lang.Long
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:418)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:331)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:239)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.ClassCastException: java.lang.Integer cannot be cast to
java.lang.Long
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:63)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:246)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:477)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:503)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
... 5 more
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be
cast to java.lang.Long
at java.lang.Long.compareTo(Long.java:50)
at java.util.TreeMap.getEntry(TreeMap.java:346)
at java.util.TreeMap.get(TreeMap.java:273)
at org.apache.solr.handler.dataimport.SortedMapBackedCache.iterator(SortedMapBackedCache.java:147)
at org.apache.solr.handler.dataimport.DIHCacheSupport.getIdCacheData(DIHCacheSupport.java:179)
at org.apache.solr.handler.dataimport.DIHCacheSupport.getCacheData(DIHCacheSupport.java:145)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:129)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
... 10 more

95199 [Thread-21] INFO  org.apache.solr.update.UpdateHandler  – start rollback{}

I've tracked it down to a single entity now that selects some content
out of the database and then looks up other fields using that data
from sub-entities that have SortedMapBackedCache caching in use, but
I'm still not sure how to fix it.

Eg, the original entity selects out "country_id", which is then used
by this entity:

    <entity dataSource="products" name="country_lookup" query="
      SELECT
        lk_country.id AS xid,
        IF(LENGTH(english), CAST(english AS CHAR), description) AS country
      FROM lk_country
      INNER JOIN nl_strings ON lk_country.description_sid=nl_strings.id"
      cacheKey="xid"
      cacheLookup="product.country_id"
      cacheImpl="SortedMapBackedCache">
      <field column="country" name="country"/>
    </entity>

I tried converting the selected data to SIGNED INTEGER, eg
"CONVERT(country_id, SIGNED INTEGER) AS country_id", but this did not
have the desired effect.

The source database is mysql, the source column for "country_id" is
"`country_id` smallint(6) NOT NULL default '0'".

Again, I'm not 100% sure that it is even the "country" field that
causes this, there are several SortedMapBackedCache sub-entities (but
they are all analogous to this one).

Thanks in advance

Tom

Re: Determining which field caused a document to not be imported

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/3/2014 5:41 AM, Tom Evans wrote:
> I recently rewrote our SOLR 4.8 dataimport to read from a set of
> denormalised DB tables, in an attempt to increase full indexing speed.
> When I tried it out, however, indexing broke, telling me that
> "java.lang.Long cannot be cast to java.lang.Integer" (full stack
> below, with the document elided). From googling, this tends to be some
> field that is being selected out as a long, where it should probably
> be cast as a string.
> 
> Unfortunately, our documents have some 400+ fields and over 100
> entities; is there another way to determine which field could not be
> cast from Long to Integer other than disabling each integer field in
> turn?

Can you give us the entire stacktrace, with complete details from any
"caused by" sections?  Also, is this 4.8.0 or 4.8.1?

Thanks,
Shawn