Posted to solr-user@lucene.apache.org by mathieu lacage <ma...@alcmeon.com> on 2012/01/27 19:39:53 UTC

DataImportHandler fails silently

hi,

I have set up my Solr installation to run with Jetty and I am trying to
import an SQLite database into the Solr index. I have configured a JDBC
SQLite driver as the data source:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.sqlite.JDBC"
url="jdbc:sqlite:/home/mathieu/data/final.db"/>
  <document name="document">
    <entity name="item" query="select id,thread_title from content">
      <field column="ID" name="id" />
      <field column="THREAD_TITLE" name="thread_title" />
    </entity>
  </document>
</dataConfig>

The schema:
 <fields>
   <field name="id" type="int" indexed="true" stored="true" required="true"/>
   <field name="thread_title" type="text" indexed="true" stored="true"/>
</fields>
 <uniqueKey>id</uniqueKey>
 <defaultSearchField>thread_title</defaultSearchField>


I kickstart the import process with
"wget http://localhost:8080/solr/dataimport?command=full-import"

It seems to work but the following command reports that only 499 documents
were indexed (yes, there are many more documents in my database):

"wget http://localhost:8080/solr/dataimport?command=status"

and the logs seem to imply that the import is finished:

INFO: Read dataimport.properties
27-Jan-2012 19:37:17 org.apache.solr.handler.dataimport.SolrWriter persist
INFO: Wrote last indexed time to dataimport.properties
27-Jan-2012 19:37:17 org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:1.52

I am at a loss. What can I do to debug this further? Help of any kind
would be most welcome.
Mathieu
-- 
Mathieu Lacage <ma...@alcmeon.com>

Re: DataImportHandler fails silently

Posted by Erik Hatcher <er...@gmail.com>.
On Jan 28, 2012, at 09:02 , mathieu lacage wrote:
> This deserves an entry in
> http://wiki.apache.org/solr/DataImportHandlerFaq which I would have
> updated but it is immutable. *hint to those who have
> edit powers there*

You can make yourself a wiki account and then edit the page.  An account is required to edit pages.

Re: DataImportHandler fails silently

Posted by mathieu lacage <ma...@alcmeon.com>.
On Sat, Jan 28, 2012 at 10:35 AM, mathieu lacage <mathieu.lacage@alcmeon.com> wrote:

>
> (I have tried two different sqlite jdbc drivers, so I doubt the problem
> is there, but who knows.)
>

I eventually screamed really loud when I read the source code of the sqlite
jdbc drivers: they interpret the JdbcDataSource attribute batchSize as a
hard limit on the number of rows to return. The default batchSize is 500.

QED.

I am not very familiar with the details of the expected semantics of jdbc
drivers, so I do not know whether this is a bug in the drivers or in
JdbcDataSource.

What I know, though, is that if you want to use sqlite with JdbcDataSource,
you had better set batchSize="0", like this:
  <dataSource type="JdbcDataSource" driver="org.sqlite.JDBC"
url="jdbc:sqlite:/home/mathieu/data/final.db" batchSize="0"/>
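
The difference between a batch size used as a hint and a batch size used as a hard limit can be sketched with Python's built-in sqlite3 module. This is only an analogy under stated assumptions, not the code of JdbcDataSource or of any sqlite jdbc driver: the table is an in-memory stand-in named after the `content` table above, and the function names are made up for illustration. In the JDBC API itself, Statement.setFetchSize is documented as a hint, while Statement.setMaxRows is the call meant to cap a result set.

```python
import sqlite3


def build_demo_db(row_count):
    """Create an in-memory stand-in for the thread's `content` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content (id INTEGER PRIMARY KEY, thread_title TEXT)"
    )
    conn.executemany(
        "INSERT INTO content (id, thread_title) VALUES (?, ?)",
        ((i, "title %d" % i) for i in range(1, row_count + 1)),
    )
    conn.commit()
    return conn


def read_with_batch_as_hint(conn, batch_size):
    """Treat batch_size as a per-fetch hint: keep fetching until exhausted."""
    cur = conn.execute("SELECT id, thread_title FROM content")
    rows = []
    while True:
        chunk = cur.fetchmany(batch_size)
        if not chunk:
            break
        rows.extend(chunk)
    return rows


def read_with_batch_as_cap(conn, batch_size):
    """Treat batch_size as a hard limit: stop after the first batch."""
    cur = conn.execute("SELECT id, thread_title FROM content")
    return cur.fetchmany(batch_size)


if __name__ == "__main__":
    conn = build_demo_db(1200)
    print(len(read_with_batch_as_hint(conn, 500)))  # all rows
    print(len(read_with_batch_as_cap(conn, 500)))   # only the first batch
```

With 1200 rows and a batch size of 500, the first reader returns every row and the second silently stops after one batch, which is the kind of truncation without an error described above.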

This deserves an entry in
http://wiki.apache.org/solr/DataImportHandlerFaq which I would have
updated but it is immutable. *hint to those who have
edit powers there*

This is a sucky weekend.

Mathieu
-- 
Mathieu Lacage <ma...@alcmeon.com>

Re: DataImportHandler fails silently

Posted by mathieu lacage <ma...@alcmeon.com>.
On 1/28/12, mathieu lacage <ma...@alcmeon.com> wrote:
>
> On 28 Jan 2012, at 05:17, Lance Norskog <go...@gmail.com> wrote:
>
>> Do all of the documents have unique id fields?
>
> yes.

I have debugged this further with
http://localhost:8080/solr/admin/dataimport.jsp?handler=/dataimport

The returned xml file when I ask for verbose information tells me that
it stopped importing at document #501:
<lst name="document#500"><str>----------- row #1-------------</str>
<int name="id">4992</int>
<str name="thread_title">o/c cg kelle quelle ° ne pas
depasée?</str><str>---------------------------------------------</str>
</lst>
<lst name="document#501"/></lst>

I changed the start row to check whether the row with id 4993 was the
problem, but that row is imported correctly when I specify another start
row. That is, the import stops with no error whatsoever at an empty
document #501, regardless of the underlying db id column. I can't look at
the sql query that is sent to sqlite because sqlite is not a daemon, so I
wonder what I could look into to debug this.
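
One check that does not depend on Solr or on the jdbc driver is to count the rows directly in the SQLite file and compare the result with the number of documents the import reports. The following is a minimal sketch using Python's built-in sqlite3 module. The table name `content` comes from the query in the dataConfig above; the function name `count_rows` is hypothetical.

```python
import sqlite3


def count_rows(db_path, table="content"):
    """Return the number of rows in `table`, read straight from the SQLite file."""
    # Table names cannot be bound as SQL parameters, so only plain
    # identifiers are accepted before formatting the statement.
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM %s" % table).fetchone()
        return count
    finally:
        conn.close()


if __name__ == "__main__":
    import sys

    # Usage: python count_rows.py /path/to/database.db
    print(count_rows(sys.argv[1]))
```

If this count is much larger than the number of rows Solr says it fetched, the rows are being lost between the database and the indexer rather than being rejected by the schema.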

(I have tried two different sqlite jdbc drivers, so I doubt the problem
is there, but who knows.)

Mathieu
-- 
Mathieu Lacage <ma...@alcmeon.com>

Re: DataImportHandler fails silently

Posted by mathieu lacage <ma...@alcmeon.com>.
On 28 Jan 2012, at 05:17, Lance Norskog <go...@gmail.com> wrote:

> Do all of the documents have unique id fields?

yes.


> 
> On Fri, Jan 27, 2012 at 10:44 AM, mathieu lacage
> <ma...@alcmeon.com> wrote:
>> On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage
>> <ma...@alcmeon.com> wrote:
>> 
>>> 
>>> It seems to work but the following command reports that only 499 documents
>>> were indexed (yes, there are many more documents in my database):
>>> 
>> 
>> And before anyone asks:
>> <lst name="statusMessages">
>> <str name="Total Requests made to DataSource">1</str>
>> <str name="Total Rows Fetched">499</str>
>> <str name="Total Documents Skipped">0</str>
>> <str name="Full Dump Started">2012-01-27 19:37:16</str>
>> <str name="">Indexing completed. Added/Updated: 499 documents. Deleted 0
>> documents.</str>
>> <str name="Committed">2012-01-27 19:37:17</str>
>> <str name="Optimized">2012-01-27 19:37:17</str>
>> <str name="Total Documents Processed">499</str>
>> <str name="Time taken ">0:0:1.52</str>
>> </lst>
>> 
>> 
>> --
>> Mathieu Lacage <ma...@alcmeon.com>
> 
> 
> 
> -- 
> Lance Norskog
> goksron@gmail.com

Re: DataImportHandler fails silently

Posted by Lance Norskog <go...@gmail.com>.
Do all of the documents have unique id fields?

On Fri, Jan 27, 2012 at 10:44 AM, mathieu lacage
<ma...@alcmeon.com> wrote:
> On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage
> <ma...@alcmeon.com> wrote:
>
>>
>> It seems to work but the following command reports that only 499 documents
>> were indexed (yes, there are many more documents in my database):
>>
>
> And before anyone asks:
> <lst name="statusMessages">
> <str name="Total Requests made to DataSource">1</str>
> <str name="Total Rows Fetched">499</str>
> <str name="Total Documents Skipped">0</str>
> <str name="Full Dump Started">2012-01-27 19:37:16</str>
> <str name="">Indexing completed. Added/Updated: 499 documents. Deleted 0
> documents.</str>
> <str name="Committed">2012-01-27 19:37:17</str>
> <str name="Optimized">2012-01-27 19:37:17</str>
> <str name="Total Documents Processed">499</str>
> <str name="Time taken ">0:0:1.52</str>
> </lst>
>
>
> --
> Mathieu Lacage <ma...@alcmeon.com>



-- 
Lance Norskog
goksron@gmail.com

Re: DataImportHandler fails silently

Posted by mathieu lacage <ma...@alcmeon.com>.
On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage
<ma...@alcmeon.com> wrote:

>
> It seems to work but the following command reports that only 499 documents
> were indexed (yes, there are many more documents in my database):
>

And before anyone asks:
<lst name="statusMessages">
<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">499</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2012-01-27 19:37:16</str>
<str name="">Indexing completed. Added/Updated: 499 documents. Deleted 0
documents.</str>
<str name="Committed">2012-01-27 19:37:17</str>
<str name="Optimized">2012-01-27 19:37:17</str>
<str name="Total Documents Processed">499</str>
<str name="Time taken ">0:0:1.52</str>
</lst>
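
The status block above can also be read programmatically, which makes it easy to compare "Total Rows Fetched" against a row count obtained some other way. This is a minimal sketch using Python's standard xml.etree module. It assumes the response contains a `<lst name="statusMessages">` element shaped like the one shown here; the function names and the `expected_rows` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_status_messages(xml_text):
    """Return the <str> entries of the statusMessages list as a dict."""
    root = ET.fromstring(xml_text)
    # Accept either a full response or just the statusMessages fragment.
    if root.tag == "lst" and root.get("name") == "statusMessages":
        node = root
    else:
        node = root.find(".//lst[@name='statusMessages']")
    if node is None:
        return {}
    return {
        child.get("name", ""): (child.text or "").strip()
        for child in node.findall("str")
    }


def import_looks_truncated(messages, expected_rows):
    """True when fewer rows were fetched than the database is known to hold."""
    fetched = int(messages.get("Total Rows Fetched", "0"))
    return fetched < expected_rows


if __name__ == "__main__":
    sample = (
        '<lst name="statusMessages">'
        '<str name="Total Requests made to DataSource">1</str>'
        '<str name="Total Rows Fetched">499</str>'
        '<str name="Total Documents Processed">499</str>'
        "</lst>"
    )
    messages = parse_status_messages(sample)
    print(messages["Total Rows Fetched"])
    print(import_looks_truncated(messages, expected_rows=5000))
```

Keys are kept exactly as they appear in the response, so an entry such as "Time taken " keeps its trailing space.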


-- 
Mathieu Lacage <ma...@alcmeon.com>