Posted to solr-user@lucene.apache.org by mathieu lacage <ma...@alcmeon.com> on 2012/01/27 19:39:53 UTC
DataImportHandler fails silently
hi,
I have set up my Solr installation to run with Jetty and I am trying to
import an SQLite database into the Solr index. I have set up a JDBC SQLite
driver with the following data config:
<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.sqlite.JDBC"
              url="jdbc:sqlite:/home/mathieu/data/final.db"/>
  <document name="document">
    <entity name="item" query="select id,thread_title from content">
      <field column="ID" name="id" />
      <field column="THREAD_TITLE" name="thread_title" />
    </entity>
  </document>
</dataConfig>
The schema:
<fields>
  <field name="id" type="int" indexed="true" stored="true" required="true"/>
  <field name="thread_title" type="text" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>thread_title</defaultSearchField>
I kickstart the import process with
"wget http://localhost:8080/solr/dataimport?command=full-import"
It seems to work but the following command reports that only 499 documents
were indexed (yes, there are many more documents in my database):
"wget http://localhost:8080/solr/dataimport?command=status"
and the logs seem to imply that the import is finished:
INFO: Read dataimport.properties
27-Jan-2012 19:37:17 org.apache.solr.handler.dataimport.SolrWriter persist
INFO: Wrote last indexed time to dataimport.properties
27-Jan-2012 19:37:17 org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:1.52
I am at a loss. What can I do to debug this further? Help of any kind
would be most welcome.
Mathieu
--
Mathieu Lacage <ma...@alcmeon.com>
Re: DataImportHandler fails silently
Posted by Erik Hatcher <er...@gmail.com>.
On Jan 28, 2012, at 09:02 , mathieu lacage wrote:
> This deserves an entry in
> http://wiki.apache.org/solr/DataImportHandlerFaq which I would have
> updated but it is immutable. *hint to those who have
> edit powers there*
You can make yourself a wiki account and then edit the page. An account is required to edit pages.
Re: DataImportHandler fails silently
Posted by mathieu lacage <ma...@alcmeon.com>.
On Sat, Jan 28, 2012 at 10:35 AM, mathieu lacage <mathieu.lacage@alcmeon.com
> wrote:
>
> (I have tried two different sqlite jdbc drivers so, I doubt it could
> be a problem there, but, who knows).
>
I eventually screamed really loud when I read the source code of the sqlite
jdbc drivers: they interpret the JdbcDataSource attribute batchSize as a
hard limit on the number of results to return. The default is 500.
QED.
I am not very familiar with the details of the expected semantics of JDBC
drivers, so I do not know whether this is a bug in the drivers or in
JdbcDataSource.
What I do know is that if you want to use sqlite with JdbcDataSource,
you had better set batchSize=0, like this:
<dataSource type="JdbcDataSource" driver="org.sqlite.JDBC"
url="jdbc:sqlite:/home/mathieu/data/final.db" batchSize="0"/>
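To make the distinction concrete, here is a minimal sketch in Python (not the driver's Java code) of a batch size used as a fetch hint versus one used as a hard cap on the result set. In JDBC terms this is roughly the difference between Statement.setFetchSize and Statement.setMaxRows. The helper names and the in-memory demo table are hypothetical; only the table and column names come from the config above.

```python
import sqlite3


def make_demo_db(n_rows):
    """Build a throwaway in-memory table shaped like the one in this thread."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table content (id integer primary key, thread_title text)"
    )
    conn.executemany(
        "insert into content (id, thread_title) values (?, ?)",
        [(i, "title %d" % i) for i in range(1, n_rows + 1)],
    )
    return conn


def fetch_in_batches(conn, query, batch_size):
    """Batch size as a hint: keep fetching batches until the rows run out."""
    cur = conn.cursor()
    cur.execute(query)
    total = 0
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        total += len(rows)
    return total


def fetch_with_hard_limit(conn, query, max_rows):
    """Batch size misread as a cap: stop after the first batch."""
    cur = conn.cursor()
    cur.execute(query)
    return len(cur.fetchmany(max_rows))
```

With a table larger than the batch size, the first function sees every row while the second stops at the batch size, which matches the truncated import described here.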
This deserves an entry in
http://wiki.apache.org/solr/DataImportHandlerFaq which I would have
updated but it is immutable. *hint to those who have
edit powers there*
This is a sucky weekend.
Mathieu
--
Mathieu Lacage <ma...@alcmeon.com>
Re: DataImportHandler fails silently
Posted by mathieu lacage <ma...@alcmeon.com>.
On 1/28/12, mathieu lacage <ma...@alcmeon.com> wrote:
>
> On 28 Jan 2012 at 05:17, Lance Norskog <go...@gmail.com> wrote:
>
>> Do all of the documents have unique id fields?
>
> yes.
I have debugged this further with
http://localhost:8080/solr/admin/dataimport.jsp?handler=/dataimport
The returned xml file when I ask for verbose information tells me that
it stopped importing at document #501:
<lst name="document#500"><str>----------- row #1-------------</str>
<int name="id">4992</int>
<str name="thread_title">o/c cg kelle quelle ° ne pas
depasée?</str><str>---------------------------------------------</str>
</lst>
<lst name="document#501"/></lst>
I changed the start row to check whether id 4993 was the problem, but it
is imported correctly when I specify another start row. In other words,
everything stops with no error whatsoever at an empty document #501,
regardless of the underlying db id column. I can't look at the SQL query
that is sent to sqlite because sqlite is not a daemon, so I wonder what
else I could look into to debug this.
(I have tried two different sqlite jdbc drivers so, I doubt it could
be a problem there, but, who knows).
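One check that does not need a database daemon is to open the sqlite file directly and count the rows, then compare that number with what the import reports. A minimal sketch using Python's standard sqlite3 module follows; the function name is hypothetical, and the path and table name in the usage note are the ones from the data config in this thread.

```python
import sqlite3


def count_rows(db_path, table):
    """Return the number of rows in `table` of the sqlite file at `db_path`."""
    conn = sqlite3.connect(db_path)
    try:
        # A table name cannot be passed as a bound parameter, so quote it.
        quoted = '"' + table.replace('"', '""') + '"'
        return conn.execute("select count(*) from " + quoted).fetchone()[0]
    finally:
        conn.close()
```

For example, count_rows("/home/mathieu/data/final.db", "content") gives the number of rows the import should have fetched. If that number is much larger than the count in the status output, the rows are being lost between the driver and DataImportHandler rather than in the database.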
Mathieu
--
Mathieu Lacage <ma...@alcmeon.com>
Re: DataImportHandler fails silently
Posted by mathieu lacage <ma...@alcmeon.com>.
On 28 Jan 2012 at 05:17, Lance Norskog <go...@gmail.com> wrote:
> Do all of the documents have unique id fields?
yes.
Re: DataImportHandler fails silently
Posted by Lance Norskog <go...@gmail.com>.
Do all of the documents have unique id fields?
--
Lance Norskog
goksron@gmail.com
Re: DataImportHandler fails silently
Posted by mathieu lacage <ma...@alcmeon.com>.
On Fri, Jan 27, 2012 at 7:39 PM, mathieu lacage
<ma...@alcmeon.com> wrote:
>
> It seems to work but the following command reports that only 499 documents
> were indexed (yes, there are many more documents in my database):
>
And before anyone asks:
<lst name="statusMessages">
<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">499</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2012-01-27 19:37:16</str>
<str name="">Indexing completed. Added/Updated: 499 documents. Deleted 0
documents.</str>
<str name="Committed">2012-01-27 19:37:17</str>
<str name="Optimized">2012-01-27 19:37:17</str>
<str name="Total Documents Processed">499</str>
<str name="Time taken ">0:0:1.52</str>
</lst>
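The numeric fields in a status fragment like the one above can be pulled out with a few lines of Python, which makes it easy to compare them against the real row count of the database. This is a minimal sketch that assumes the input is exactly the `<lst name="statusMessages">` element shown here; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def status_counts(status_xml):
    """Map each <str name="..."> whose text is a plain integer to its value."""
    root = ET.fromstring(status_xml)
    counts = {}
    for node in root.iter("str"):
        text = (node.text or "").strip()
        if text.isdigit():
            counts[node.get("name", "")] = int(text)
    return counts
```

Timestamps and the "Time taken" entry are skipped because they are not plain integers, so the result only holds counters such as "Total Rows Fetched" and "Total Documents Processed".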
--
Mathieu Lacage <ma...@alcmeon.com>