Posted to solr-user@lucene.apache.org by Spadez <ja...@hotmail.com> on 2012/11/17 23:49:30 UTC
Solr Delta Import Handler not working
Hi,
These are the exact steps that I have taken to try to get the delta import
handler working. If I can provide any more information to help, let me know.
I have spent all of Friday night and today on this and I am ready to throw
in the towel. Where have I gone wrong?
*Added this line to the solrconfig:*
<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/solr/data-config.xml</str>
  </lst>
</requestHandler>
*Then my data-config.xml looks like this:*
<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity
        name="document"
        processor="FileListEntityProcessor"
        baseDir="/var/lib/data"
        fileName=".*.xml$"
        recursive="false"
        rootEntity="false"
        dataSource="null">
      <entity
          processor="XPathEntityProcessor"
          url="${document.fileAbsolutePath}"
          useSolrAddSchema="true"
          stream="true">
      </entity>
    </entity>
  </document>
</dataConfig>
*Then in my /var/lib/data folder I have a data.xml file that looks like
this:*
<add>
  <doc>
    <field name="id">123</field>
    <field name="description">This is my long description</field>
    <field name="company">Google</field>
    <field name="location_name">England</field>
    <field name="date">2007-12-31 22:29:59</field>
    <field name="source">Google</field>
    <field name="url">www.google.com</field>
    <field name="latlng">45.17614,45.17614</field>
  </doc>
</add>
*Finally I then ran this command:*
http://localhost:8080/solr/dataimport?command=delta-import&clean=false
*And I get this result (failed):*
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">/opt/solr/example/solr/conf/data-config.xml</str>
    </lst>
  </lst>
  <str name="command">delta-import</str>
  <str name="status">idle</str>
  <str name="importResponse"/>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:15:9.543</str>
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">0</str>
    <str name="Total Documents Processed">0</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Delta Dump started">2012-11-17 17:32:56</str>
    <str name="Identifying Delta">2012-11-17 17:32:56</str>
    <str name="">*Indexing failed*. Rolled back all changes.</str>
    <str name="Rolledback">2012-11-17 17:32:56</str>
  </lst>
  <str name="WARNING">
    This response format is experimental. It is likely to change in the future.
  </str>
</response>
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Delta-Import-Handler-not-working-tp4020897.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Delta Import Handler not working
Posted by Lance Norskog <go...@gmail.com>.
| dataSource="null"
I think this should not be here. The datasource should default to the <dataSource> listing. And 'rootEntity=true' should be in the XPathEntityProcessor block, because you are adding each file as one document.
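One way to read this suggestion, as an untested sketch: the outer FileListEntityProcessor entity only lists files, so it keeps rootEntity="false", while the inner XPathEntityProcessor entity is marked as the root so that the rows it reads become the Solr documents. The inner entity name "xml" is a hypothetical label that is not in the original config. Dropping dataSource="null" follows the guess above and is not confirmed anywhere in this thread.

```xml
<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <!-- Outer entity: only enumerates files, so it is not the root entity.
         dataSource="null" is omitted here per the suggestion above (unconfirmed). -->
    <entity
        name="document"
        processor="FileListEntityProcessor"
        baseDir="/var/lib/data"
        fileName="^.*\.xml$"
        recursive="false"
        rootEntity="false">
      <!-- Inner entity: the name "xml" is a hypothetical label.
           rootEntity="true" is stated explicitly on the entity that yields documents. -->
      <entity
          name="xml"
          processor="XPathEntityProcessor"
          url="${document.fileAbsolutePath}"
          useSolrAddSchema="true"
          stream="true"
          rootEntity="true">
      </entity>
    </entity>
  </document>
</dataConfig>
```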
----- Original Message -----
| From: "Spadez" <ja...@hotmail.com>
| To: solr-user@lucene.apache.org
| Sent: Sunday, November 18, 2012 7:34:34 AM
| Subject: Re: Solr Delta Import Handler not working
Re: Solr Delta Import Handler not working
Posted by Spadez <ja...@hotmail.com>.
Update! Thank you to Lance for the help. Based on your suggestion I have
fixed up a few things.
*My data-config.xml now has the filename pattern fixed and rootEntity="true":*
<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity
        name="document"
        processor="FileListEntityProcessor"
        baseDir="/var/lib/employ"
        fileName="^.*\.xml$"
        recursive="false"
        rootEntity="true"
        dataSource="null">
      <entity
          processor="XPathEntityProcessor"
          url="${document.fileAbsolutePath}"
          useSolrAddSchema="true"
          stream="true">
      </entity>
    </entity>
  </document>
</dataConfig>
*My data.xml has a corrected date format with "T":*
<add>
  <doc>
    <field name="id">123</field>
    <field name="title">Delta Import 2</field>
    <field name="description">This is my long description</field>
    <field name="truncated_description">This is</field>
    <field name="company">Google</field>
    <field name="location_name">England</field>
    <field name="date">2007-12-31T22:29:59</field>
    <field name="source">Google</field>
    <field name="url">www.google.com</field>
    <field name="latlng">45.17614,45.17614</field>
  </doc>
</add>
Re: Solr Delta Import Handler not working
Posted by Spadez <ja...@hotmail.com>.
Thank you for the reply. I will try your corrections. I realised that it
actually logs the error in the Tomcat log, and this is what it says:
Nov 18, 2012 10:09:40 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 3005 ms
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.DataImporter doDeltaImport
INFO: Starting Delta Import
Nov 18, 2012 10:09:47 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={clean=false&command=delta-import} status=0 QTime=10
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.SimplePropertiesWriter readIndexerProperties
WARNING: Unable to read: dataimport.properties
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Starting delta collection.
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: 80953050238262
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: 80953050238262 rows obtained : 0
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: 80953050238262 rows obtained : 0
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: 80953050238262
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Running ModifiedRowKey() for Entity: document
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: document rows obtained : 0
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: document rows obtained : 0
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: document
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully
Nov 18, 2012 10:09:47 AM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:0.37
Nov 18, 2012 10:09:47 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 10
It seems to be saying it executed successfully? I don't see any data in my
Solr index, though!
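The log reports "rows obtained : 0" for every ModifiedRowKey and DeletedRowKey step, so the delta run completed without finding anything to index. One alternative sometimes used for file-based imports, given here only as an untested sketch, is to call the handler with command=full-import&clean=false and let the newerThan attribute of FileListEntityProcessor restrict the run to files modified since the last import. That this resolves the problem in this thread is an assumption; the variable ${dataimporter.last_index_time} is only populated once an import has written dataimport.properties.

```xml
<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <!-- Untested sketch: newerThan limits the listing to files modified after
         the previous import. Intended to be run with
         command=full-import&clean=false instead of command=delta-import. -->
    <entity
        name="document"
        processor="FileListEntityProcessor"
        baseDir="/var/lib/data"
        fileName="^.*\.xml$"
        newerThan="${dataimporter.last_index_time}"
        recursive="false"
        rootEntity="false"
        dataSource="null">
      <entity
          processor="XPathEntityProcessor"
          url="${document.fileAbsolutePath}"
          useSolrAddSchema="true"
          stream="true">
      </entity>
    </entity>
  </document>
</dataConfig>
```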
Re: Solr Delta Import Handler not working
Posted by Lance Norskog <go...@gmail.com>.
I think this means the pattern did not match any files:
<str name="Total Rows Fetched">0</str>
The wiki example includes a '^' at the beginning of the filename pattern. Together with the trailing '$', this makes the pattern match the complete file name.
http://wiki.apache.org/solr/DataImportHandler#Transformers_Example
More:
Add rootEntity="true". It cannot hurt to be explicit.
The date format needs a 'T' instead of a space:
http://en.wikipedia.org/wiki/ISO_8601
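For illustration, assuming the "date" field is declared with one of Solr's date field types in schema.xml (the schema is not shown in this thread), the value would look like the fragment below. Solr's date types also expect a trailing 'Z' marking UTC, so the sketch includes it. Treating the original timestamp as UTC is an assumption.

```xml
<add>
  <doc>
    <field name="id">123</field>
    <!-- ISO 8601 with a 'T' separator; the trailing 'Z' assumes the time is UTC -->
    <field name="date">2007-12-31T22:29:59Z</field>
  </doc>
</add>
```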
Cheers!
----- Original Message -----
| From: "Spadez" <ja...@hotmail.com>
| To: solr-user@lucene.apache.org
| Sent: Saturday, November 17, 2012 2:49:30 PM
| Subject: Solr Delta Import Handler not working