Posted to solr-user@lucene.apache.org by Garafola Timothy <ti...@gmail.com> on 2009/03/03 23:23:32 UTC

DataImportHandler and delta-import question

I'm using Solr 1.3 and am trying to get a delta-import working with the
DIH.  The wiki, http://wiki.apache.org/solr/DataImportHandler, was recently
updated to explain that delta-import is now a 1.4 feature, but that it is
still possible to get a delta using the full-import example here,
http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta.  I
tried this, but each time I run DIH it reimports and updates all rows.

Below is my data-config.xml.  I set rootEntity to false and issued
command=full-import&clean=false&optimize=false through DIH.  Am I
doing something wrong here or is the DataImportHandlerFaq incorrect?

<dataConfig>
        <dataSource driver="com.mysql.jdbc.Driver"
                    url="jdbc:mysql://pencil-somewhere.com:22222/SomeDB"
                    user="someUser" password="somePassword"/>
        <document name="">
                <entity name="item" rootEntity="false"
                        query="select DId from 2_Doc
                               where ModifiedDate > '${dataimporter.last_index_time}'
                               and DocType != 'Research Articles'">
                        <entity name="feature" pk="DId" transformer="RegexTransformer"
                                query="SELECT d.DId, d.SiteId, d.DocTitle, d.DocURL, d.DocDesc,
                                       d.DocType, d.Tags, d.Source, d.Last90DaysRFIsPercent,
                                       d.ModifiedDate, d.DocGuid, d.Author, i.Industry
                                       FROM 2_Doc d LEFT OUTER JOIN tmp_DocIndustry i
                                       ON (d.DocId=i.DocId AND d.SiteId=i.SiteId)
                                       where d.DocType != 'Research articles'
                                       and d.DId = '${item.DId}'
                                       and d.ModifiedDate > '${dataimporter.last_index_time}'">
                                <field column="DId" name="did"/>
                                <field column="SiteId" name="SiteId"/>
                                <field column="DocId" name="DocId"/>
                                <field column="DocTitle" name="DocTitle"/>
                                <field column="DocURL" name="DocURL"/>
                                <field column="DocDesc" name="DocDesc"/>
                                <field column="Snippet" regex="^(.{0,800})\b.*$" sourceColName="DocDesc"/>
                                <field column="DocType" name="DocType"/>
                                <field column="Tags" name="Tags" splitBy=";" sourceColName="Tags"/>
                                <field column="Source" name="Source"/>
                                <field column="Last90DaysRFIsPercent" name="Last90DaysRFIsPercent"/>
                                <field column="ModifiedDate" name="ModifiedDate"/>
                                <field column="DocGuid" name="DocGuid"/>
                                <field column="Author" name="Author"/>
                                <field column="Industry" name="Industry" sourceColName="Industry"/>
                        </entity>
                </entity>
        </document>
</dataConfig>

Thanks,
-Tim
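
One way to reason about the symptom in the post above: both queries in the
config depend on ${dataimporter.last_index_time}, so if that placeholder is
not filled in when the SQL is built, the date comparison may stop filtering
anything. The sketch below is not DIH code. It assumes, purely for
illustration, that an unresolved placeholder becomes an empty string; the
PlaceholderDemo class and its resolve method are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that mimics ${...} substitution; this is not DIH code. */
final class PlaceholderDemo {

    // Matches ${name} and captures the name between the braces.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with its value, or with "" when no value is known (an assumption). */
    static String resolve(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String query =
                "select DId from 2_Doc where ModifiedDate > '${dataimporter.last_index_time}'";

        // No value available: the comparison is made against an empty string.
        System.out.println(resolve(query, Collections.<String, String>emptyMap()));

        // Value available (an example timestamp in yyyy-MM-dd HH:mm:ss form).
        System.out.println(resolve(query,
                Collections.singletonMap("dataimporter.last_index_time", "2009-03-03 23:23:32")));
    }
}
```

In the first case the query ends in ModifiedDate > '', which may match every
row, depending on how the database compares a date column with an empty
string. Comparing the SQL that DIH actually sends with the second form is a
quick way to tell whether the placeholder is being resolved.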

Re: DataImportHandler and delta-import question

Posted by Marc Sturlese <ma...@gmail.com>.
Hey,
You can download the DataImportHandler from the Solr 1.3 release; that should
work, as it does not implement rollbacks. If you want to test a nightly, I
think it is always best to test one of the most recent ones.

Tim Garafola wrote:
> 
> Thanks.  Can you recommend a build I can try?
> 

-- 
View this message in context: http://www.nabble.com/DataImportHandler-and-delta-import-question-tp22319343p22369458.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandler and delta-import question

Posted by Garafola Timothy <ti...@gmail.com>.
Thanks.  Can you recommend a build I can try?

On Thu, Mar 5, 2009 at 3:09 PM, Marc Sturlese <ma...@gmail.com> wrote:
>
> I am not sure whether RollbackUpdateCommand had been developed yet in the
> official Solr 1.3 release. I think it is only in the nightly builds. It looks
> like your dataimport package is too new. I think you should either use that
> dataimport release with a Solr nightly or grab an older dataimport release.



-- 
-Tim

Re: DataImportHandler and delta-import question

Posted by Marc Sturlese <ma...@gmail.com>.
I am not sure whether RollbackUpdateCommand had been developed yet in the
official Solr 1.3 release. I think it is only in the nightly builds. It looks
like your dataimport package is too new. I think you should either use that
dataimport release with a Solr nightly or grab an older dataimport release.
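
The mismatch described above can be checked directly: a NoClassDefFoundError
for org/apache/solr/update/RollbackUpdateCommand means the class the DIH jar
expects is not visible when DIH is loaded. A minimal sketch, assuming it is
run with the same Solr jars on the classpath as the webapp (inside a servlet
container such as Resin, the webapp's own class loader would be the relevant
one). ClasspathProbe and isClassAvailable are hypothetical names, not part of
Solr.

```java
/** Hypothetical probe: reports whether a class can be found on the current classpath. */
final class ClasspathProbe {

    /** Returns true if the named class is visible to this class's loader. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class that cannot be linked against the jars present.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in this thread.
        String name = "org.apache.solr.update.RollbackUpdateCommand";
        System.out.println(name + " available: " + isClassAvailable(name));
    }
}
```

If the probe reports the class as unavailable against the Solr 1.3 jars, that
would be consistent with the DIH jar being newer than the Solr core it is
deployed into.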


Tim Garafola wrote:
> 
> I tried updating the Solr instance I'm testing DIH with, adding the
> dataimport and slf4j jar files to Solr.
> 
> When I start solr, I get the following error.  Is there something else
> which needs to be installed for the nightly build version of DIH to
> work in solr release 1.3?
> 
> Thanks,
> Tim
> 
> 
> java.lang.NoClassDefFoundError: org/apache/solr/update/RollbackUpdateCommand
>       at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:95)
>       at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311)
>       at org.apache.solr.core.SolrCore.<init>(SolrCore.java:480)
>       at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
>       at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
>       at com.caucho.server.dispatch.FilterManager.createFilter(FilterManager.java:134)
>       at com.caucho.server.dispatch.FilterManager.init(FilterManager.java:87)
>       at com.caucho.server.webapp.Application.start(Application.java:1655)
>       at com.caucho.server.deploy.DeployController.startImpl(DeployController.java:621)
>       at com.caucho.server.deploy.StartAutoRedeployAutoStrategy.startOnInit(StartAutoRedeployAutoStrategy.java:72)
>       at com.caucho.server.deploy.DeployController.startOnInit(DeployController.java:509)
>       at com.caucho.server.deploy.DeployContainer.start(DeployContainer.java:153)
>       at com.caucho.server.webapp.ApplicationContainer.start(ApplicationContainer.java:670)
>       at com.caucho.server.host.Host.start(Host.java:420)
>       at com.caucho.server.deploy.DeployController.startImpl(DeployController.java:621)
>       at com.caucho.server.deploy.StartAutoRedeployAutoStrategy.startOnInit(StartAutoRedeployAutoStrategy.java:72)
>       at com.caucho.server.deploy.DeployController.startOnInit(DeployController.java:509)
>       at com.caucho.server.deploy.DeployContainer.start(DeployContainer.java:153)
>       at com.caucho.server.host.HostContainer.start(HostContainer.java:504)
>       at com.caucho.server.resin.ServletServer.start(ServletServer.java:971)
>       at com.caucho.server.deploy.DeployController.startImpl(DeployController.java:621)
>       at com.caucho.server.deploy.AbstractDeployControllerStrategy.start(AbstractDeployControllerStrategy.java:56)
>       at com.caucho.server.deploy.DeployController.start(DeployController.java:517)
>       at com.caucho.server.resin.ResinServer.start(ResinServer.java:551)
>       at com.caucho.server.resin.Resin.init(Resin.java)
>       at com.caucho.server.resin.Resin.main(Resin.java:625)
> Caused by: java.lang.ClassNotFoundException: org.apache.solr.update.RollbackUpdateCommand
>       at com.caucho.loader.DynamicClassLoader.findClass(DynamicClassLoader.java:1130)
>       at com.caucho.loader.DynamicClassLoader.loadClass(DynamicClassLoader.java:1072)
>       at com.caucho.loader.DynamicClassLoader.loadClass(DynamicClassLoader.java:1021)
>       at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>       ... 26 more
> 
> 
> On Thu, Mar 5, 2009 at 9:10 AM, Garafola Timothy <ti...@gmail.com>
> wrote:
>> Yes, the dataimport.properties file is present in the conf directory
>> from previous imports.  I'll try the trunk version as you suggested to
>> see if the problem persists.
>>
>> Thanks,
>> Tim
>>
>> On Wed, Mar 4, 2009 at 7:54 PM, Noble Paul നോബിള്‍  नोब्ळ्
>> <no...@gmail.com> wrote:
>>> The dataimport.properties file is created only after one successful
>>> import, so it is available only from the second import onwards. You can
>>> probably create one manually and put it in the conf dir.
>>>
>>> On Thu, Mar 5, 2009 at 12:52 AM, Garafola Timothy
>>> <ti...@gmail.com> wrote:
>>>> Thanks,
>>>>
>>>> I set up another test instance of Solr and ran a full import within
>>>> the DIH Development Console.  I examined the query and found that
>>>> last_index_time is not getting set in the query.  Yet the value does
>>>> get updated after a full import completes (outside of the development
>>>> console).  Is there some place that I need to set the path to the
>>>> dataimport.properties file?
>>>>
>>>> On Tue, Mar 3, 2009 at 8:03 PM, Noble Paul നോബിള്‍  नोब्ळ्
>>>> <no...@gmail.com> wrote:
>>>>> I do not see anything wrong with this. It should have worked. Can you
>>>>> check that dataimport.properties is created (by DIH) in the conf
>>>>> directory, and check its content?
>>>>>
>>>>>
>>>>> Are you sure that the query
>>>>>
>>>>> select DId from 2_Doc where ModifiedDate >
>>>>> '${dataimporter.last_index_time}'
>>>>>
>>>>> works with the date format yyyy-MM-dd HH:mm:ss? This is the format
>>>>> in which DIH sends the date. If the format is wrong, you may need to
>>>>> format it using a dateformat function.
>>>>>
>>>>> see here
>>>>>
>>>>> http://wiki.apache.org/solr/DataImportHandler#head-5675e913396a42eb7c6c5d3c894ada5dadbb62d7
>>>>>
>>>>>
>>>>> The trunk DIH can work with Solr 1.3 (you may need to add the DIH jar
>>>>> and slf4j). Can
>>>>> On Wed, Mar 4, 2009 at 3:53 AM, Garafola Timothy
>>>>> <ti...@gmail.com> wrote:
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> --Noble Paul
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -Tim
>>>>
>>>
>>>
>>>
>>> --
>>> --Noble Paul
>>>
>>
>>
>>
>> --
>> -Tim
>>
> 
> 
> 
> -- 
> -Tim
> 
> 



Re: DataImportHandler and delta-import question

Posted by Garafola Timothy <ti...@gmail.com>.
I tried updating the solr instance I'm testing DIH with, adding the
dataimport and slf4j jar files to solr.

When I start solr, I get the following error.  Is there something else
which needs to be installed for the nightly build version of DIH to
work in solr release 1.3?

Thanks,
Tim


java.lang.NoClassDefFoundError: org/apache/solr/update/RollbackUpdateCommand
	at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:95)
	at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:311)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:480)
	at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
	at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
	at com.caucho.server.dispatch.FilterManager.createFilter(FilterManager.java:134)
	at com.caucho.server.dispatch.FilterManager.init(FilterManager.java:87)
	at com.caucho.server.webapp.Application.start(Application.java:1655)
	at com.caucho.server.deploy.DeployController.startImpl(DeployController.java:621)
	at com.caucho.server.deploy.StartAutoRedeployAutoStrategy.startOnInit(StartAutoRedeployAutoStrategy.java:72)
	at com.caucho.server.deploy.DeployController.startOnInit(DeployController.java:509)
	at com.caucho.server.deploy.DeployContainer.start(DeployContainer.java:153)
	at com.caucho.server.webapp.ApplicationContainer.start(ApplicationContainer.java:670)
	at com.caucho.server.host.Host.start(Host.java:420)
	at com.caucho.server.deploy.DeployController.startImpl(DeployController.java:621)
	at com.caucho.server.deploy.StartAutoRedeployAutoStrategy.startOnInit(StartAutoRedeployAutoStrategy.java:72)
	at com.caucho.server.deploy.DeployController.startOnInit(DeployController.java:509)
	at com.caucho.server.deploy.DeployContainer.start(DeployContainer.java:153)
	at com.caucho.server.host.HostContainer.start(HostContainer.java:504)
	at com.caucho.server.resin.ServletServer.start(ServletServer.java:971)
	at com.caucho.server.deploy.DeployController.startImpl(DeployController.java:621)
	at com.caucho.server.deploy.AbstractDeployControllerStrategy.start(AbstractDeployControllerStrategy.java:56)
	at com.caucho.server.deploy.DeployController.start(DeployController.java:517)
	at com.caucho.server.resin.ResinServer.start(ResinServer.java:551)
	at com.caucho.server.resin.Resin.init(Resin.java)
	at com.caucho.server.resin.Resin.main(Resin.java:625)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.update.RollbackUpdateCommand
	at com.caucho.loader.DynamicClassLoader.findClass(DynamicClassLoader.java:1130)
	at com.caucho.loader.DynamicClassLoader.loadClass(DynamicClassLoader.java:1072)
	at com.caucho.loader.DynamicClassLoader.loadClass(DynamicClassLoader.java:1021)
	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
	... 26 more
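
The trace above says the DIH jar asked for org.apache.solr.update.RollbackUpdateCommand and the class loader could not find it. One way to confirm whether a given class is visible is a small check like the sketch below. The class name ClassPresenceCheck and its helper are made up for illustration, and the check is only meaningful when run with the same jars on the classpath that the Solr webapp uses.

```java
public class ClassPresenceCheck {

    // Returns true if the named class can be located by this class's loader.
    // The class is not initialized, so no static initializers run.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassPresenceCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace from this thread.
        String name = args.length > 0
                ? args[0]
                : "org.apache.solr.update.RollbackUpdateCommand";
        System.out.println(name
                + (isLoadable(name) ? " is on the classpath" : " is NOT on the classpath"));
    }
}
```

If the class turns out to be missing, the DIH jar and the Solr core jar probably come from different builds.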


On Thu, Mar 5, 2009 at 9:10 AM, Garafola Timothy <ti...@gmail.com> wrote:
> yes, the dataimport.properties file is present in the conf directory
> from previous imports.  I'll try the trunk version as you suggested to
> see if the problem persists.
>
> Thanks,
> Tim



-- 
-Tim

Re: DataImportHandler and delta-import question

Posted by Garafola Timothy <ti...@gmail.com>.
Yes, the dataimport.properties file is present in the conf directory
from previous imports.  I'll try the trunk version as you suggested to
see if the problem persists.

Thanks,
Tim

On Wed, Mar 4, 2009 at 7:54 PM, Noble Paul നോബിള്‍  नोब्ळ्
<no...@gmail.com> wrote:
> the dataimport.properties is created only after one successful import
> .so it is available only from second import onwards. probably you can
> create one manually and put it in the conf dir.



-- 
-Tim

Re: DataImportHandler and delta-import question

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
The dataimport.properties file is created only after one successful
import, so it is available only from the second import onwards. You can
probably create one manually and put it in the conf dir.
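
A minimal sketch of creating the file by hand follows, assuming the property key is last_index_time (inferred from the ${dataimporter.last_index_time} variable used in this thread) and that the value uses the yyyy-MM-dd HH:mm:ss format. The class name SeedDataImportProperties and its helpers are made up for illustration, and the target directory defaults to the system temp directory, not a real Solr conf dir.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Properties;

public class SeedDataImportProperties {
    static final String FILE_NAME = "dataimport.properties";
    // Assumed key name, inferred from ${dataimporter.last_index_time}.
    static final String KEY = "last_index_time";

    // Writes the properties file into confDir and returns the stored timestamp.
    static String seed(Path confDir, Date lastIndexTime) {
        String value = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT)
                .format(lastIndexTime);
        Properties props = new Properties();
        props.setProperty(KEY, value);
        try (OutputStream out = Files.newOutputStream(confDir.resolve(FILE_NAME))) {
            // Properties.store escapes the colons in the value (19\:54\:00).
            props.store(out, "seeded by hand");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return value;
    }

    // Reads the timestamp back so the result can be checked.
    static String readBack(Path confDir) {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(confDir.resolve(FILE_NAME))) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY);
    }

    public static void main(String[] args) {
        // Pass the Solr conf directory as the first argument; temp dir otherwise.
        Path confDir = Paths.get(args.length > 0
                ? args[0]
                : System.getProperty("java.io.tmpdir"));
        seed(confDir, new Date(0L));
        System.out.println(KEY + "=" + readBack(confDir));
    }
}
```

Seeding with the epoch, as main does here, means the first run still selects every row modified since 1970, so pick a later date if that is not wanted.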

On Thu, Mar 5, 2009 at 12:52 AM, Garafola Timothy <ti...@gmail.com> wrote:
> Thanks,
>
> I set up a another test instance of solr and ran a full import within
> the DIH Development Console.  I examined the query and found that
> last_index_time is not getting set in the query.  Yet the value does
> get updated after a full import completes (outside of the development
> console).  Is there some place that I need to set the path to the
> dataimport.properties file?



-- 
--Noble Paul

Re: DataImportHandler and delta-import question

Posted by Garafola Timothy <ti...@gmail.com>.
Thanks,

I set up another test instance of solr and ran a full import within
the DIH Development Console.  I examined the query and found that
last_index_time is not getting set in the query.  Yet the value does
get updated after a full import completes (outside of the development
console).  Is there some place that I need to set the path to the
dataimport.properties file?

On Tue, Mar 3, 2009 at 8:03 PM, Noble Paul നോബിള്‍  नोब्ळ्
<no...@gmail.com> wrote:
> I do not see anything wrong with this .It should have worked . Can you
> check that dataimport.properties is created (by DIH) in the conf
> directory? . check the content?
>
>
> are you sure that the query
>
> select DId from 2_Doc where ModifiedDate > '${dataimporter.last_index_time}'
>
> works with  a date format yyyy-MM-dd HH:mm:ss . This is the format
> which DIH sends the date in . If the format is wrong you may need to
> format it using a dateformat function.
>
> see here
>
> http://wiki.apache.org/solr/DataImportHandler#head-5675e913396a42eb7c6c5d3c894ada5dadbb62d7
>
>
>  The trunk DIH can work with Solr1.3 (you may need to put the DIH jar
> and slf4j). Can



-- 
-Tim

Re: DataImportHandler and delta-import question

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
I do not see anything wrong with this. It should have worked. Can you
check that dataimport.properties is created (by DIH) in the conf
directory, and check its content?


Are you sure that the query

select DId from 2_Doc where ModifiedDate > '${dataimporter.last_index_time}'

works with the date format yyyy-MM-dd HH:mm:ss? This is the format
in which DIH sends the date. If the format is wrong, you may need to
format it using a dateformat function.

See here:

http://wiki.apache.org/solr/DataImportHandler#head-5675e913396a42eb7c6c5d3c894ada5dadbb62d7
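
To see what the database receives once the variable is substituted, the sketch below formats a timestamp with the yyyy-MM-dd HH:mm:ss pattern and inlines it into the parent-entity query from this thread. It is only an illustration, not how DIH resolves variables internally. The class LastIndexTimeFormat and its methods are made-up names.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class LastIndexTimeFormat {
    // The date pattern quoted in the message above.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Formats a timestamp in the given time zone using the pattern above.
    static String format(Date when, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.ROOT);
        fmt.setTimeZone(zone);
        return fmt.format(when);
    }

    // Builds the parent-entity query with the timestamp already substituted.
    static String deltaQuery(String lastIndexTime) {
        return "select DId from 2_Doc where ModifiedDate > '" + lastIndexTime + "'";
    }

    public static void main(String[] args) {
        String ts = format(new Date(), TimeZone.getDefault());
        // Paste the printed statement into a MySQL client to check that the
        // comparison against ModifiedDate behaves as expected.
        System.out.println(deltaQuery(ts));
    }
}
```

Running the printed statement directly against the database is a quick way to check that the column type accepts this format.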


The trunk DIH can work with Solr 1.3 (you may need to add the DIH jar
and slf4j). Can

On Wed, Mar 4, 2009 at 3:53 AM, Garafola Timothy <ti...@gmail.com> wrote:
> I'm using solr 1.3 and am trying to get a delta-import with the DIH.
> Recently the wiki, http://wiki.apache.org/solr/DataImportHandler, was
> updated explaining that delta import is a 1.4 feature now but it was
> still possible get a delta using the full import example here,
> http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta.  I
> tried this but each time I run DIH, it reimports all rows and updates.
>
> Below is my data-config.xml.  I set rootEntity to false and issued
> command=full-import&clean=false&optimize=false through DIH.  Am I
> doing something wrong here or is the DataImportHandlerFaq incorrect?
>
> <dataConfig>
>        <dataSource driver="com.mysql.jdbc.Driver"
> url="jdbc:mysql://pencil-somewhere.com:22222/SomeDB" user="someUser"
>  password="somePassword"/>
>        <document name="">
>                <entity name = "item" rootEntity="false"
>                        query = "select DId from 2_Doc where
> ModifiedDate > '${dataimporter.last_index_time}'
>                                      and DocType != 'Research Articles'">
>                        <entity name="feature" pk="DId"
> transformer="RegexTransformer"
>                                query = "SELECT d.DId, d.SiteId,
> d.DocTitle, d.DocURL, d.DocDesc,
>                                        d.DocType, d.Tags, d.Source,
> d.Last90DaysRFIsPercent,
>                                        d.ModifiedDate, d.DocGuid, d.Author,
>                                        i.Industry FROM 2_Doc d LEFT
> OUTER JOIN tmp_DocIndustry i
>                                        ON (d.DocId=i.DocId AND
> d.SiteId=i.SiteId) where d.DocType != 'Research articles'
>                                        and d.DId = '${item.DId}' and
> d.ModifiedDate > '${dataimporter.last_index_time}'">
>                                <field column = "DId"   name ="did"/>
>                                <field column = "SiteId"   name ="SiteId"/>
>                                <field column = "DocId"   name ="DocId"/>
>                                <field column = "DocTitle"   name ="DocTitle"/>
>                                <field column = "DocURL"   name ="DocURL"/>
>                                <field column = "DocDesc" name ="DocDesc" />
>                                <field column = "Snippet"
> regex="^(.{0,800})\b.*$" sourceColName="DocDesc"/>
>                                <field column = "DocType"   name ="DocType"/>
>                                <field column = "Tags" name ="Tags"
> splitBy=";" sourceColName="Tags"/>
>                                <field column = "Source"   name ="Source"/>
>                                <field column =
> "Last90DaysRFIsPercent"   name ="Last90DaysRFIsPercent"/>
>                                <field column = "ModifiedDate"   name
> ="ModifiedDate"/>
>                                <field column = "DocGuid"   name ="DocGuid"/>
>                                <field column = "Author"   name ="Author"/>
>                                <field column = "Industry" name
> ="Industry" sourceColName="Industry"/>
>                        </entity>
>                </entity>
>        </document>
> </dataConfig>
>
> Thanks,
> -Tim
>



-- 
--Noble Paul