Posted to solr-user@lucene.apache.org by Marc Sturlese <ma...@gmail.com> on 2008/12/02 10:31:12 UTC

DataImportHandler: Deleteing from index and db; lastIndexed id feature

Hey there,

I have my DataImportHandler almost completely configured, but I am still
missing three goals. I don't think I can reach them just via the XML
configuration or with a Transformer and SqlEntityProcessor plugin, but I
need to be sure of that.
If there is no other way I will hack some Solr source classes, and I would
like to know the best way to do that. Once I have it solved, I can upload or
post the source in the forum in case someone thinks it can be helpful.

1.- Every time I execute DataImportHandler (to index data from a db), at the
start or at the end I need to delete some expired documents. I have to
delete them from the database and from the index. I know which documents
must be deleted because a field in the db marks them. I would not like to
delete everything from the db first or everything from the index first, but
one from the index and one from the db each time.
The "delete mark" is set by an update on the db row, so I think I could
use delta-import. I don't know if deletedPkQuery is the way to do that; I
cannot find much information about how to make it work. As deltaQuery
modifies docs (deletes the old one and inserts the new one), I suppose there
must be an easy way to do just the delete without the new insert.
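
A minimal sketch of the one-by-one deletion described in point 1. This is not DIH code: the two sets are in-memory stand-ins for the index and the db table, and the class name, the "expired" column and the "id" key are assumptions made up for illustration. It only shows the order of operations (find the flagged rows, then remove each id from the index and from the db as a pair).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch only: in-memory stand-ins for the Solr index and the database. */
class ExpiredDocCleaner {
    private final Set<String> indexIds; // hypothetical stand-in for the index
    private final Set<String> dbIds;    // hypothetical stand-in for the db table

    ExpiredDocCleaner(Set<String> indexIds, Set<String> dbIds) {
        this.indexIds = indexIds;
        this.dbIds = dbIds;
    }

    /** Plays the role of a query returning the primary keys of flagged rows. */
    static List<String> findExpiredIds(List<Map<String, Object>> rows) {
        List<String> ids = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (Boolean.TRUE.equals(row.get("expired"))) {
                ids.add(String.valueOf(row.get("id")));
            }
        }
        return ids;
    }

    /** Removes one id at a time: from the index first, then from the db. */
    List<String> deleteExpired(List<String> expiredIds) {
        List<String> removed = new ArrayList<>();
        for (String id : expiredIds) {
            indexIds.remove(id); // step 1: drop the document from the index
            dbIds.remove(id);    // step 2: only then drop the flagged row
            removed.add(id);
        }
        return removed;
    }
}
```

Deleting from the index before the db is a design choice of this sketch: if the run aborts between the two steps, the flagged row is still in the db and gets picked up again on the next run.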

2.- This is probably my most difficult goal.
Delta-import reads a timestamp from dataimport.properties and modifies/adds
all documents from the db which were inserted after that date. What I want is
to be able to save in the field the id of the last indexed doc, so that the
next time I execute the indexer it starts indexing from that last indexed
id.
The point of doing this is that if I do a full import from a db with lots of
rows, the app could encounter a problem in the middle of the execution and
abort the process. The way delta-query works, I would have to restart the
execution from the beginning. Having this new functionality I could optimize
the index and start from the last indexed doc.
I think I should begin by modifying SolrWriter.java and DocBuilder.java,
creating functions like getStartTime, persistStartTime... for ID control.
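
A rough sketch of the "resume from the last indexed id" idea in point 2, using only java.util.Properties. It is not part of DIH: the class name, the key "last_indexed_id", and the table and column names in the query are assumptions for illustration. It shows a properties file that remembers the last id and a query that starts after it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Sketch: remember the id of the last indexed row so an aborted import can resume. */
class LastIndexedIdCheckpoint {
    // Hypothetical key name, chosen for this example only.
    static final String KEY = "last_indexed_id";

    private final Path file;

    LastIndexedIdCheckpoint(Path file) {
        this.file = file;
    }

    /** Reads the properties file, or returns an empty set if it does not exist yet. */
    private Properties read() {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }

    /** Returns the stored id, or 0 when nothing has been stored. */
    long load() {
        return Long.parseLong(read().getProperty(KEY, "0"));
    }

    /** Stores the id while keeping any other entries already in the file. */
    void save(long lastId) {
        Properties props = read();
        props.setProperty(KEY, Long.toString(lastId));
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "resume point for the next import");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds the query for the next run; assumes a numeric, increasing id column. */
    String resumeQuery(String table) {
        return "SELECT * FROM " + table + " WHERE id > " + load() + " ORDER BY id";
    }
}
```

Calling save() after each row or batch bounds how much work is repeated after an abort; the sketch only makes sense when rows are read in id order.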

3.- I commented before about this last point. I want to give a boost to doc
fields at indexing time.
>>Adding fieldboost is a planned item.

>>It must work as follows.
>>Add a special value $fieldBoost.<fieldname> to the row map

>>And DocBuilder should respect that. You can raise a bug and we can
>>commit it soon.
How can I raise a bug?
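
To illustrate the convention quoted above, here is a small sketch of a row map carrying a "$fieldBoost.<fieldname>" entry. The feature is only described as planned in this thread, so nothing here is DIH API: the class and method names are hypothetical, and the default boost of 1.0 is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the proposed convention: a "$fieldBoost.<fieldname>" entry in the row map. */
class FieldBoostRows {
    static final String PREFIX = "$fieldBoost.";

    /** Returns a copy of the row with a boost entry added for the given field. */
    static Map<String, Object> withBoost(Map<String, Object> row, String field, float boost) {
        Map<String, Object> copy = new HashMap<>(row);
        copy.put(PREFIX + field, boost);
        return copy;
    }

    /** What a document builder honoring the convention might do: look up the boost, default 1.0. */
    static float boostFor(Map<String, Object> row, String field) {
        Object value = row.get(PREFIX + field);
        return value instanceof Number ? ((Number) value).floatValue() : 1.0f;
    }
}
```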

Thanks in advance




-- 
View this message in context: http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20788755.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Good.
We need use cases like these and contributions from users.

This is a win-win:
you will not have to manage the code yourself once it is checked in, and
as we have more eyes on the DIH code it will also improve.

Thanks a lot,
Noble

On Wed, Dec 3, 2008 at 1:49 PM, Marc Sturlese <ma...@gmail.com> wrote:
>
> That's what I am trying to do. Thanks for the advice. Once I have it done I
> will raise the issue and upload the patch.



-- 
--Noble Paul

Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature

Posted by Marc Sturlese <ma...@gmail.com>.
That's what I am trying to do. Thanks for the advice. Once I have it done I
will raise the issue and upload the patch.
 



Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
OK. I guess I see it. I am thinking of exposing the writes to the
properties file via an API,

say, Context#persist(key, value);


This can write the data to dataimport.properties.

You should be able to retrieve that value with ${dataimport.persist.<key>}

or through an API, Context#getPersistValue(key).

You can raise an issue and submit a patch, and we can get it committed.

I guess this is what you wish to achieve.
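
A sketch of how the proposed pair could behave, under assumptions: persist and getPersistValue are only proposed in this thread, so the class below is a stand-alone illustration and not the DIH Context. The Properties object stands in for dataimport.properties, and the resolve method and its regular expression are invented here to show the ${dataimport.persist.<key>} lookup.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of the proposed persist/getPersistValue pair plus placeholder lookup. */
class PersistedValues {
    // Matches ${dataimport.persist.<key>}; the allowed key characters are an assumption.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{dataimport\\.persist\\.([A-Za-z0-9_]+)\\}");

    // Stands in for the contents of dataimport.properties.
    private final Properties props = new Properties();

    void persist(String key, String value) {
        props.setProperty(key, value);
    }

    String getPersistValue(String key) {
        return props.getProperty(key);
    }

    /** Replaces each placeholder with its stored value; unknown keys are left untouched. */
    String resolve(String template) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            String replacement = value != null ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With a persisted key such as last_id, a query template like "SELECT * FROM item WHERE id > ${dataimport.persist.last_id}" would resolve to a query that starts after the stored id.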

--Noble



On Wed, Dec 3, 2008 at 3:28 AM, Marc Sturlese <ma...@gmail.com> wrote:
>
> Do you mean the file used by DataImportHandler called dataimport.properties?
> If you mean that one, it is written at the end of the indexing process. The
> written date will be used in the next indexing run by the delta query to
> identify the new or modified rows in the database.
>
> What I am trying to do is, instead of saving a timestamp, save the last
> indexed id. That way, the next execution will start indexing from the
> last doc that was indexed in the previous run. But I am still a bit
> confused about how to do that...



-- 
--Noble Paul

Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature

Posted by Marc Sturlese <ma...@gmail.com>.
Do you mean the file used by DataImportHandler called dataimport.properties?
If you mean that one, it is written at the end of the indexing process. The
written date will be used in the next indexing run by the delta query to
identify the new or modified rows in the database.

What I am trying to do is, instead of saving a timestamp, save the last
indexed id. That way, the next execution will start indexing from the
last doc that was indexed in the previous run. But I am still a bit
confused about how to do that...

Noble Paul നോബിള്‍ नोब्ळ् wrote:
> 
> delta-import file?
> 
> 
> On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog <go...@gmail.com> wrote:
>> Does the DIH delta feature rewrite the delta-import file for each set of
>> rows? If it does not, that sounds like a bug/enhancement.
>> Lance
>>
>> -----Original Message-----
>> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.paul@gmail.com]
>> Sent: Tuesday, December 02, 2008 8:51 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: DataImportHandler: Deleteing from index and db; lastIndexed
>> id feature
>>
>> You can write the details to a file using a Transformer itself.
>>
>> It is wise to stick to the public API as far as possible. We will
>> maintain back compat and your code will be usable w/ newer versions.
>>
>>
>> On Tue, Dec 2, 2008 at 5:12 PM, Marc Sturlese <ma...@gmail.com>
>> wrote:
>>>
>>> Thanks, I really appreciate your help.
>>>
>>> I didn't explain myself very well here:
>>>
>>>> 2.- This is probably my most difficult goal.
>>>> Delta-import reads a timestamp from dataimport.properties and
>>>> modifies/adds all documents from the db which were inserted after that
>>>> date. What I want is to be able to save in the field the id of the last
>>>> indexed doc, so that the next time I execute the indexer it starts
>>>> indexing from that last indexed id.
>>> You can use a Transformer to write something to the DB.
>>> Context#getDataSource(String) for each row
>>>
>>> When I said:
>>>
>>>> be able to save in the field the id of the last indexed doc
>>> I made a mistake; I meant:
>>>
>>> be able to save in the file (dataimport.properties) the id of the last
>>> indexed doc.
>>> The point would be to do my own delta query, indexing from the id of the
>>> last indexed doc instead of from the timestamp.
>>> So I think this would not work in that case (my mistake, because of the
>>> bad explanation):
>>>
>>>>You can use a Transformer to write something to the DB.
>>>>Context#getDataSource(String) for each row
>>>
>>> That is why I was saying:
>>>> I think I should begin by modifying SolrWriter.java and
>>>> DocBuilder.java,
>>>> creating functions like getStartTime, persistStartTime... for ID
>>>> control.
>>>
>>> Am I going in the right direction?
>>> Sorry for my English, and thanks in advance.
>>>
>>>
>>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>>>>
>>>> On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese
>>>> <ma...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hey there,
>>>>>
>>>>> I have my DataImportHandler almost completely configured, but I am
>>>>> still missing three goals. I don't think I can reach them just via the
>>>>> XML configuration or with a Transformer and SqlEntityProcessor plugin,
>>>>> but I need to be sure of that.
>>>>> If there is no other way I will hack some Solr source classes, and I
>>>>> would like to know the best way to do that. Once I have it solved, I
>>>>> can upload or post the source in the forum in case someone thinks it
>>>>> can be helpful.
>>>>>
>>>>> 1.- Every time I execute DataImportHandler (to index data from a
>>>>> db), at the start or at the end I need to delete some expired
>>>>> documents. I have to delete them from the database and from the
>>>>> index. I know which documents must be deleted because a field in
>>>>> the db marks them. I would not like to delete everything from the db
>>>>> first or everything from the index first, but one from the index and
>>>>> one from the db each time.
>>>>
>>>> You can override the init() and destroy() methods of SqlEntityProcessor
>>>> and use it as the processor for the root entity. At that point you can
>>>> run the necessary db queries and Solr delete queries. Look at
>>>> Context#getSolrCore() and Context#getDataSource(String).
>>>>
>>>>
>>>>> The "delete mark" is setted as an update in the db row so I think I
>>>>> could use deltaImport. Don't know If deletedPkQuery is the way to do
>>>>> that. Can not find so much information about how to make it work. As
>>>>> deltaQuery modifies docs (delete old and insert new) I supose it
>>>>> must be a easy way to do this just doing the delete and not the new
>>>>> insert.
>>>> deletedPkQuery does everything first. it runs the query and uses that
>>>> to identify the deleted rows.
>>>>>
>>>>> 2.-This is probably my most difficult goal.
>>>>> Deltaimport reads a timestamp from the dataimport.properties and
>>>>> modify/add all documents from db wich were inserted after that date.
>>>>> What I want is to be able to save in the field the id of the last
>>>>> idexed doc. So in the next time I ejecute the indexer make it start
>>>>> indexing from that last indexed id doc.
>>>> You can use a Transformer to write something to the DB.
>>>> Context#getDataSource(String) for each row
>>>>
>>>>> The point of doing this is that if I do a full import from a db with
>>>>> lots of rows the app could encounter a problem in the middle of the
>>>>> execution and abort the process. As deltaquey works I would have to
>>>>> restart the execution from the begining. Having this new
>>>>> functionality I could optimize the index and start from the last
>>>>> indexed doc.
>>>>> I think I should begin modifying the SolrWriter.java and
>>>>> DocBuilder.java.
>>>>> Creating functions like getStartTime, persistStartTime... for ID
>>>>> control
>>>>>
>>>>> 3.-I commented before about this last point. I want to give boost to
>>>>> doc fields at indexing time.
>>>>>>>Adding fieldboost is a planned item.
>>>>>
>>>>>>>It must work as follows .
>>>>>>>Add a special value $fieldBoost.<fieldname> to the row map
>>>>>
>>>>>>>And DocBuilder should respect that. You can raise a bug and we can
>>>>>>>commit it soon.
>>>>> How can I do to rise a bug?
>>>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>>>>>
>>>>> Thanks in advance
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-
>>>>> db--lastIndexed-id-feature-tp20788755p20788755.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> --Noble Paul
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db
>>> --lastIndexed-id-feature-tp20788755p20790542.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20801932.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
delta-import file?


On Wed, Dec 3, 2008 at 12:08 AM, Lance Norskog <go...@gmail.com> wrote:
> Does the DIH delta feature rewrite the delta-import file for each set of rows? If it does not, that sounds like a bug/enhancement.
> Lance



-- 
--Noble Paul

RE: DataImportHandler: Deleteing from index and db; lastIndexed id feature

Posted by Lance Norskog <go...@gmail.com>.
Does the DIH delta feature rewrite the delta-import file for each set of rows? If it does not, that sounds like a bug/enhancement. 
Lance

-----Original Message-----
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.paul@gmail.com] 
Sent: Tuesday, December 02, 2008 8:51 AM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature

You can write the details to a file using a Transformer itself.

It is wise to stick to the public API as far as possible. We will maintain back compat and your code will be usable w/ newer versions.



Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
You can write the details to a file from within the Transformer itself.

It is wise to stick to the public API as far as possible. We will
maintain backward compatibility, so your code will remain usable with
newer versions.
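
A minimal sketch of that idea, assuming DIH's convention that a custom
transformer can be any class exposing transformRow(Map<String, Object>),
so nothing outside the public API is touched. The class name
LastIdRecorder, the column name "id", the key "last_indexed_id" and the
file name last_id.properties are all hypothetical, and the sketch writes
to its own file rather than to dataimport.properties, which DIH manages
itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;

// Hypothetical transformer that remembers the id of the last row it saw.
// Declare it public, in its own file, before naming it in data-config.xml.
class LastIdRecorder {
    // Assumed checkpoint location; a real setup would make this configurable.
    static Path checkpoint = Paths.get("last_id.properties");

    // Assumption: DIH calls this once per row of the entity.
    public Object transformRow(Map<String, Object> row) {
        Object id = row.get("id"); // "id" is an assumed column name
        if (id != null) {
            Properties props = new Properties();
            props.setProperty("last_indexed_id", id.toString());
            try (OutputStream out = Files.newOutputStream(checkpoint)) {
                props.store(out, "checkpoint written by LastIdRecorder");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return row; // the row is passed on unchanged
    }

    // Reads the checkpoint back; returns null when none has been written yet.
    static String readLastId() {
        if (!Files.exists(checkpoint)) {
            return null;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(checkpoint)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("last_indexed_id");
    }
}
```

Two caveats on this sketch: rewriting the file for every row is slow, so
a real version would probably write every N rows, and a row that has
passed through the transformer is not necessarily committed to the index
yet, so the recorded id may run slightly ahead of what is searchable.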



-- 
--Noble Paul

Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature

Posted by Marc Sturlese <ma...@gmail.com>.
Thanks, I really appreciate your help.

I didn't explain myself well here:

> 2.- This is probably my most difficult goal.
> Delta-import reads a timestamp from dataimport.properties and
> modifies/adds all documents from the db which were inserted after that
> date. What I want is to be able to save in the field the id of the last
> indexed doc, so that the next time I execute the indexer it starts
> indexing from that last indexed id.
> You can use a Transformer to write something to the DB.
> Context#getDataSource(String) for each row

When I said:

> be able to save in the field the id of the last indexed doc
I made a mistake; I meant:

be able to save in the file (dataimport.properties) the id of the last
indexed doc.
The point would be to do my own delta query, indexing from the id of the
last indexed doc instead of from the timestamp.
So I think this suggestion would not work in that case (my mistake,
because of the bad explanation):

> You can use a Transformer to write something to the DB.
> Context#getDataSource(String) for each row

That is why I was saying:
> I think I should begin by modifying SolrWriter.java and DocBuilder.java,
> creating functions like getStartTime, persistStartTime... for ID control

Am I heading in the right direction?
Sorry for my English, and thanks in advance.


-- 
View this message in context: http://www.nabble.com/DataImportHandler%3A-Deleteing-from-index-and-db--lastIndexed-id-feature-tp20788755p20790542.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandler: Deleteing from index and db; lastIndexed id feature

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
On Tue, Dec 2, 2008 at 3:01 PM, Marc Sturlese <ma...@gmail.com> wrote:
>
> Hey there,
>
> I have my DataImportHandler almost completely configured, but three goals
> remain. I don't think I can reach them just via XML config or a Transformer
> and SqlEntityProcessor plugin, but I need to be sure of that.
> If there's no other way I will hack some Solr source classes, and I would
> like to know the best way to do that. Once I have it solved, I can upload or
> post the source in the forum in case someone thinks it can be helpful.
>
> 1.- Every time I execute DataImportHandler (to index data from a db), at the
> start or at the end I need to delete some expired documents. I have to
> delete them from the database and from the index. I know which documents must
> be deleted because a field in the db marks them. I would not like to delete
> everything from the DB first or everything from the index first, but rather
> one from the index and one from the db each time.

You can override the init() and destroy() methods of SqlEntityProcessor and
use the subclass as the processor for the root entity. At those points you
can run the necessary db queries and Solr delete queries. Look at
Context#getSolrCore() and Context#getDataSource(String).


> The "delete mark" is set by an update to the db row, so I think I could
> use delta-import. I don't know if deletedPkQuery is the way to do that; I
> cannot find much information about how to make it work. As deltaQuery
> modifies docs (deletes the old one and inserts the new one), I suppose there
> must be an easy way to do just the delete without the new insert.
deletedPkQuery takes care of that. It runs the query first and uses the
result to identify the deleted rows.
>
> 2.-This is probably my most difficult goal.
> Deltaimport reads a timestamp from the dataimport.properties and modify/add
> all documents from db wich were inserted after that date. What I want is to
> be able to save in the field the id of the last idexed doc. So in the next
> time I ejecute the indexer make it start indexing from that last indexed id
> doc.
You can use a Transformer to write something to the DB.
Context#getDataSource(String) for each row
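
One hedged way to picture the resume step from the quoted goal: once the
id of the last indexed row has been stored somewhere durable, the next
import query can be built from it. In the sketch below the helper
ResumeQuery, the table name "item" and the column name "id" are
hypothetical, and the id is assumed to be numeric and to grow
monotonically with insertion order.

```java
// Hypothetical helper: builds the SQL for an import that resumes
// after the last indexed id instead of after a timestamp.
class ResumeQuery {
    // table and idColumn are assumed to come from trusted configuration;
    // lastIndexedId is null when no checkpoint exists yet (full import).
    static String build(String table, String idColumn, Long lastIndexedId) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        if (lastIndexedId != null) {
            sql.append(" WHERE ").append(idColumn)
               .append(" > ").append(lastIndexedId);
        }
        // Ordering by id makes "last id seen" mean that every smaller
        // id has already been handed to the indexer.
        return sql.append(" ORDER BY ").append(idColumn).toString();
    }
}
```

For example, ResumeQuery.build("item", "id", 1500L) yields
"SELECT * FROM item WHERE id > 1500 ORDER BY id", while passing null for
the id yields the full-import query.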

> The point of doing this is that if I do a full import from a db with lots of
> rows, the app could encounter a problem in the middle of the execution and
> abort the process. As deltaQuery works now, I would have to restart the
> execution from the beginning. With this new functionality I could optimize
> the index and start from the last indexed doc.
> I think I should begin by modifying SolrWriter.java and DocBuilder.java,
> creating functions like getStartTime, persistStartTime... for ID control.
>
> 3.- I commented on this last point before. I want to boost doc
> fields at indexing time.
>>>Adding fieldboost is a planned item.
>
>>>It must work as follows:
>>>Add a special value $fieldBoost.<fieldname> to the row map
>
>>>And DocBuilder should respect that. You can raise a bug and we can
>>>commit it soon.
> How do I raise a bug?
https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>
> Thanks in advance

-- 
--Noble Paul