Posted to solr-user@lucene.apache.org by Marc Sturlese <ma...@gmail.com> on 2008/12/06 18:29:55 UTC

Delta-import hack to use last indexed id document

Hey there,
I am hacking on some parts of the Solr source. I am adding a feature so that
every time I use the delta-import handler, it starts fetching rows from the
DB at the last indexed document id (from the latest execution).

The point of doing that is that if I start a full import and the process is
aborted for any reason, I want to be able to start a delta import and start
indexing from the last indexed id of the full import.

To do that, I have created functions in SolrWriter.java and
DataImporter.java. They mirror the ones that write and retrieve the
timestamp in dataimport.properties, but mine work with an id (a long instead
of a date).
I call these functions in DocBuilder.java (in the same places as the
timestamp functions).
One more thing: I write to dataimport.properties every time I call the
upload function in DocBuilder to upload a document.
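
A minimal sketch of the kind of helper described above, assuming a plain java.util.Properties file. The class name LastIdStore and the key last_index_id are made up for illustration; they are not part of the DataImportHandler source, and the actual patch may look quite different.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper: keeps the last indexed id in a properties file. */
class LastIdStore {
    // Assumed key name, chosen for this sketch only.
    static final String LAST_ID_KEY = "last_index_id";

    private final File file;

    LastIdStore(File file) {
        this.file = file;
    }

    /** Loads the file if it exists, so other entries (e.g. a timestamp) survive a rewrite. */
    private Properties load() {
        Properties props = new Properties();
        if (file.exists()) {
            try (InputStream in = new FileInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }

    /** Writes the id as a string, alongside whatever else is already in the file. */
    void persistLastId(long id) {
        Properties props = load();
        props.setProperty(LAST_ID_KEY, Long.toString(id));
        try (OutputStream out = new FileOutputStream(file)) {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the stored id, or -1 if nothing has been written yet. */
    long readLastId() {
        String value = load().getProperty(LAST_ID_KEY);
        return value == null ? -1L : Long.parseLong(value);
    }

    public static void main(String[] args) {
        File f = new File(System.getProperty("java.io.tmpdir"), "dataimport-demo.properties");
        LastIdStore store = new LastIdStore(f);
        store.persistLastId(1234L);
        System.out.println(store.readLastId()); // prints 1234
    }
}
```

Rewriting the whole file on every upload is simple but does disk I/O per document, which is one more reason to write less often than once per upload.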

The problem is that a commit does not happen every time the upload function
(in DocBuilder) is called. So if I kill -9 the process in the middle of a
run, dataimport.properties will hold the last uploaded id, but the index
(opened with Luke) will only go up to the last committed document.

I have run some tests calling writer.commit(false) right after the upload, or
setting <maxBufferedDocs>2</maxBufferedDocs> in solrconfig.xml. Both work
fine, but obviously the indexer becomes extremely slow.

Is there any way to write to dataimport.properties
(writer.persistIndexLastID(arow.get("id").toString())) right after every
commit, without calling the commit function myself? If not, I would
appreciate any advice on other ways to reach this goal.
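
One way to picture what is being asked for, as a sketch only: remember the last uploaded id in memory on every upload, and write it out only when a commit is known to have happened. CommitCheckpoint, onUpload and onCommit are hypothetical names, not Solr APIs. Solr's update handler can be configured with postCommit event listeners in solrconfig.xml, which might be a place to trigger onCommit(), but that wiring is not shown or tested here. The sketch also assumes uploads and commits are serialized on one thread, so the pending id at commit time really is the last committed one.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/** Hypothetical checkpoint: buffers the last uploaded id and persists it only on commit. */
class CommitCheckpoint {
    private final AtomicLong pendingId = new AtomicLong(-1L);
    private final LongConsumer persister;

    /** persister is whatever writes the id durably, e.g. to dataimport.properties. */
    CommitCheckpoint(LongConsumer persister) {
        this.persister = persister;
    }

    /** Call after each document upload; touches memory only. */
    void onUpload(long id) {
        pendingId.set(id);
    }

    /** Call from whatever hook runs after a commit; persists the buffered id, if any. */
    void onCommit() {
        long id = pendingId.get();
        if (id >= 0) {
            persister.accept(id);
        }
    }

    public static void main(String[] args) {
        long[] persisted = { -1L };
        CommitCheckpoint checkpoint = new CommitCheckpoint(id -> persisted[0] = id);
        checkpoint.onUpload(1L);
        checkpoint.onUpload(2L);
        System.out.println(persisted[0]); // prints -1: nothing is written before a commit
        checkpoint.onCommit();
        System.out.println(persisted[0]); // prints 2
    }
}
```

With this split, a kill -9 between commits loses only the in-memory id, so the file never runs ahead of the index.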

If I get it done, I will open an issue and upload the patch there, because I
think this could be a common use case.
Thanks in advance



-- 
View this message in context: http://www.nabble.com/Delta-import-hack-to-use-last-indexed-id-document-tp20872450p20872450.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Delta-import hack to use last indexed id document

Posted by Jon Baer <jo...@gmail.com>.
This sounds a little like my original problem of deltaQuery imports per
entity ...

https://issues.apache.org/jira/browse/SOLR-783

I wonder if those 2 hacks could be combined to fix the issue.

- Jon



Re: Delta-import hack to use last indexed id document

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
On Sat, Dec 6, 2008 at 10:59 PM, Marc Sturlese <ma...@gmail.com> wrote:
> Is there any way to write to dataimport.properties
> (writer.persistIndexLastID(arow.get("id").toString())) right after every
> commit, without calling the commit function myself? If not, I would
> appreciate any advice on other ways to reach this goal.
I have recommended what I think would be an ideal solution, as an API.

The Context could have extra methods for this:

persist(String key, String val)

getPersisted(String key)

Any component could then use this functionality. SolrWriter is not
meant to be a public interface; it can change. But the above methods
can be implemented in SolrWriter.java.
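
A rough sketch of how the two proposed methods could look from a component's point of view. Only the method names persist and getPersisted come from the suggestion above; the interface name PersistentContext, the class PropertiesContext and the key last_index_id are placeholders for illustration, and this version keeps values in memory rather than writing dataimport.properties.

```java
import java.util.Properties;

/** Hypothetical in-memory implementation; a real one would flush to dataimport.properties. */
class PropertiesContext implements PersistentContext {
    private final Properties props;

    PropertiesContext(Properties props) {
        this.props = props;
    }

    @Override
    public void persist(String key, String val) {
        props.setProperty(key, val);
    }

    @Override
    public String getPersisted(String key) {
        // Returns null when the key was never persisted.
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        PersistentContext ctx = new PropertiesContext(new Properties());
        ctx.persist("last_index_id", "1234");
        long resumeFrom = Long.parseLong(ctx.getPersisted("last_index_id"));
        System.out.println(resumeFrom); // prints 1234
    }
}

/** Placeholder name for the part of the Context that would carry the two proposed methods. */
interface PersistentContext {
    void persist(String key, String val);

    String getPersisted(String key);
}
```

Keeping values as strings matches the proposed signatures; callers convert to long or date as needed.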



-- 
--Noble Paul