Posted to slide-dev@jakarta.apache.org by Eirikur Hrafnsson <ei...@idega.is> on 2005/03/08 16:11:22 UTC

Lucene reindexer?

Hi all (long time no bugging you... ; )

A while ago I asked whether there was a way to rebuild the Lucene index
for Slide. This is a pretty crucial feature in my opinion: the Slide
index is always stored on the file system regardless of what kind of
store you have, which makes it harder to move a website from development
to production or to back it up, and especially to enable Lucene indexing
on an existing Slide store...

Is this possible today?

Best Regards

Eirikur S. Hrafnsson, eiki@idega.is
Chief Software Engineer
Idega Software
http://www.idega.com

p.s.
the SimpleXMLExtractor XPath stuff still doesn't work if you specify a 
namespace other than "DAV:"  : (


---------------------------------------------------------------------
To unsubscribe, e-mail: slide-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: slide-dev-help@jakarta.apache.org


Re: Lucene reindexer?

Posted by Eirikur Hrafnsson <ei...@idega.is>.
Duhhh hehe ok I will try that thanks :)
-Eiki

On 4.4.2005, at 07:49, Stefan Lützkendorf wrote:

> Simply delete the index directory and restart Tomcat. If there is no index,
> the indexers should run an initialization on server startup.
>
> Stefan
Best Regards

Eirikur S. Hrafnsson, eiki@idega.is
Chief Software Engineer
Idega Software
http://www.idega.com




Re: Lucene reindexer?

Posted by Stefan Lützkendorf <lu...@apache.org>.
Simply delete the index directory and restart Tomcat. If there is no index,
the indexers should run an initialization on server startup.
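
As a rough sketch of the delete step (this is not Slide code; the class name
IndexReset and its deleteIndex method are made up for illustration, and the
index path is whatever your store configuration points at), something like
this removes an index directory recursively while Tomcat is stopped:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical helper, not part of Slide: removes an index directory. */
final class IndexReset {

    private IndexReset() {
    }

    /**
     * Recursively deletes indexDir.
     * Returns false if the directory did not exist, true once it is removed.
     */
    static boolean deleteIndex(Path indexDir) throws IOException {
        if (!Files.exists(indexDir)) {
            return false;
        }
        Files.walkFileTree(indexDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file); // index files go first
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc; // do not hide a failed directory listing
                }
                Files.delete(dir); // the directory is empty by now
                return FileVisitResult.CONTINUE;
            }
        });
        return true;
    }
}
```

Calling IndexReset.deleteIndex(Paths.get(...)) with your own index path and
then starting Tomcat again should have the same effect as deleting the
directory by hand.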

Stefan

Eirikur Hrafnsson wrote:
> Cool, how do I try it?


-- 
Stefan Lützkendorf  --  luetzkendorf@apache.org



Re: Lucene reindexer?

Posted by Eirikur Hrafnsson <ei...@idega.is>.
Cool, how do I try it?


On 30.3.2005, at 16:48, Stefan Lützkendorf wrote:

> Hi Eirikur,
>
> I recently checked in a first attempt at initializing an index for an
> existing store (tested with the txfile store).
>
> The indexer scans all docs in the store if there is no index on startup.
>
> Give it a try if you want.
>
> Regards, Stefan
Best Regards

Eirikur S. Hrafnsson, eiki@idega.is
Chief Software Engineer
Idega Software
http://www.idega.com




Re: Lucene reindexer?

Posted by Stefan Lützkendorf <lu...@apache.org>.
Hi Eirikur,

I recently checked in a first attempt at initializing an index for an
existing store (tested with the txfile store).

The indexer scans all docs in the store if there is no index on startup.
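
The behaviour described above could look roughly like the following. This is
only a sketch of the idea, not the code that was checked in:
StartupIndexInitializer, the list of document URIs and the indexer callback
are hypothetical stand-ins for whatever the store and the indexer really
expose.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

/** Hypothetical sketch, not Slide code: index everything if no index exists. */
final class StartupIndexInitializer {

    private StartupIndexInitializer() {
    }

    /** True when the index directory is missing or contains no entries. */
    static boolean indexMissing(Path indexDir) throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return true;
        }
        try (Stream<Path> entries = Files.list(indexDir)) {
            return !entries.findAny().isPresent();
        }
    }

    /**
     * Runs on startup. If there is no index, every document URI in the store
     * is handed to the indexer callback. Returns the number of documents
     * passed to the indexer (0 when an index was already present).
     */
    static int initializeIfMissing(Path indexDir,
                                   List<String> storeUris,
                                   Consumer<String> indexer) throws IOException {
        if (!indexMissing(indexDir)) {
            return 0; // existing index: leave it alone
        }
        Files.createDirectories(indexDir);
        for (String uri : storeUris) {
            indexer.accept(uri); // assumption: the callback writes to the index
        }
        return storeUris.size();
    }
}
```

The point of the check is that an existing index is never touched, so a full
scan only happens after the index directory has been removed.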

Give it a try if you want.

Regards, Stefan

Eirikur Hrafnsson wrote:
> Hi James,
> 
> do you have time now to update/integrate the batch indexer for Slide? I  
> really need it badly : /
> 
> best regards
> Eirikur, Idega.


-- 
Stefan Lützkendorf  --  luetzkendorf@apache.org



Re: Lucene reindexer?

Posted by Eirikur Hrafnsson <ei...@idega.is>.
Hi James,

Do you have time now to update/integrate the batch indexer for Slide? I
really need it badly : /

best regards
Eirikur, Idega.


On 12.3.2005, at 06:15, James Mason wrote:

> Sorry I didn't post this earlier. I was hoping I'd have time to clean it
> up and actually integrate it into Slide, but I've been completely
> swamped lately.
>
> I've uploaded a code dump from my first working version to
> http://cvs.apache.org/~masonjm/batchindexer/
>
> I know that it contains bugs, since I've fixed a few since I made the dump.
> Also, keep in mind that the code won't work as posted. The only
> implementation I've made works with Autonomy for the search engine, and
> I didn't post the piece that actually talks to Autonomy. There's nothing
> in there that would be useful for Lucene anyway.
>
> To make this generally useful there will need to be an implementation of
> QueueProcessor that supports Lucene. I've included an example
> implementation (for Autonomy) that should be a good starting point.
>
> There also needs to be a way to start/stop the batch indexer. I've
> implemented a Spring-based MVC webapp for controlling it on my server,
> but I'm not sure if this is the best approach for a more general
> solution. Also, this is one area I know for sure contains bugs. Someone
> who actually knows what they're doing should take a look at the run()
> logic for BatchIndexer to make it properly resumable. My latest version
> seems to work alright, but this is an earlier snapshot so the logic
> still has errors.
>
> Also, since this whole thing uses Spring to glue everything together
> you'll need to get the Spring jars for it to work. I *think* I patched
> the code in CVS to expose the ApplicationContext to the lower levels. I
> think a servlet filter would be a better approach, but be aware that if
> you want to do this with Slide 2.1 you'll need to go through some extra
> steps.
>
> Holler if there are any questions.
>
> -James
>
> On Wed, 2005-03-09 at 11:15 +0000, Eirikur Hrafnsson wrote:
>> Hi Stefan,
>>
>> On 9.3.2005, at 08:52, Stefan Lützkendorf wrote:
>>
>>> Hi Eirikur,
>>>
>>> the reindex problem is still unresolved :-(.
>>> I'm currently thinking about this, because I think it's crucial too.
>> Yup, especially when you want to use Lucene on an existing store.
>> Somebody mentioned he was working on a batch indexer when we last
>> discussed this and he was going to commit it, was it Christophe or
>> Daniel perhaps...I can't find the email....
>>
>> cheers
>> Eiki, Idega.
>>
>>>
>>> Stefan
Best Regards

Eirikur S. Hrafnsson, eiki@idega.is
Chief Software Engineer
Idega Software
http://www.idega.com




Re: Lucene reindexer?

Posted by James Mason <ma...@apache.org>.
Sorry I didn't post this earlier. I was hoping I'd have time to clean it
up and actually integrate it into Slide, but I've been completely
swamped lately.

I've uploaded a code dump from my first working version to
http://cvs.apache.org/~masonjm/batchindexer/

I know that it contains bugs, since I've fixed a few since I made the dump.
Also, keep in mind that the code won't work as posted. The only
implementation I've made works with Autonomy for the search engine, and
I didn't post the piece that actually talks to Autonomy. There's nothing
in there that would be useful for Lucene anyway.

To make this generally useful there will need to be an implementation of
QueueProcessor that supports Lucene. I've included an example
implementation (for Autonomy) that should be a good starting point.

There also needs to be a way to start/stop the batch indexer. I've
implemented a Spring-based MVC webapp for controlling it on my server,
but I'm not sure if this is the best approach for a more general
solution. Also, this is one area I know for sure contains bugs. Someone
who actually knows what they're doing should take a look at the run()
logic for BatchIndexer to make it properly resumable. My latest version
seems to work alright, but this is an earlier snapshot so the logic
still has errors.
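
One possible shape for a resumable run() loop is sketched below. This is
not the code from the dump: ResumableBatch, its checkpoint counter and the
Consumer callback are hypothetical names used only for illustration, and the
checkpoint lives in memory here (a real indexer would presumably persist it).
The idea is simply that the position advances only after an item has been
processed, so a stopped run can pick up where it left off.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical sketch of a stoppable, resumable batch loop (not the posted BatchIndexer). */
class ResumableBatch implements Runnable {
    private final List<String> uris;            // work list in a stable order
    private final Consumer<String> processor;   // stands in for the real queue processor
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private volatile int checkpoint = 0;        // index of the next URI to process

    ResumableBatch(List<String> uris, Consumer<String> processor) {
        this.uris = uris;
        this.processor = processor;
    }

    /** Ask the loop to stop after the item it is currently processing. */
    void requestStop() { stopRequested.set(true); }

    int checkpoint() { return checkpoint; }

    boolean isDone() { return checkpoint >= uris.size(); }

    @Override
    public void run() {
        stopRequested.set(false);               // a new run() clears any earlier stop request
        while (checkpoint < uris.size() && !stopRequested.get()) {
            processor.accept(uris.get(checkpoint));
            checkpoint++;                       // advance only after the item succeeded
        }
    }
}
```

If the processor throws, the checkpoint is left untouched, so the same item
is retried on the next run().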

Also, since this whole thing uses Spring to glue everything together
you'll need to get the Spring jars for it to work. I *think* I patched
the code in CVS to expose the ApplicationContext to the lower levels. I
think a servlet filter would be a better approach, but be aware that if
you want to do this with Slide 2.1 you'll need to go through some extra
steps.

Holler if there are any questions.

-James

On Wed, 2005-03-09 at 11:15 +0000, Eirikur Hrafnsson wrote:
> Hi Stefan,
> 
> On 9.3.2005, at 08:52, Stefan Lützkendorf wrote:
> 
> > Hi Eirikur,
> >
> > the reindex problem is still unresolved :-(.
> > I'm currently thinking about this, because I think it's crucial too.
> Yup, especially when you want to use Lucene on an existing store. 
> Somebody mentioned he was working on a batch indexer when we last 
> discussed this and he was going to commit it, was it Christophe or 
> Daniel perhaps...I can't find the email....
> 
> cheers
> Eiki, Idega.
> 
> >
> > Stefan
> >
> > Eirikur Hrafnsson wrote:
> >> Hi all (long time no bugging you... ; )
> >> a while ago I asked if there was a way to re-index the lucene index 
> >> for slide. This is pretty crucial feature in my opinion since the 
> >> Slide index is always stored on the file system regardless of what 
> >> kind of store you have thus making it harder to move a website from 
> >> development to production, backing it up and especially when you want 
> >> to enable the lucene indexing on an existing Slide store...
> >> Is this possible today?
> >> Best Regards
> >> Eirikur S. Hrafnsson, eiki@idega.is
> >> Chief Software Engineer
> >> Idega Software
> >> http://www.idega.com
> >> p.s.
> >> the SimpleXMLExtractor XPath stuff still doesn't work if you specify 
> >> a namespace other than "DAV:"  : (
> >
> >
> > -- 
> > Stefan Lützkendorf  --  luetzkendorf@apache.org
> Best Regards
> 
> Eirikur S. Hrafnsson, eiki@idega.is
> Chief Software Engineer
> Idega Software
> http://www.idega.com


Re: Lucene reindexer?

Posted by Eirikur Hrafnsson <ei...@idega.is>.
Hi Stefan,

On 9.3.2005, at 08:52, Stefan Lützkendorf wrote:

> Hi Eirikur,
>
> the reindex problem is still unresolved :-(.
> I'm currently thinking about this, because I think it's crucial too.
Yup, especially when you want to use Lucene on an existing store. 
Somebody mentioned he was working on a batch indexer when we last 
discussed this and he was going to commit it, was it Christophe or 
Daniel perhaps...I can't find the email....

cheers
Eiki, Idega.

>
> Stefan
>
> Eirikur Hrafnsson wrote:
>> Hi all (long time no bugging you... ; )
>> a while ago I asked if there was a way to re-index the lucene index 
>> for slide. This is pretty crucial feature in my opinion since the 
>> Slide index is always stored on the file system regardless of what 
>> kind of store you have thus making it harder to move a website from 
>> development to production, backing it up and especially when you want 
>> to enable the lucene indexing on an existing Slide store...
>> Is this possible today?
>> Best Regards
>> Eirikur S. Hrafnsson, eiki@idega.is
>> Chief Software Engineer
>> Idega Software
>> http://www.idega.com
>> p.s.
>> the SimpleXMLExtractor XPath stuff still doesn't work if you specify 
>> a namespace other than "DAV:"  : (
>
>
> -- 
> Stefan Lützkendorf  --  luetzkendorf@apache.org
Best Regards

Eirikur S. Hrafnsson, eiki@idega.is
Chief Software Engineer
Idega Software
http://www.idega.com




Re: Lucene reindexer?

Posted by Stefan Lützkendorf <lu...@apache.org>.
Hi Eirikur,

the reindex problem is still unresolved :-(.
I'm currently thinking about this, because I think it's crucial too.

Stefan

Eirikur Hrafnsson wrote:
> Hi all (long time no bugging you... ; )
> 
> a while ago I asked if there was a way to re-index the lucene index for 
> slide. This is pretty crucial feature in my opinion since the Slide 
> index is always stored on the file system regardless of what kind of 
> store you have thus making it harder to move a website from development 
> to production, backing it up and especially when you want to enable the 
> lucene indexing on an existing Slide store...
> 
> Is this possible today?
> 
> Best Regards
> 
> Eirikur S. Hrafnsson, eiki@idega.is
> Chief Software Engineer
> Idega Software
> http://www.idega.com
> 
> p.s.
> the SimpleXMLExtractor XPath stuff still doesn't work if you specify a 
> namespace other than "DAV:"  : (


-- 
Stefan Lützkendorf  --  luetzkendorf@apache.org



Re: Lucene reindexer?

Posted by Eirikur Hrafnsson <ei...@idega.is>.
Hehe, you guys lost me there... you obviously know a lot more about this
than I do. Can you collaborate on it perhaps?

Eiki, idega.

On 9.3.2005, at 11:53, karl@gan.no wrote:

> Honoré David wrote:
>
>> It would be more efficient to store the Lucene index in Slide, BUT to
>> implement org.apache.lucene.store.Directory via the Slide API (on the
>> server side) directly and not over an HTTP connection.
>> (The indexer is server side anyway.)
>
> Yes, implementing it directly on the Slide API is more efficient, but
> not efficient enough on its own, I’m afraid. I have a Lucene index for a
> customer that is over 2GB in size, and even though it is split into
> several Lucene files, many of these files are just too large to handle
> without some paging, I think. Furthermore, a clean WebDAV implementation
> would be much more flexible and reusable for the Lucene project too, not
> only Slide.
>
>> Note: Lucene just needs (Create => Write_only => Close) and (Open =>
>> Random_Read_only => Close), plus delete. Lucene never "modifies" the
>> "file", but you need to implement not just the
>> org.apache.lucene.store.Directory interface but also
>> org.apache.lucene.store.InputStream,
>> org.apache.lucene.store.OutputStream, etc.
>>
>> I can find some time to work on this if someone thinks this is
>> something useful.
>
> Lucene's InputStream and OutputStream require seek() even if the actual
> files don’t. I have abused the RAMDirectory for a while now and I think
> that taking the code in that implementation would be a nice start for
> implementing a DAVDirectory. It already does paging into arrays, so the
> work would be to save and load these arrays into Slide and adjust the
> maximum allowed size of each “file”. Lucene sees collections of these
> paged arrays as files, so we would need a filename translation that says
> that the file “lucene.segment” is actually chunked out into a set of
> small WebDAV resources.
>
>> But all of this doesn't help for re-index... : (
>
> Eh... true, but wouldnt it be cool? :-)
>
> Mvh Karl Øie
>
Best Regards

Eirikur S. Hrafnsson, eiki@idega.is
Chief Software Engineer
Idega Software
http://www.idega.com




Re: Lucene reindexer?

Posted by ka...@gan.no.
Honoré David wrote:

> It would be more efficient to store the Lucene index in Slide, BUT to
> implement org.apache.lucene.store.Directory via the Slide API (on the
> server side) directly and not over an HTTP connection.
> (The indexer is server side anyway.)

Yes, implementing it directly on the Slide API is more efficient, but
not efficient enough on its own, I’m afraid. I have a Lucene index for a
customer that is over 2GB in size, and even though it is split into several
Lucene files, many of these files are just too large to handle without some
paging, I think. Furthermore, a clean WebDAV implementation would be much
more flexible and reusable for the Lucene project too, not only Slide.

> Note: Lucene just needs (Create => Write_only => Close) and (Open =>
> Random_Read_only => Close), plus delete. Lucene never "modifies" the
> "file", but you need to implement not just the
> org.apache.lucene.store.Directory interface but also
> org.apache.lucene.store.InputStream,
> org.apache.lucene.store.OutputStream, etc.
>
> I can find some time to work on this if someone thinks this is something
> useful.

Lucene's InputStream and OutputStream require seek() even if the actual
files don’t. I have abused the RAMDirectory for a while now and I think
that taking the code in that implementation would be a nice start for
implementing a DAVDirectory. It already does paging into arrays, so the
work would be to save and load these arrays into Slide and adjust the
maximum allowed size of each “file”. Lucene sees collections of these
paged arrays as files, so we would need a filename translation that says
that the file “lucene.segment” is actually chunked out into a set of small
WebDAV resources.
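
A rough sketch of what such a filename translation could look like, assuming
fixed-size chunks. ChunkNaming, the "name/00000002" resource layout and the
32k chunk size in the usage note are illustrative assumptions, not anything
that exists in Lucene or Slide:

```java
/** Hypothetical mapping from a Lucene file name and byte offset to a chunk resource. */
final class ChunkNaming {
    private final int chunkSize;

    ChunkNaming(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Name of the small resource that holds the byte at the given offset. */
    String resourceFor(String fileName, long offset) {
        long chunk = offset / chunkSize;
        // Zero-padded so that the chunks sort in order inside their collection.
        return fileName + "/" + String.format("%08d", chunk);
    }

    /** Position of that byte inside its chunk. */
    int offsetInChunk(long offset) {
        return (int) (offset % chunkSize);
    }

    /** Number of chunks needed for a file of the given length (rounded up). */
    long chunkCount(long fileLength) {
        return (fileLength + chunkSize - 1) / chunkSize;
    }
}
```

With 32k chunks, byte 70000 of “lucene.segment” would live in the resource
lucene.segment/00000002 at offset 4464, and a 70000-byte file would need
three chunks.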

> But all of this doesn't help for re-index... : (

Eh... true, but wouldn't it be cool? :-)

Mvh Karl Øie



Re: Lucene reindexer?

Posted by Eirikur Hrafnsson <ei...@idega.is>.
On 9.3.2005, at 11:35, Honoré David wrote:

> Eirikur Hrafnsson wrote:
>
>> Hey Karl :)
>> On 9.3.2005, at 01:58, karl@gan.no wrote:
>>
>>> Hi again Eirikur!
>>>
>>> Getting Lucene to store its index in a Slide store should be very
>>> doable, as WebDAV is essentially a filesystem. I think the key is the
>>> org.apache.lucene.store.Directory class. Take a look at
>>> org.apache.lucene.store.FSDirectory and, if you agree, I think it
>>> should be possible to create a Directory class that writes and reads
>>> its files from a WebDAV repository.
>>> I did an experiment where I used Novell's NetDrive to mount a Slide
>>> repository as a drive letter in Windows. Then I configured Lucene to
>>> use that drive as its filesystem. It worked, but it was not very
>>> speedy :-(
>>>
>> Yes I had thought about that also and wondered if it could be fast 
>> enough.
>
> It would be more efficient to store the Lucene index in Slide, BUT to
> implement org.apache.lucene.store.Directory via the Slide API (on the
> server side) directly and not over an HTTP connection.
> (The indexer is server side anyway.)
>
> Note: Lucene just needs (Create => Write_only => Close) and (Open =>
> Random_Read_only => Close), plus delete. Lucene never "modifies" the
> "file", but you need to implement not just the
> org.apache.lucene.store.Directory interface but also
> org.apache.lucene.store.InputStream,
> org.apache.lucene.store.OutputStream, etc.
>
> I can find some time to work on this if someone thinks this is
> something useful.
Ahhh somebody with an insight into Lucene and Slide :) If you can find 
the time to start work on this that would be GREAT, I can pitch in also 
if you like : )

>
>
> But all of this doesn't help for re-index... : (
No, but perhaps there is a batch reindexer available somewhere for
FSDirectory that we could use with a "SlideDirectory"?
Stefan, I think, is going to look into the reindexing.

>
>>
>>>
>>> ...
>>
>>
>
>
> Honoré David
>
>
>
Best Regards

Eirikur S. Hrafnsson, eiki@idega.is
Chief Software Engineer
Idega Software
http://www.idega.com




Re: Lucene reindexer?

Posted by Honoré David <dh...@oma.be>.
Eirikur Hrafnsson wrote:

> Hey Karl :)
> On 9.3.2005, at 01:58, karl@gan.no wrote:
>
>> Hi again Eirikur!
>>
>> Getting Lucene to store its index in a Slide store should be very
>> doable, as WebDAV is essentially a filesystem. I think the key is the
>> org.apache.lucene.store.Directory class. Take a look at
>> org.apache.lucene.store.FSDirectory and, if you agree, I think it should
>> be possible to create a Directory class that writes and reads its files
>> from a WebDAV repository.
>> I did an experiment where I used Novell's NetDrive to mount a Slide
>> repository as a drive letter in Windows. Then I configured Lucene to use
>> that drive as its filesystem. It worked, but it was not very speedy :-(
>>
> Yes I had thought about that also and wondered if it could be fast 
> enough.

It would be more efficient to store the Lucene index in Slide, BUT to
implement org.apache.lucene.store.Directory via the Slide API (on the server
side) directly and not over an HTTP connection.
(The indexer is server side anyway.)

Note: Lucene just needs (Create => Write_only => Close) and (Open =>
Random_Read_only => Close), plus delete. Lucene never "modifies" the "file",
but you need to implement not just the org.apache.lucene.store.Directory
interface but also org.apache.lucene.store.InputStream,
org.apache.lucene.store.OutputStream, etc.
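
To make that access pattern concrete, here is a minimal in-memory sketch of
a write-once, random-read store. It does not use the Lucene or Slide APIs;
WriteOnceStore and its methods are hypothetical names, and a real
implementation would sit behind org.apache.lucene.store.Directory instead.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store illustrating write-once, random-read files. */
class WriteOnceStore {
    private final Map<String, byte[]> sealed = new HashMap<>();

    /** Create => write-only => close: the content becomes visible only on close(). */
    class Writer {
        private final String name;
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        Writer(String name) { this.name = name; }

        void write(byte[] data) { buffer.write(data, 0, data.length); }

        void close() { sealed.put(name, buffer.toByteArray()); }
    }

    Writer create(String name) {
        if (sealed.containsKey(name)) {
            // Files are never modified in place, so a second create is refused.
            throw new IllegalStateException("already written: " + name);
        }
        return new Writer(name);
    }

    /** Open => random read-only: any offset may be read, nothing may be changed. */
    byte readAt(String name, long offset) {
        byte[] data = sealed.get(name);
        if (data == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return data[(int) offset];
    }

    void delete(String name) { sealed.remove(name); }

    boolean exists(String name) { return sealed.containsKey(name); }
}
```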

I can find some time to work on this if someone thinks this is something
useful.


But all of this doesn't help for re-index... : (

>
>>
>> ...
>
>


Honoré David





Re: Lucene reindexer?

Posted by Eirikur Hrafnsson <ei...@idega.is>.
Hey Karl :)
On 9.3.2005, at 01:58, karl@gan.no wrote:

> Hi again Eirikur!
>
> Getting Lucene to store its index in a Slide store should be very
> doable, as WebDAV is essentially a filesystem. I think the key is the
> org.apache.lucene.store.Directory class. Take a look at
> org.apache.lucene.store.FSDirectory and, if you agree, I think it should
> be possible to create a Directory class that writes and reads its files
> from a WebDAV repository.
> I did an experiment where I used Novell's NetDrive to mount a Slide
> repository as a drive letter in Windows. Then I configured Lucene to use
> that drive as its filesystem. It worked, but it was not very speedy :-(
>
Yes I had thought about that also and wondered if it could be fast 
enough.

> The problem is that Lucene requires seek() on its data files, and with
> WebDAV you have to get the whole resource at once since it's going
> through HTTP. The solution to this could be to make the implementation
> work like a paging system, where WebDAV resources of, say, 32k get paged
> in and out when a seek() is performed.
Sounds interesting. Doesn't Slide also cache files? Then it could probably
be fast if you can spare the memory.
Do you think the index files are accessed for every search? If so, using
the WebDAV client for that would probably be a killer for performance.

>
> I would be happy to assist and/or help out in such an
> org.apache.lucene.store.Directory implementation as this is something I
> would like to use as well.
Me too.

cheers
Eiki, Idega.

>
> I'm still not back at work, but this sounds very interesting!
>
> Mvh Karl Øie
>
>> Hi all (long time no bugging you... ; )
>>
>> a while ago I asked if there was a way to re-index the lucene index for
>> slide. This is pretty crucial feature in my opinion since the Slide
>> index is always stored on the file system regardless of what kind of
>> store you have thus making it harder to move a website from development
>> to production, backing it up and especially when you want to enable the
>> lucene indexing on an existing Slide store...
>>
>> Is this possible today?
>>
>> Best Regards
>>
>> Eirikur S. Hrafnsson, eiki@idega.is
>> Chief Software Engineer
>> Idega Software
>> http://www.idega.com
>>
>> p.s.
>> the SimpleXMLExtractor XPath stuff still doesn't work if you specify a
>> namespace other than "DAV:"  : (
Best Regards

Eirikur S. Hrafnsson, eiki@idega.is
Chief Software Engineer
Idega Software
http://www.idega.com




Re: Lucene reindexer?

Posted by ka...@gan.no.
Hi again Eirikur!

Getting Lucene to store its index in a Slide store should be very
doable, as WebDAV is essentially a filesystem. I think the key is the
org.apache.lucene.store.Directory class. Take a look at
org.apache.lucene.store.FSDirectory and, if you agree, I think it should be
possible to create a Directory class that writes and reads its files from
a WebDAV repository.

I did an experiment where I used Novell's NetDrive to mount a Slide
repository as a drive letter in Windows. Then I configured Lucene to use
that drive as its filesystem. It worked, but it was not very speedy :-(

The problem is that Lucene requires seek() on its data files, and with
WebDAV you have to get the whole resource at once since it's going through
HTTP. The solution to this could be to make the implementation work like a
paging system, where WebDAV resources of, say, 32k get paged in and out when
a seek() is performed.
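
A minimal sketch of that paging idea, assuming each page is stored as its
own small resource. PageStore and PagedReader are hypothetical names (not
Lucene or Slide classes), and the least-recently-used eviction is just one
possible policy for paging out:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical source of fixed-size pages, e.g. one small WebDAV resource per page. */
interface PageStore {
    byte[] loadPage(long pageNumber);
}

/** Sketch of a seekable reader that pages data in on demand and evicts the least recently used page. */
class PagedReader {
    private final PageStore store;
    private final int pageSize;
    private final Map<Long, byte[]> cache;
    private long position = 0;
    int loads = 0;  // number of pages fetched from the store so far

    PagedReader(PageStore store, int pageSize, final int maxCachedPages) {
        this.store = store;
        this.pageSize = pageSize;
        // An access-ordered LinkedHashMap gives a simple LRU cache.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxCachedPages;
            }
        };
    }

    /** seek() only moves the position; the page is fetched lazily on the next read. */
    void seek(long pos) { position = pos; }

    byte readByte() {
        long pageNumber = position / pageSize;
        byte[] page = cache.get(pageNumber);
        if (page == null) {
            page = store.loadPage(pageNumber);  // page in
            loads++;
            cache.put(pageNumber, page);        // may page out the eldest entry
        }
        return page[(int) (position++ % pageSize)];
    }
}
```

Sequential reads within one page then cost a single fetch, and only a seek()
into an uncached page triggers another request.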

I would be happy to assist and/or help out in such an
org.apache.lucene.store.Directory implementation as this is something I
would like to use as well.

I'm still not back at work, but this sounds very interesting!

Mvh Karl Øie

> Hi all (long time no bugging you... ; )
>
> a while ago I asked if there was a way to re-index the lucene index for
> slide. This is pretty crucial feature in my opinion since the Slide
> index is always stored on the file system regardless of what kind of
> store you have thus making it harder to move a website from development
> to production, backing it up and especially when you want to enable the
> lucene indexing on an existing Slide store...
>
> Is this possible today?
>
> Best Regards
>
> Eirikur S. Hrafnsson, eiki@idega.is
> Chief Software Engineer
> Idega Software
> http://www.idega.com
>
> p.s.
> the SimpleXMLExtractor XPath stuff still doesn't work if you specify a
> namespace other than "DAV:"  : (