Posted to java-user@lucene.apache.org by Rupinder Singh Mazara <rs...@ebi.ac.uk> on 2004/08/20 10:32:09 UTC

lucene and ejb applications

hi all

   Purely due to a policy decision, we would like to host our Lucene search
application in a J2EE container, preferably by means of an EJB.
Since access to java.io is restricted by the EJB specification, what would
be the best way to design the application?
  I have taken a look at ejindex@sf.net, but it relies on MBeans and not a
session bean.
  Does anyone have pointers or samples that can be looked at?






---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: lucene and ejb applications

Posted by Praveen Peddi <pp...@contextmedia.com>.
In fact we do exactly the same thing. A session bean method called search()
delegates to a POJO SearchService. We lazy-load the IndexSearcher, cache it in
memory, and invalidate that object when someone else modifies the index. This
trick works wonderfully for us. Search has become faster since we started
caching the searcher.

Praveen
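
A minimal sketch of the pattern described above, not Praveen's actual code.
It assumes a stand-in SimpleSearcher interface in place of Lucene's
IndexSearcher so that it runs on the JDK alone; SearchFacade plays the role of
the session bean, and every class and method name here is hypothetical.
Closing the old searcher on invalidation is left out for brevity.

```java
import java.util.List;
import java.util.function.Supplier;

/** Stand-in for a real searcher (an IndexSearcher in a Lucene application). */
interface SimpleSearcher {
    List<String> search(String query);
}

/** Hypothetical POJO: opens the searcher lazily and caches it until invalidated. */
final class SearchService {
    private final Supplier<SimpleSearcher> opener; // how to open a fresh searcher
    private SimpleSearcher cached;                 // null until first use or after invalidate()

    SearchService(Supplier<SimpleSearcher> opener) {
        this.opener = opener;
    }

    /** Only the lookup is synchronized, so searches themselves run concurrently. */
    private synchronized SimpleSearcher current() {
        if (cached == null) {
            cached = opener.get();
        }
        return cached;
    }

    List<String> search(String query) {
        return current().search(query);
    }

    /** Called by whatever code modifies the index; the next search reopens. */
    synchronized void invalidate() {
        cached = null;
    }
}

/** Hypothetical facade standing in for the session bean's search() method. */
final class SearchFacade {
    private final SearchService service;

    SearchFacade(SearchService service) {
        this.service = service;
    }

    List<String> search(String query) {
        return service.search(query); // pure delegation, no index access here
    }
}
```

The facade holds no index state of its own, so the caching policy can change
without touching the bean interface.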




Re: lucene and ejb applications

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Aug 20, 2004, at 7:54 AM, Rupinder Singh Mazara wrote:
> hi erik
>
>  thanks for the warning and the code.
>  Let me re-phrase the question,
>
>  I have an index generated by Lucene, and I need the search capability
>  to have high availability. Which solution would be best?

I'm guessing from your descriptions that you want a search server that 
multiple applications can access.  Correct?  Is that what you mean by 
"high availability"?

Take a look at Nutch for examples of doing this kind of thing.  And 
also...

>
>  Currently I have two scenarios in mind:
>   a) set up an RMI-based app that on start-up initializes an
>      IndexSearcher object and waits for invocation of a method like
>      Vector executeQuery(Query)

Lucene has built-in RMI capability, so you don't need to recreate this 
yourself.  Look at RemoteSearchable (and the test cases that use it).

>   b) create a web-based app (JSP/servlet or Struts) that initializes the
>      IndexSearcher object, stores it in the ServletContext on
>      initialization, and has all requests invoke Hits search(Query q)

This is ok, but you have the same issues with servlet context 
(application scope or even session scope) with distributed 
applications.  IndexSearcher, at the very least, should be transient 
and lazy initialized, perhaps nested under a controller object of your 
making.

>   With scenario a) I have more control over updates, inserts, and
>   deletes, whereas scenario b) has higher availability.

I disagree with your analysis of those scenarios.  Neither has more or 
less control or availability than the other.

>  I want to create and store the IndexSearcher object during
>  initialization to save on multiple opens and reads. Once updates are
>  ready, a signal can be sent to block further searches while the updates
>  are integrated into the existing index.

It is a good thing to keep an IndexSearcher instance around for big 
indexes to save on that I/O, I completely agree.  A simple 
IndexSearcher-encapsulating Java object which lazy initializes and 
keeps IndexSearcher as a transient would be quite sufficient, I think.  
Store that object wherever you like - application scope seems to be 
appropriate for your web application scenario.
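
One way to read the "transient and lazy initialized" advice is sketched
below. It is an illustration under assumptions, not code from this thread:
StubSearcher stands in for IndexSearcher, and SearcherHolder and HolderDemo are
hypothetical names. The point it shows is that a transient field is skipped
when the container serializes the holder, so the copy reopens its searcher on
first use instead of carrying a stale file handle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Stand-in for a real searcher; deliberately NOT Serializable. */
final class StubSearcher {
    final String indexPath;

    StubSearcher(String indexPath) {
        this.indexPath = indexPath;
    }
}

/** Hypothetical holder: serializable itself, but its searcher is transient. */
final class SearcherHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String indexPath;          // travels with the object
    private transient StubSearcher searcher; // skipped by serialization

    SearcherHolder(String indexPath) {
        this.indexPath = indexPath;
    }

    /** Opens the searcher on first use and reuses it afterwards. */
    synchronized StubSearcher getSearcher() {
        if (searcher == null) {
            searcher = new StubSearcher(indexPath); // real code would open the index here
        }
        return searcher;
    }

    synchronized boolean isOpen() {
        return searcher != null;
    }
}

final class HolderDemo {
    /** Serializes and deserializes a holder, as session replication might. */
    static SearcherHolder roundTrip(SearcherHolder holder) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(holder);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SearcherHolder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After a round trip the copy reports isOpen() as false until getSearcher() is
called again, which is the behavior wanted when the object lands on another
server.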

	Erik




RE: lucene and ejb applications

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Option b) sounds simpler and sufficient to me.  I don't see why you
would need to involve RMI for something as simple as this.  I use
something similar to your b) option for some indices behind
http://www.simpy.com/ .  I don't store IndexSearcher in the servlet
context, though - I just have some logic like this:


    /**
     * Returns an instance of {@link IndexDescriptor} for the given
     * <code>indexID</code>, which must represent an absolute file
     * path to the index directory.
     * <p/>
     * This method caches {@link IndexDescriptor}s in an LRU Map and
     * first tries to retrieve them from there.
     * <p/>
     * If the specified index has been changed since the last time
     * it was used, its {@link Searcher} is reloaded.
     *
     * @param indexID the full path to the index directory
     * @return an instance of {@link IndexDescriptor}
     * @throws SearcherException if the given index cannot be accessed
     */
    IndexDescriptor getUserSearcherIndexDescriptor(String indexID)
        throws SearcherException
    {
        File indexDir = validateIndex(indexID);
        IndexDescriptor indexDescriptor = getIndexDescriptorFromCache(indexDir);

        try
        {
            // if this is a known index
            if (indexDescriptor != null)
            {
                // if the index has changed since this Searcher was created,
                // make a new Searcher
                long currentVersion = IndexReader.getCurrentVersion(indexDir);
                if (currentVersion > indexDescriptor.lastKnownVersion)
                {
                    indexDescriptor.lastKnownVersion = currentVersion;
                    indexDescriptor.searcher = new LuceneUserSearcher(indexDir);
                }
            }
            // if this is a new index
            else
            {
                indexDescriptor = new IndexDescriptor();
                indexDescriptor.indexDir = indexDir;
                indexDescriptor.lastKnownVersion = IndexReader.getCurrentVersion(indexDir);
                indexDescriptor.searcher = new LuceneUserSearcher(indexDir);
            }
            return cacheIndexDescriptor(indexDescriptor);
        }
        catch (IOException e)
        {
            throw new SearcherException("Cannot open index: " + indexDir, e);
        }
    }

IndexDescriptor is a simple struct-like class.
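
For readers who want to see the missing pieces, here is a rough guess at what
the struct and the LRU Map could look like. It is not Otis's code. The three
field names come from the method above; the searcher is typed as Object
because LuceneUserSearcher was not posted, and IndexDescriptorCache with its
size limit is a hypothetical helper built on java.util.LinkedHashMap.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

/** Struct-like holder; field names follow the method shown above. */
final class IndexDescriptor {
    File indexDir;
    long lastKnownVersion;
    Object searcher; // LuceneUserSearcher in the original; stand-in type here
}

/** Hypothetical LRU cache keyed by index directory. */
final class IndexDescriptorCache {
    private final Map<File, IndexDescriptor> lru;

    IndexDescriptorCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used
        this.lru = new LinkedHashMap<File, IndexDescriptor>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<File, IndexDescriptor> eldest) {
                return size() > maxEntries; // evict once the limit is exceeded
            }
        };
    }

    /** Returns the cached descriptor, or null if this index is not cached. */
    synchronized IndexDescriptor get(File indexDir) {
        return lru.get(indexDir);
    }

    /** Stores the descriptor and returns it, mirroring cacheIndexDescriptor above. */
    synchronized IndexDescriptor put(IndexDescriptor descriptor) {
        lru.put(descriptor.indexDir, descriptor);
        return descriptor;
    }

    synchronized int size() {
        return lru.size();
    }
}
```

A real version would probably also close the searcher of an evicted
descriptor; that step is omitted here.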


Otis







RE: lucene and ejb applications

Posted by Rupinder Singh Mazara <rs...@ebi.ac.uk>.
hi erik

 thanks for the warning and the code.
 Let me re-phrase the question,

 I have an index generated by Lucene, and I need the search capability
 to have high availability. Which solution would be best?

 Currently I have two scenarios in mind:
  a) set up an RMI-based app that on start-up initializes an IndexSearcher
     object and waits for invocation of a method like
     Vector executeQuery(Query)

  b) create a web-based app (JSP/servlet or Struts) that initializes the
     IndexSearcher object, stores it in the ServletContext on
     initialization, and has all requests invoke Hits search(Query q)

  With scenario a) I have more control over updates, inserts, and deletes,
  whereas scenario b) has higher availability.

 I want to create and store the IndexSearcher object during initialization
 to save on multiple opens and reads. Once updates are ready, a signal can be
 sent to block further searches while the updates are integrated into the
 existing index.
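
The "block further searches while the updates are integrated" idea can be
sketched with a read-write lock. This is only an illustration under
assumptions: SearchGate and its methods are hypothetical names, the search and
the update are passed in as plain callbacks, and whether searches need to be
blocked at all depends on how the index is actually updated.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical gate: many concurrent searches, one exclusive update. */
final class SearchGate {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Searches share the read lock; they wait only while an update holds the write lock. */
    <T> T search(Supplier<T> query) {
        lock.readLock().lock();
        try {
            return query.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Waits for in-flight searches to finish, then runs the update alone. */
    void update(Runnable integrateUpdates) {
        lock.writeLock().lock();
        try {
            integrateUpdates.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** True while some thread holds the write lock. */
    boolean updateInProgress() {
        return lock.isWriteLocked();
    }
}
```

The update callback is also the natural place to drop a cached searcher so
that the next search opens the new version of the index.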







Re: lucene and ejb applications

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
What would be the best way?  Use Lucene outside of EJB.  It's quite 
silly to make such a decision "purely due to a policy decision" when 
the technicalities of it show that it is an unwise decision.

You're going to navigate Hits through a session bean?  And as you said, 
the EJB spec says not to use file I/O from EJBs.  That is a good 
recommendation if you are distributing your system across servers and 
replication is occurring - if another call to a session bean occurs and 
ends up on a different server, then the file handle is lost.

I violate the spec in my JavaDevWithAnt project and have one mode where 
I have a stateless session bean returning search results: 
http://www.ehatchersolutions.com/JavaDevWithAnt - but I definitely do 
not recommend it.  It works when you are in a single-server 
environment.

In summary - EJB and Lucene are not a good mix - don't force it just to 
be buzzword compliant.

	Erik



