Posted to java-user@lucene.apache.org by John Wang <jo...@gmail.com> on 2004/07/20 17:28:28 UTC

lucene customized indexing

Hi:
   I am trying to store some database-like field values in Lucene.
I have my own way of storing field values in a customized format.

   I guess my question is whether we can make the Reader/Writer
classes, e.g. the FieldReader, FieldWriter and DocumentReader/Writer
classes, non-final?

   I have asked to make the Lucene API less restrictive many many many
times but got no replies. Is this request feasible?

Thanks

-John

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: lucene customized indexing

Posted by John Wang <jo...@gmail.com>.

That is exactly what they did, and that's probably what I have to do.
But that means we are diverging from the Lucene code base, and future
fixes and enhancements need to be synchronized, which may be a pain.

-John

On Tue, 20 Jul 2004 20:03:05 +0200, Daniel Naber
<da...@t-online.de> wrote:
> On Tuesday 20 July 2004 18:12, John Wang wrote:
> 
> > They make sure during deployment that their "versions"
> > get loaded before the same classes in the lucene .jar.
> 
> I don't see why people cannot just make their own lucene.jar. Just remove
> the "final" and recompile. After all, Lucene is Open Source.
> 
> Regards
> Daniel
> 
> --
> http://www.danielnaber.de

Re: lucene customized indexing

Posted by Daniel Naber <da...@t-online.de>.

On Tuesday 20 July 2004 18:12, John Wang wrote:

> They make sure during deployment that their "versions"
> get loaded before the same classes in the lucene .jar.

I don't see why people cannot just make their own lucene.jar. Just remove
the "final" and recompile. After all, Lucene is Open Source.

Regards
 Daniel

-- 
http://www.danielnaber.de


Re: lucene customized indexing

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Jul 20, 2004, at 2:10 PM, John Wang wrote:
>> I have already provided my opinion on this one - I think it would be
>> fine to allow Token to be public.  I'll let others respond to the
>> additional requests you've made.
>
> Great, what processes need to be in place before this gets in the code 
> base?

You're doing the right thing, although codebase details are most
appropriate for the lucene-dev list.  Filing issues in Bugzilla also
ensures your requests do not get lost in e-mail inboxes.

At this point, Lucene 1.4 has been released and Doug has put forth a 
proposal for Lucene 2.0 (with a migration path of a version 1.9 
intermediate release).  I'm not sure when the best time is to make this 
change.  We should put API changes to a VOTE on the lucene-dev list 
though.  In fact, I'll post a VOTE for Token now! :)

>> Then they should speak up :)
>
> Well, I AM speaking up. So have some other people in earlier emails.
> But they, like me, are getting ignored.

You are not being ignored - not at all.  Look at the replies you've 
gotten already.

>  The HayStack changes were needed
> specifically due to the fact that many classes are declared to be
> final and not extensible.

Did they post their changes back?  Did they discuss them here?  I do 
not recall such discussions (although see above about being lost in 
e-mail inboxes - mine is swamped beyond belief).  Are there Bugzilla 
issues with their patches?

>> Making things extensible for no good reason is asking for maintenance
>> troubles later when you need more control internally.  Lucene has been
>> well designed from the start with extensibility only where it was
>> needed in mind.  It has evolved to be more open in very specific areas
>> after careful consideration of the performance impact has been weighed.
>>  "Breaking" is not really the concern with extensibility, I don't
>> think.  Real-world use cases are needed to show that changes need to be
>> made.
>
> I thought I gave many "real-world use cases" in the previous email.
> They evidently also apply to the Haystack project. What other
> information do we need to provide?

I was not referring to your requests in my comment, but rather making a
general comment regarding requests to make things "public" when quite
sufficient alternatives exist.

	Erik


Re: lucene customized indexing

Posted by John Wang <jo...@gmail.com>.

On Tue, 20 Jul 2004 13:40:28 -0400, Erik Hatcher
<er...@ehatchersolutions.com> wrote:
> On Jul 20, 2004, at 12:12 PM, John Wang wrote:
> >      There are a few things I want to do to be able to customize lucene:
> >
> [...]
> >
> > 3) to be able to customize analyzers to add more information to the
> > Token while doing tokenization.
> 
> I have already provided my opinion on this one - I think it would be
> fine to allow Token to be public.  I'll let others respond to the
> additional requests you've made.

Great, what processes need to be in place before this gets in the code base? 
> 
> > Oleg mentioned the HayStack project. In the HayStack source
> > code, they had to modify many lucene classes to make them non-final in
> > order to customize. They make sure during deployment that their "versions"
> > get loaded before the same classes in the lucene .jar. It is
> > cumbersome, but it is a Lucene restriction they had to live with.
> 
> Wow - I didn't realize that they've made local changes.  Did they post
> with requests for opening things up as you have?  Did they submit
> patches with their local changes?
> 
> > I believe many other users feel the same way.
> 
> Then they should speak up :)

Well, I AM speaking up. So have some other people in earlier emails.
But they, like me, are getting ignored. The HayStack changes were needed
specifically due to the fact that many classes are declared to be
final and not extensible.

> 
> > If I write some classes that derive from the lucene API and they
> > break, then it is my responsibility to fix them. I don't understand why
> > it would add burden to the Lucene developers.
> 
> Making things extensible for no good reason is asking for maintenance
> troubles later when you need more control internally.  Lucene has been
> well designed from the start with extensibility only where it was
> needed in mind.  It has evolved to be more open in very specific areas
> after careful consideration of the performance impact has been weighed.
>  "Breaking" is not really the concern with extensibility, I don't
> think.  Real-world use cases are needed to show that changes need to be
> made.

I thought I gave many "real-world use cases" in the previous email.
They evidently also apply to the Haystack project. What other
information do we need to provide?

I don't want to diverge from the Lucene codebase like Haystack has
done. But I may not have a choice.

Thanks

-John

> 
>        Erik

Re: lucene customized indexing

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Jul 20, 2004, at 12:12 PM, John Wang wrote:
>      There are a few things I want to do to be able to customize lucene:
>
[...]
>
> 3) to be able to customize analyzers to add more information to the
> Token while doing tokenization.

I have already provided my opinion on this one - I think it would be 
fine to allow Token to be public.  I'll let others respond to the 
additional requests you've made.
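
Until a class such as Token is opened up, one way to attach extra information without patching the library is to wrap the token and delegate to it. What follows is a minimal sketch in plain Java that assumes nothing about Lucene's actual API: `BaseToken` stands in for a library class that cannot be extended, and `AnnotatedToken` is a hypothetical wrapper. The limitation is that the wrapper is not a subtype, so it cannot be passed to library code that expects the original class.

```java
// Stand-in for a library class that cannot be subclassed
// (hypothetical; this is NOT Lucene's Token).
final class BaseToken {
    private final String text;
    private final int startOffset;
    private final int endOffset;

    BaseToken(String text, int startOffset, int endOffset) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    String text() { return text; }
    int startOffset() { return startOffset; }
    int endOffset() { return endOffset; }
}

// Hypothetical wrapper: carries extra per-token information by
// delegation instead of inheritance.
final class AnnotatedToken {
    private final BaseToken token;
    private final String partOfSpeech; // example of "more information" added during tokenization

    AnnotatedToken(BaseToken token, String partOfSpeech) {
        this.token = token;
        this.partOfSpeech = partOfSpeech;
    }

    String text() { return token.text(); }
    int length() { return token.endOffset() - token.startOffset(); }
    String partOfSpeech() { return partOfSpeech; }

    // Gives back the wrapped object for code that only accepts the base type.
    BaseToken unwrap() { return token; }
}
```

This keeps the application on an unmodified lucene.jar, at the cost of an extra object per token and the subtype limitation noted above.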

> Oleg mentioned the HayStack project. In the HayStack source
> code, they had to modify many lucene classes to make them non-final in
> order to customize. They make sure during deployment that their "versions"
> get loaded before the same classes in the lucene .jar. It is
> cumbersome, but it is a Lucene restriction they had to live with.

Wow - I didn't realize that they've made local changes.  Did they post 
with requests for opening things up as you have?  Did they submit 
patches with their local changes?

> I believe many other users feel the same way.

Then they should speak up :)

> If I write some classes that derive from the lucene API and they
> break, then it is my responsibility to fix them. I don't understand why
> it would add burden to the Lucene developers.

Making things extensible for no good reason is asking for maintenance 
troubles later when you need more control internally.  Lucene has been 
well designed from the start with extensibility only where it was 
needed in mind.  It has evolved to be more open in very specific areas 
after careful consideration of the performance impact has been weighed. 
  "Breaking" is not really the concern with extensibility, I don't 
think.  Real-world use cases are needed to show that changes need to be 
made.

	Erik


Re: lucene customized indexing

Posted by John Wang <jo...@gmail.com>.

Hi Daniel:

     There are a few things I want to do to be able to customize lucene:

1) to be able to plug in a different similarity model (e.g. bayesian,
vector space etc.)

2) to be able to store certain fields in their own format and provide
corresponding readers. I may not want to store every field in the
lexicon/inverted index structure. I may have fields for which it doesn't
make sense to store the position or frequency information.

3) to be able to customize analyzers to add more information to the
Token while doing tokenization.
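
As an illustration of item 1, a pluggable scoring model is usually expressed as an interface that the scoring code depends on. The sketch below is plain Java with hypothetical names (`ScoringModel`, `TfIdfModel`, `BinaryModel`, `TermScorer`). It does not use or describe Lucene's own classes, and the formulas are simplified placeholders rather than any particular published model.

```java
// Hypothetical extension point: scoring is an interface, so a caller can
// swap models without touching (or subclassing) the scoring code itself.
interface ScoringModel {
    double score(int termFreq, int docFreq, int numDocs);
}

// Simplified tf-idf style weighting (placeholder formula).
final class TfIdfModel implements ScoringModel {
    @Override
    public double score(int termFreq, int docFreq, int numDocs) {
        double idf = Math.log((double) numDocs / (docFreq + 1)) + 1.0;
        return Math.sqrt(termFreq) * idf;
    }
}

// Presence/absence only: ignores frequency information entirely.
final class BinaryModel implements ScoringModel {
    @Override
    public double score(int termFreq, int docFreq, int numDocs) {
        return termFreq > 0 ? 1.0 : 0.0;
    }
}

// The scoring code is final, yet still customizable through the model it is given.
final class TermScorer {
    private final ScoringModel model;

    TermScorer(ScoringModel model) {
        this.model = model;
    }

    double scoreTerm(int termFreq, int docFreq, int numDocs) {
        return model.score(termFreq, docFreq, numDocs);
    }
}
```

The point of the design is that a narrow interface can be opened to users while the classes around it stay final.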

Oleg mentioned the HayStack project. In the HayStack source
code, they had to modify many lucene classes to make them non-final in
order to customize. They make sure during deployment that their "versions"
get loaded before the same classes in the lucene .jar. It is
cumbersome, but it is a Lucene restriction they had to live with.

I believe many other users feel the same way.

If I write some classes that derive from the lucene API and they
break, then it is my responsibility to fix them. I don't understand why
it would add burden to the Lucene developers.

Thanks

-John

On Tue, 20 Jul 2004 17:56:26 +0200, Daniel Naber
<da...@t-online.de> wrote:
> On Tuesday 20 July 2004 17:28, John Wang wrote:
> 
> >    I have asked to make the Lucene API less restrictive many many many
> > times but got no replies.
> 
> I suggest you just change it in your source and see if it works. Then you can
> still explain what exactly you did and why it's useful. From the developers'
> point of view, having things non-final means more stuff is exposed and making
> changes is more difficult (unless one accepts that derived classes may break
> with the next update).
> 
> Regards
> Daniel

Re: lucene customized indexing

Posted by Daniel Naber <da...@t-online.de>.

On Tuesday 20 July 2004 17:28, John Wang wrote:

>    I have asked to make the Lucene API less restrictive many many many
> times but got no replies.

I suggest you just change it in your source and see if it works. Then you can
still explain what exactly you did and why it's useful. From the developers'
point of view, having things non-final means more stuff is exposed and making
changes is more difficult (unless one accepts that derived classes may break
with the next update).
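
The risk described here can be shown in a few lines of plain Java. This is a minimal sketch with no Lucene dependency; `TermBuffer` and `CountingTermBuffer` are hypothetical names, not Lucene classes. Once a class is non-final, a subclass can come to depend on how the base class calls its own methods, so an internal change in a later release can silently change the subclass's behavior.

```java
// Hypothetical library class: addAll happens to be implemented via add.
class TermBuffer {
    private int size;

    public void add(String term) {
        size++;
    }

    public void addAll(String[] terms) {
        for (String term : terms) {
            add(term); // internal self-call: an implementation detail that subclasses can observe
        }
    }

    public int size() {
        return size;
    }
}

// User subclass that tries to count insertions by overriding both methods.
class CountingTermBuffer extends TermBuffer {
    private int added;

    @Override
    public void add(String term) {
        added++;
        super.add(term);
    }

    @Override
    public void addAll(String[] terms) {
        added += terms.length;
        super.addAll(terms); // each element is counted a second time via the overridden add
    }

    public int added() {
        return added;
    }
}
```

With this version of `TermBuffer`, calling `addAll` with three terms leaves `added()` at 6 while `size()` is 3. If a later release stopped routing `addAll` through `add`, the same subclass would report 3, so either way the subclass is tied to an implementation detail the library authors may want to change.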

Regards
 Daniel
