Posted to java-user@lucene.apache.org by Jason Rutherglen <ja...@gmail.com> on 2009/03/24 20:07:31 UTC

MergePolicy public but SegmentInfos package protected?

I'm overriding MergePolicy which is public, however SegmentInfos is package
protected which means the MergePolicy subclass must be in the
org.apache.lucene.index package.  Can we make SegmentInfos public?

Re: MergePolicy public but SegmentInfos package protected?

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Fri, Mar 27, 2009 at 4:22 PM, Marvin Humphrey <ma...@rectangular.com> wrote:

> I think the difference here is that Lucene gets to use multiple threads within
> one process, while Lucy has to at least be capable of using a multiple-process
> concurrency model in order to support real-time search for non-threaded hosts.

OK that was my confusion.  I had forgotten that if the updater process
goes off and does some merging, that means it's blocked on
adding/deleting docs.  In Lucene these are separate threads.
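
A minimal Java sketch of the in-process model, under stated assumptions: this
is not Lucene code, and ToyWriter, addDoc, mergeInBackground and waitForMerges
are made-up names. It only shows the shape of the idea, namely that the calling
thread keeps adding docs while a separate thread carries out the merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical toy writer; each "segment" is just a doc count.
class ToyWriter {
    private final List<Integer> segmentDocCounts = new ArrayList<>();
    private final ExecutorService mergeThread = Executors.newSingleThreadExecutor();

    // Called on the updater thread; every call flushes a one-doc segment.
    synchronized void addDoc() { segmentDocCounts.add(1); }

    synchronized int segmentCount() { return segmentDocCounts.size(); }

    synchronized int totalDocs() {
        int sum = 0;
        for (int c : segmentDocCounts) sum += c;
        return sum;
    }

    // Merge the segments that exist right now, on a separate thread.
    // The latch stands in for slow merge I/O so the demo is deterministic.
    void mergeInBackground(CountDownLatch slowIo) {
        final int n = segmentCount(); // snapshot taken on the calling thread
        if (n < 2) return;
        mergeThread.submit(() -> {
            try {
                slowIo.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (this) {
                int merged = 0;
                for (int i = 0; i < n; i++) merged += segmentDocCounts.remove(0);
                segmentDocCounts.add(0, merged); // one segment replaces the first n
            }
        });
    }

    void waitForMerges() {
        mergeThread.shutdown();
        try {
            mergeThread.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class ToyWriterDemo {
    public static void main(String[] args) {
        ToyWriter w = new ToyWriter();
        for (int i = 0; i < 3; i++) w.addDoc();
        CountDownLatch slowIo = new CountDownLatch(1);
        w.mergeInBackground(slowIo);
        w.addDoc(); // not blocked by the pending merge
        w.addDoc();
        slowIo.countDown();
        w.waitForMerges();
        System.out.println(w.segmentCount() + " segments, " + w.totalDocs() + " docs");
    }
}
```

In a single-process, single-thread host the two addDoc calls after
mergeInBackground would have to wait for the merge, which is the blocking
described above.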

I think we hashed this out already and I just forgot!

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: MergePolicy public but SegmentInfos package protected?

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Fri, Mar 27, 2009 at 03:59:09PM -0400, Michael McCandless wrote:

> >> Why must merge policy be made public for realtime search? [In Lucy]
> >
> > Because real-time search under Lucy needs to be able to operate using multiple
> > write processes, since threads will not always be available.
> >
> > You need to be able to tell one indexer *not* to merge anything when
> > performing fast updates, and you need to be able to tell another indexer what
> > to merge when performing background consolidation.
> 
> Is this because you want to not swamp IO system?  

No, the goal is to reduce the worst-case latency between adding new docs or
deletions to the index and being able to see the changes in a search.  

The fast updater should not decide that it's going to merge some big segment
before it writes a snapshot file, because that will cause a sudden spike in
latency.  So we thwart that by assigning it a merge policy to the effect of
"make only small merges on recently added material, or don't merge at all".

But of course we can't keep adding small segments to the index forever, so we
need a background consolidator process.  That process also needs a custom
merge policy.
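
A rough Java sketch of the two policies described above. It is only an
illustration: SegStats, LatencyAwarePolicies and the doc-count ceiling are
hypothetical and come from neither Lucy nor Lucene. The fast updater only
considers small segments, while the consolidator may take on everything.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-segment statistics; not a Lucene or Lucy type.
final class SegStats {
    final String name;
    final int docCount;

    SegStats(String name, int docCount) {
        this.name = name;
        this.docCount = docCount;
    }
}

final class LatencyAwarePolicies {
    // Fast updater: pick only segments at or below a small doc-count ceiling,
    // so a commit never waits on rewriting a big segment.
    static List<String> fastUpdateCandidates(List<SegStats> segs, int maxDocs) {
        List<String> picked = new ArrayList<>();
        for (SegStats s : segs) {
            if (s.docCount <= maxDocs) picked.add(s.name);
        }
        // A "merge" of fewer than two segments is pointless; merge nothing.
        return picked.size() >= 2 ? picked : new ArrayList<>();
    }

    // Background consolidator: latency is not a concern here, so every
    // segment is a candidate, including the big ones.
    static List<String> consolidationCandidates(List<SegStats> segs) {
        List<String> picked = new ArrayList<>();
        for (SegStats s : segs) picked.add(s.name);
        return picked.size() >= 2 ? picked : new ArrayList<>();
    }
}
```

With segments of 1,000,000, 10 and 7 docs and a ceiling of 100, the fast
updater would pick only the two small segments and leave the large one for
the consolidator.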

> Ie you're emulating IO prioritization.  (Which I think makes sense, but,
> it's more of an optimization than purely necessary for realtime search).

How else do we stop the fast updater from applying the default merge policy,
which has poor worst-case latency?

> In the prototype near realtime search in Lucene (on LUCENE-1516), it's
> fully independent of the merge policy (but, yes, a smarter merge
> policy can reduce the turnaround times).

I think the difference here is that Lucene gets to use multiple threads within
one process, while Lucy has to at least be capable of using a multiple-process
concurrency model in order to support real-time search for non-threaded hosts.

Marvin Humphrey

Re: MergePolicy public but SegmentInfos package protected?

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Fri, Mar 27, 2009 at 1:12 PM, Marvin Humphrey <ma...@rectangular.com> wrote:

>> Why must merge policy be made public for realtime search? [In Lucy]
>
> Because real-time search under Lucy needs to be able to operate using multiple
> write processes, since threads will not always be available.
>
> You need to be able to tell one indexer *not* to merge anything when
> performing fast updates, and you need to be able to tell another indexer what
> to merge when performing background consolidation.

Is this because you want to not swamp IO system?  Ie you're emulating
IO prioritization.  (Which I think makes sense, but, it's more of an
optimization than purely necessary for realtime search).

In the prototype near realtime search in Lucene (on LUCENE-1516), it's
fully independent of the merge policy (but, yes, a smarter merge
policy can reduce the turnaround times).

>> > Actually, if you're not warming sort caches, launching a Lucene IndexReader
>> > isn't obscenely expensive any more -- just expensive.  Right?
>>
>> We load deleted docs on init (1 bit per doc = fast), terms index (=
>> a lot of stuff every 128 terms = maybe slow), norms on the first search
>> that hits that field (1 byte per doc = probably OK), and FieldCache on
>> first search that uses it.  So "it depends" I guess?
>
> For the purposes of MergePolicy, all you would need are the doc counts and the
> delcounts, and optionally other stuff in SegmentInfos.  In theory you could
> lazy load the other stuff like the term dictionary index.  Obviously that
> would be an unacceptable behavioral change, but it's worth noting.

True.

Mike

Re: MergePolicy public but SegmentInfos package protected?

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Fri, Mar 27, 2009 at 12:39:05PM -0400, Michael McCandless wrote:

> Why must merge policy be made public for realtime search? [In Lucy]

Because real-time search under Lucy needs to be able to operate using multiple
write processes, since threads will not always be available.

You need to be able to tell one indexer *not* to merge anything when
performing fast updates, and you need to be able to tell another indexer what
to merge when performing background consolidation.

Looking down from a high level, what I think will work is to supply an
"IndexManager" argument to the indexer's constructor which controls all
merge-related behavior, and to provide FastUpdateManager and
BackgroundMergeManager classes which implement the desired policies.
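
One way the constructor argument could look, sketched in Java. The class names
follow the proposal above, but every signature here is a guess: Lucy's real API
is C, and segmentsToMerge, ToyIndexer and the doc-count lists are invented for
illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical manager interface: given per-segment doc counts, return the
// positions of the segments that should be merged.
interface IndexManager {
    List<Integer> segmentsToMerge(List<Integer> docCounts);
}

// Never merges, so a commit costs no more than writing the new segment.
final class FastUpdateManager implements IndexManager {
    public List<Integer> segmentsToMerge(List<Integer> docCounts) {
        return Collections.emptyList();
    }
}

// Consolidates everything it sees.
final class BackgroundMergeManager implements IndexManager {
    public List<Integer> segmentsToMerge(List<Integer> docCounts) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < docCounts.size(); i++) all.add(i);
        return all;
    }
}

// Toy indexer whose merge behavior is decided entirely by the manager
// handed to its constructor.
final class ToyIndexer {
    private final IndexManager manager;
    private final List<Integer> docCounts = new ArrayList<>();

    ToyIndexer(IndexManager manager) { this.manager = manager; }

    void addSegment(int docCount) { docCounts.add(docCount); }

    // On commit, ask the manager what to merge and fold those segments
    // into a single new one appended at the end.
    void commit() {
        List<Integer> picks = manager.segmentsToMerge(docCounts);
        if (picks.size() < 2) return;
        int merged = 0;
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < docCounts.size(); i++) {
            if (picks.contains(i)) merged += docCounts.get(i);
            else kept.add(docCounts.get(i));
        }
        kept.add(merged);
        docCounts.clear();
        docCounts.addAll(kept);
    }

    List<Integer> segments() { return new ArrayList<>(docCounts); }
}
```

Two processes would then differ only in the manager they pass in, for example
new ToyIndexer(new FastUpdateManager()) for the updater and
new ToyIndexer(new BackgroundMergeManager()) for the consolidator.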

> > Actually, if you're not warming sort caches, launching a Lucene IndexReader
> > isn't obscenely expensive any more -- just expensive.  Right?
> 
> We load deleted docs on init (1 bit per doc = fast), terms index (=
> a lot of stuff every 128 terms = maybe slow), norms on the first search
> that hits that field (1 byte per doc = probably OK), and FieldCache on
> first search that uses it.  So "it depends" I guess?

For the purposes of MergePolicy, all you would need are the doc counts and the
delcounts, and optionally other stuff in SegmentInfos.  In theory you could
lazy load the other stuff like the term dictionary index.  Obviously that
would be an unacceptable behavioral change, but it's worth noting.

Marvin Humphrey

Re: MergePolicy public but SegmentInfos package protected?

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Fri, Mar 27, 2009 at 12:13 PM, Marvin Humphrey
<ma...@rectangular.com> wrote:

>> Whereas in Lucene neither MultiSegmentReader nor SegmentReader is public.
>
> I had thought making SegmentReader public was at least under consideration as
> part of the implementation for segment-centric sorted search, but I guess it
> turned out not to be necessary.  Still, you have
> IndexReader.getSequentialSubReaders().  That might be enough -- at least for
> this part of the problem.  :)

Yes, enough for now I suppose.  Though we have LUCENE-831 up next
(fixing FieldCache API).

>> > As for the actual implementation of MergePolicy, I haven't prototyped that out
>> > yet.  Right now in KS, the infrastructure is reasonably primitive:
>> > IndexManager has a method called SegReaders_To_Merge() which accepts a
>> > PolyReader as an argument and returns an array of SegReaders representing
>> > content that should be merged.
>>
>> KS does the Fibonacci merge policy, right?
>
> Yes.
>
> SegReaders_To_Merge is overridden in certain parts of the test suite, but it's
> not yet public.  However, control over merging policy will soon *have* to be
> made public somehow in order to support real-time indexing, so working out an
> API is on my near-term agenda.

Why must merge policy be made public for realtime search?

>> >> Even though Lucy's SegmentReader is lighter weight, it still seems
>> >> like you shouldn't be opening them in the writer (except for realtime
>> >> search)?
>> >
>> > I don't see why not.
>>
>> But it still ties up resources?
>
> Not enough to worry about, I believe.

Hmm OK.

>> EG mmap uses up chunks of your address space (possibly important on 32 bit
>> machines,
>
> This is an important concern, but I believe that design-wise, we have a
> solution[1] -- on 32-bit systems, we only mmap sliding windows rather than
> whole files.

Nice!

> Furthermore, mmap is called with the MAP_SHARED flag, so IndexReaders across
> multiple processes hitting the same exact memory segment get to share it.
> (This is more important under 64-bit systems, where we do map the whole file
> straightaway.)

Great.

>> opening files takes time & descriptors, etc.
>
> Launching an IndexReader is still plenty fast.
>
> Actually, if you're not warming sort caches, launching a Lucene IndexReader
> isn't obscenely expensive any more -- just expensive.  Right?

We load deleted docs on init (1 bit per doc = fast), terms index (=
a lot of stuff every 128 terms = maybe slow), norms on the first search
that hits that field (1 byte per doc = probably OK), and FieldCache on
first search that uses it.  So "it depends" I guess?

> [1] At least on Unixen.  I believe we can support all of this using Windows
>    MapViewOfFile and friends, and I had a crude prototype working before, but
>    right now Windows is still using the old-school load-into-process-memory
>    style.

Excellent!

Mike

Re: MergePolicy public but SegmentInfos package protected?

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Fri, Mar 27, 2009 at 11:09:09AM -0400, Michael McCandless wrote:

> Whereas in Lucene neither MultiSegmentReader nor SegmentReader is public.

I had thought making SegmentReader public was at least under consideration as
part of the implementation for segment-centric sorted search, but I guess it
turned out not to be necessary.  Still, you have
IndexReader.getSequentialSubReaders().  That might be enough -- at least for
this part of the problem.  :)

> > As for the actual implementation of MergePolicy, I haven't prototyped that out
> > yet.  Right now in KS, the infrastructure is reasonably primitive:
> > IndexManager has a method called SegReaders_To_Merge() which accepts a
> > PolyReader as an argument and returns an array of SegReaders representing
> > content that should be merged.
> 
> KS does the Fibonacci merge policy, right?

Yes.  

SegReaders_To_Merge is overridden in certain parts of the test suite, but it's
not yet public.  However, control over merging policy will soon *have* to be
made public somehow in order to support real-time indexing, so working out an
API is on my near-term agenda.

> >> Even though Lucy's SegmentReader is lighter weight, it still seems
> >> like you shouldn't be opening them in the writer (except for realtime
> >> search)?
> >
> > I don't see why not.
> 
> But it still ties up resources?  

Not enough to worry about, I believe.

> EG mmap uses up chunks of your address space (possibly important on 32 bit
> machines, 

This is an important concern, but I believe that design-wise, we have a
solution[1] -- on 32-bit systems, we only mmap sliding windows rather than
whole files.

Furthermore, mmap is called with the MAP_SHARED flag, so IndexReaders across
multiple processes hitting the same exact memory segment get to share it.
(This is more important under 64-bit systems, where we do map the whole file
straightaway.)
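
The sliding-window idea can be sketched in Java with FileChannel.map. This is
an illustration only, not Lucy's C implementation: SlidingWindowReader, byteAt
and the window size are hypothetical, and whether a read-only mapping is shared
between processes is left to the JVM and the platform. Java offers no public
call to unmap a buffer, so the sketch simply drops the old window and leaves it
to the garbage collector.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical reader that keeps only one fixed-size window mapped at a time,
// so address-space use is bounded by the window size, not the file size.
final class SlidingWindowReader implements AutoCloseable {
    private final FileChannel channel;
    private final long fileSize;
    private final int windowSize;
    private long windowStart = -1;
    private MappedByteBuffer window;
    private int remaps = 0;

    SlidingWindowReader(Path file, int windowSize) {
        try {
            this.channel = FileChannel.open(file, StandardOpenOption.READ);
            this.fileSize = channel.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.windowSize = windowSize;
    }

    // Read one byte at an absolute file offset, remapping the window when the
    // offset falls outside the currently mapped range.
    byte byteAt(long pos) {
        if (pos < 0 || pos >= fileSize) throw new IndexOutOfBoundsException("pos=" + pos);
        if (window == null || pos < windowStart || pos >= windowStart + window.capacity()) {
            windowStart = (pos / windowSize) * windowSize; // align to a window boundary
            long len = Math.min(windowSize, fileSize - windowStart);
            try {
                window = channel.map(FileChannel.MapMode.READ_ONLY, windowStart, len);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            remaps++;
        }
        return window.get((int) (pos - windowStart));
    }

    int remapCount() { return remaps; }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class SlidingWindowDemo {
    // Helper: write the given bytes to a fresh temporary file.
    static Path tempFileWith(byte[] data) {
        try {
            Path p = Files.createTempFile("window", ".bin");
            Files.write(p, data);
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        try (SlidingWindowReader r = new SlidingWindowReader(tempFileWith(data), 4)) {
            System.out.println(r.byteAt(2) + " " + r.byteAt(9) + " remaps=" + r.remapCount());
        }
    }
}
```

On a 64-bit host the same reader could be given a window as large as the file,
which corresponds to mapping the whole file straightaway.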

> opening files takes time & descriptors, etc.

Launching an IndexReader is still plenty fast.

Actually, if you're not warming sort caches, launching a Lucene IndexReader
isn't obscenely expensive any more -- just expensive.  Right?

Marvin Humphrey

[1] At least on Unixen.  I believe we can support all of this using Windows
    MapViewOfFile and friends, and I had a crude prototype working before, but
    right now Windows is still using the old-school load-into-process-memory
    style.

Re: MergePolicy public but SegmentInfos package protected?

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Fri, Mar 27, 2009 at 9:48 AM, Marvin Humphrey <ma...@rectangular.com> wrote:

> Every indexer opens a PolyReader [analogous to MultiSegmentReader], even when
> there's no data in the index, or a single segment. (I modified PolyReader so
> that 1-segment and 0-segment states were officially valid for this purpose.)

OK

> PolyReader is a public class, as is SegReader, and PolyReader allows access to
> its subreaders via a Get_Seg_Readers() accessor.  Once we have SegReaders in
> hand, we can get at per-segment doc counts and deletion counts, which I think
> will be sufficient for planning a merge.

OK.  Whereas in Lucene neither MultiSegmentReader nor SegmentReader is public.

> As for the actual implementation of MergePolicy, I haven't prototyped that out
> yet.  Right now in KS, the infrastructure is reasonably primitive:
> IndexManager has a method called SegReaders_To_Merge() which accepts a
> PolyReader as an argument and returns an array of SegReaders representing
> content that should be merged.

KS does the Fibonacci merge policy, right?

>> Even though Lucy's SegmentReader is lighter weight, it still seems
>> like you shouldn't be opening them in the writer (except for realtime
>> search)?
>
> I don't see why not.

But it still ties up resources?  EG mmap uses up chunks of your
address space (possibly important on 32 bit machines, eg if you want a
large ram buffer in the writer), opening files takes time &
descriptors, etc.

> When I built this into KS, I thought I was imitating your plan for Lucene.  :)

I think for the time being we'll still allow "pure IndexWriter".

>> Are you going to simply make the segment metadata public?
>
> If you're referring to Segment's Fetch_Metadata method, it's public.  If it
> weren't, then plugin components couldn't use it, which would be unfortunate.
> I think we ought to make it easy for plugins to store their metadata as JSON
> inside segmeta.json file rather than resort to writing it in their own
> proprietary formats.

Yes.

> In theory, any index component can peer into another's metadata, just by
> invoking Seg_Fetch_Metadata(segment, other_component_name).  That would be a
> bad idea, though, because what a component might choose to store there isn't
> public.

I think it's fine to not guard against it.  It's the same "we are all
consenting adults" approach that Python takes.

> I do expect that we will codify what data will be present in parts of
> segmeta.json as part of the official Lucy file format spec, however.  If you
> were both dumb and determined, you could duplicate all the version checking
> code and adhere to the spec, making it possible to (maybe) safely interpret
> that data.

OK

Mike

Re: MergePolicy public but SegmentInfos package protected?

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Fri, Mar 27, 2009 at 08:08:20AM -0400, Michael McCandless wrote:

> Actually, how will Lucy do this?

Every indexer opens a PolyReader [analogous to MultiSegmentReader], even when
there's no data in the index, or a single segment. (I modified PolyReader so
that 1-segment and 0-segment states were officially valid for this purpose.)

PolyReader is a public class, as is SegReader, and PolyReader allows access to
its subreaders via a Get_Seg_Readers() accessor.  Once we have SegReaders in
hand, we can get at per-segment doc counts and deletion counts, which I think
will be sufficient for planning a merge.

As for the actual implementation of MergePolicy, I haven't prototyped that out
yet.  Right now in KS, the infrastructure is reasonably primitive:
IndexManager has a method called SegReaders_To_Merge() which accepts a
PolyReader as an argument and returns an array of SegReaders representing
content that should be merged.

> Even though Lucy's SegmentReader is lighter weight, it still seems
> like you shouldn't be opening them in the writer (except for realtime
> search)?  

I don't see why not.

When I built this into KS, I thought I was imitating your plan for Lucene.  :)

> Are you going to simply make the segment metadata public?

If you're referring to Segment's Fetch_Metadata method, it's public.  If it
weren't, then plugin components couldn't use it, which would be unfortunate.
I think we ought to make it easy for plugins to store their metadata as JSON
inside segmeta.json file rather than resort to writing it in their own
proprietary formats.

In theory, any index component can peer into another's metadata, just by
invoking Seg_Fetch_Metadata(segment, other_component_name).  That would be a
bad idea, though, because what a component might choose to store there isn't
public.  
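
The name-keyed lookup can be pictured with a small Java sketch. ToySegment,
storeMetadata and fetchMetadata are invented names loosely modelled on
Seg_Fetch_Metadata; Lucy's real API is C, and the JSON serialization into
segmeta.json is left out here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a segment's metadata store, keyed by the name of
// the component that wrote each entry.
final class ToySegment {
    private final Map<String, Map<String, Object>> metadata = new HashMap<>();

    // A component stores its own metadata under its own name.
    void storeMetadata(String component, Map<String, Object> data) {
        metadata.put(component, new HashMap<>(data));
    }

    // Returns null when the component stored nothing. There is deliberately
    // no access check: any caller can pass any component name.
    Map<String, Object> fetchMetadata(String component) {
        return metadata.get(component);
    }
}
```

Nothing stops one component from calling fetchMetadata with another
component's name; the only protection is that the contents are not a public
contract.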

I do expect that we will codify what data will be present in parts of
segmeta.json as part of the official Lucy file format spec, however.  If you
were both dumb and determined, you could duplicate all the version checking
code and adhere to the spec, making it possible to (maybe) safely interpret
that data.

Marvin Humphrey

Re: MergePolicy public but SegmentInfos package protected?

Posted by Michael McCandless <lu...@mikemccandless.com>.

Actually, how will Lucy do this?

Even though Lucy's SegmentReader is lighter weight, it still seems
like you shouldn't be opening them in the writer (except for realtime
search)?  What's your plan?  Are you going to simply make the segment
metadata public?

Mike

On Thu, Mar 26, 2009 at 9:51 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Thu, Mar 26, 2009 at 07:06:26AM -0400, Michael McCandless wrote:
>
>> We'd need to add a few methods to IndexReader,
>
> Eep.  IndexReader's too big as it is.
>
>> eg querying whether
>> compound file format is in use, whether separate norms are stored,
>> "get me total size in bytes of all files" (or maybe just "get me all
>> files", plus utility method somewhere to add up the sizes), so this
>> approach seems doable.
>
> Do you really need all that?  I think the crucial info is already available:
>
>  * The number of docs in each segment.
>  * The number of deletions in each segment, allowing you to calculate the
>    deletion percentage.
>
> I think it's reasonable to assume an average distribution of document sizes
> across segments.  Sure, that'll be wrong at the long tail of the curve, but
> most of the time it will be right -- and even when it's not, it won't cause
> big problems.
>
>> But: we don't yet have IndexWriter holding open a reader for every
>> segment.  We are working on realtime search (LUCENE-1516), but even
>> then, if you don't ask for a realtime reader from IndexWriter, it
>> won't hold open SegmentReaders for all segments.
>
> Yeah, that's gonna be a bigger problem.  :(  It's cake to give Lucy's indexer
> a reader, because opening readers is cheap.  But the Lucene heavy-IndexReader
> model messes that up -- IndexWriter has traditionally been a fast class to
> open.
>
> Marvin Humphrey

Re: MergePolicy public but SegmentInfos package protected?

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, Mar 26, 2009 at 9:51 PM, Marvin Humphrey <ma...@rectangular.com> wrote:

>> eg querying whether
>> compound file format is in use, whether separate norms are stored,
>> "get me total size in bytes of all files" (or maybe just "get me all
>> files", plus utility method somewhere to add up the sizes), so this
>> approach seems doable.
>
> Do you really need all that?  I think the crucial info is already available:
>
>  * The number of docs in each segment.
>  * The number of deletions in each segment, allowing you to calculate the
>    deletion percentage.

I'm just going w/ the info that Log*MergePolicy use today -- checking
CFS, separate dels & norms, is done for "isOptimized"; oh, actually
IndexReader has an isOptimized(), which we could simply use, instead.

> I think it's reasonable to assume an average distribution of document sizes
> across segments.  Sure, that'll be wrong at the long tail of the curve, but
> most of the time it will be right -- and even when it's not, it won't cause
> big problems.

Yeah this might be acceptable in practice, though users who add a
bunch of tiny docs followed by a bunch of big docs (or v/v) may see
poor merge choices.  Maybe in practice it wouldn't be a big deal.

>> But: we don't yet have IndexWriter holding open a reader for every
>> segment.  We are working on realtime search (LUCENE-1516), but even
>> then, if you don't ask for a realtime reader from IndexWriter, it
>> won't hold open SegmentReaders for all segments.
>
> Yeah, that's gonna be a bigger problem.  :(  It's cake to give Lucy's indexer
> a reader, because opening readers is cheap.  But the Lucene heavy-IndexReader
> model messes that up -- IndexWriter has traditionally been a fast class to
> open.

Right, this one seems like the deal breaker: IndexWriter should not in
general go and pool readers on all segments in the index.

Mike

Re: MergePolicy public but SegmentInfos package protected?

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Thu, Mar 26, 2009 at 07:06:26AM -0400, Michael McCandless wrote:

> We'd need to add a few methods to IndexReader, 

Eep.  IndexReader's too big as it is.  

> eg querying whether
> compound file format is in use, whether separate norms are stored,
> "get me total size in bytes of all files" (or maybe just "get me all
> files", plus utility method somewhere to add up the sizes), so this
> approach seems doable.

Do you really need all that?  I think the crucial info is already available:

  * The number of docs in each segment.
  * The number of deletions in each segment, allowing you to calculate the
    deletion percentage.

I think it's reasonable to assume an average distribution of document sizes
across segments.  Sure, that'll be wrong at the long tail of the curve, but
most of the time it will be right -- and even when it's not, it won't cause
big problems.
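
As a worked example of that estimate, here is a small Java sketch. The names
are hypothetical, and docCount is assumed to include deleted docs; the size
estimate applies exactly the "average document size" assumption made above.

```java
// Hypothetical helpers; not part of Lucene or Lucy.
final class MergeEstimates {
    // Percentage of a segment's docs that are deleted. docCount is assumed to
    // include the deleted docs; an empty segment counts as 0 percent.
    static double deletionPercent(int docCount, int delCount) {
        return docCount == 0 ? 0.0 : 100.0 * delCount / docCount;
    }

    // Estimated bytes of live data per segment, assuming every doc in the
    // index has the same average size (total bytes / total docs).
    static long[] estimatedLiveBytes(int[] docCounts, int[] delCounts, long totalIndexBytes) {
        long totalDocs = 0;
        for (int c : docCounts) totalDocs += c;
        long[] out = new long[docCounts.length];
        if (totalDocs == 0) return out;
        double avgDocBytes = (double) totalIndexBytes / totalDocs;
        for (int i = 0; i < docCounts.length; i++) {
            out[i] = Math.round(avgDocBytes * (docCounts[i] - delCounts[i]));
        }
        return out;
    }
}
```

For two segments with 100 and 300 docs, 0 and 100 deletions, and 4000 bytes in
total, the average is 10 bytes per doc, so the estimates come out as 1000 and
2000 live bytes. The estimate is off whenever one segment holds unusually
large or small docs.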

> But: we don't yet have IndexWriter holding open a reader for every
> segment.  We are working on realtime search (LUCENE-1516), but even
> then, if you don't ask for a realtime reader from IndexWriter, it
> won't hold open SegmentReaders for all segments.

Yeah, that's gonna be a bigger problem.  :(  It's cake to give Lucy's indexer
a reader, because opening readers is cheap.  But the Lucene heavy-IndexReader
model messes that up -- IndexWriter has traditionally been a fast class to
open.

Marvin Humphrey

Re: MergePolicy public but SegmentInfos package protected?

Posted by Michael McCandless <lu...@mikemccandless.com>.

Marvin Humphrey <ma...@rectangular.com> wrote:
> On Wed, Mar 25, 2009 at 06:15:35AM -0400, Michael McCandless wrote:
>
>> I'm torn.  MergePolicy (and MergeScheduler) are "expected" to be
>> something expert users could alter; their API is designed to be
>> exposed & stable.  I think they should be visible in the javadocs.
>>
>> But, unfortunately, to do their job they must use other package
>> private APIs (SegmentInfos) which we intentionally would like to keep
>> more malleable.
>
> Is all the information that you'd need to perform the merge available via
> public methods on IndexReader and its descendants?  Does IndexWriter always
> have an IndexReader at its disposal yet?  And if the answer to those two
> questions is yes, can you refactor MergePolicy to work off of an IndexReader
> rather than a SegmentInfos?

Good idea...

We'd need to add a few methods to IndexReader, eg querying whether
compound file format is in use, whether separate norms are stored,
"get me total size in bytes of all files" (or maybe just "get me all
files", plus utility method somewhere to add up the sizes), so this
approach seems doable.
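
The "get me all files" variant plus a utility method might look roughly like
this in Java. It is a sketch only: no such IndexReader method exists in this
discussion, so the list of file names is passed in directly, and
SegmentFileSizes and the example file names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical utility: add up the on-disk sizes of a segment's files.
final class SegmentFileSizes {
    static long totalBytes(Path indexDir, List<String> fileNames) {
        long total = 0;
        for (String name : fileNames) {
            try {
                total += Files.size(indexDir.resolve(name));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return total;
    }
}

final class SegmentFileSizesDemo {
    // Helper: create a temp directory holding files of the given sizes.
    static Path tempDirWithFiles(String[] names, int[] sizes) {
        try {
            Path dir = Files.createTempDirectory("seg");
            for (int i = 0; i < names.length; i++) {
                Files.write(dir.resolve(names[i]), new byte[sizes[i]]);
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the summing in a utility means the reader would only need to expose
one extra accessor (the file list) rather than a dedicated size method.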

But: we don't yet have IndexWriter holding open a reader for every
segment.  We are working on realtime search (LUCENE-1516), but even
then, if you don't ask for a realtime reader from IndexWriter, it
won't hold open SegmentReaders for all segments.

Mike

Re: MergePolicy public but SegmentInfos package protected?

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Wed, Mar 25, 2009 at 06:15:35AM -0400, Michael McCandless wrote:

> I'm torn.  MergePolicy (and MergeScheduler) are "expected" to be
> something expert users could alter; their API is designed to be
> exposed & stable.  I think they should be visible in the javadocs.
> 
> But, unfortunately, to do their job they must use other package
> private APIs (SegmentInfos) which we intentionally would like to keep
> more malleable.

Is all the information that you'd need to perform the merge available via
public methods on IndexReader and its descendants?  Does IndexWriter always
have an IndexReader at its disposal yet?  And if the answer to those two
questions is yes, can you refactor MergePolicy to work off of an IndexReader
rather than a SegmentInfos?

Marvin Humphrey

Re: MergePolicy public but SegmentInfos package protected?

Posted by Michael McCandless <lu...@mikemccandless.com>.

I'm torn.  MergePolicy (and MergeScheduler) are "expected" to be
something expert users could alter; their API is designed to be
exposed & stable.  I think they should be visible in the javadocs.

But, unfortunately, to do their job they must use other package
private APIs (SegmentInfos) which we intentionally would like to keep
more malleable.

Is there some way to make them package private, yet include them (and
only them, ie not all package private classes) in the javadocs?

At a minimum we should update the javadocs expressing this
issue.

Mike

Chris Hostetter <ho...@fucit.org> wrote:
>
> : I'd rather not make SegmentInfos public; it's a large API and we do
> : make changes to it as we change the index format.  It's also quite
> : internal to Lucene.
> :
> : Making your own MergePolicy/Scheduler is very much an "advanced" use
> : case... so I think it's acceptable to have to put it into o.a.l.index
> : package?
>
> i don't know enough about the APIs involved to have an opinion on this, but
> your comments lead me to conclude: if we assume subclassing MergePolicy is
> so advanced that you should only do it in the o.a.l.index package, then
> shouldn't MergePolicy itself be package protected?
>
>
>
> -Hoss

Re: MergePolicy public but SegmentInfos package protected?

Posted by Chris Hostetter <ho...@fucit.org>.

: I'd rather not make SegmentInfos public; it's a large API and we do
: make changes to it as we change the index format.  It's also quite
: internal to Lucene.
: 
: Making your own MergePolicy/Scheduler is very much an "advanced" use
: case... so I think it's acceptable to have to put it into o.a.l.index
: package?

i don't know enough about the APIs involved to have an opinion on this, but
your comments lead me to conclude: if we assume subclassing MergePolicy is
so advanced that you should only do it in the o.a.l.index package, then
shouldn't MergePolicy itself be package protected?



-Hoss


Re: MergePolicy public but SegmentInfos package protected?

Posted by Michael McCandless <lu...@mikemccandless.com>.

I'd rather not make SegmentInfos public; it's a large API and we do
make changes to it as we change the index format.  It's also quite
internal to Lucene.

Making your own MergePolicy/Scheduler is very much an "advanced" use
case... so I think it's acceptable to have to put it into o.a.l.index
package?

Mike

Jason Rutherglen <ja...@gmail.com> wrote:
> I'm overriding MergePolicy which is public, however SegmentInfos is package
> protected which means the MergePolicy subclass must be in the
> org.apache.lucene.index package.  Can we make SegmentInfos public?
>
