
Posted to java-user@lucene.apache.org by Jerven Bolleman <je...@isb-sib.ch> on 2010/06/29 11:24:33 UTC

Unsupported operation in TermDocs.next() when migrating from 2.4 to 2.9

Hi All,

I finally have some time to upgrade our Lucene from the 2.4 series
to the 2.9 series. Everything compiles fine, but at runtime I am now
getting a new UnsupportedOperationException:


java.lang.UnsupportedOperationException
	at org.apache.lucene.index.AbstractAllTermDocs.seek(AbstractAllTermDocs.java:42)
	at org.apache.lucene.index.DirectoryReader$MultiTermDocs.termDocs(DirectoryReader.java:1186)
	at org.apache.lucene.index.DirectoryReader$MultiTermDocs.next(DirectoryReader.java:1118)
	at org.expasy.core.index.SubQueryFilter.fastForLargeResultSets(SubQueryFilter.java:129)

The code that triggers it is copied below, followed by an explanation
of what it is trying to achieve.

private void fastForLargeResultSets(String foreignField, BitSet bits,
		TermDocs docs, TermDocs foreignDocs, IndexReader foreignReader,
		BitSet queryResults)
	throws IOException
{
	int start = queryResults.nextSetBit(0);
	TermEnum foreignEnum = foreignReader.terms(new Term(foreignField, ""));
	while (foreignEnum.next())
	{
		Term term = foreignEnum.term();
		if (term == null || !term.field().equals(foreignField))
			break;
		if (!term.text().equals("not_null"))
		{
			foreignDocs.skipTo(start);
			foreignDocs.seek(term);
			// Source of the exception in my code
			while (foreignDocs.next())
			{
				int doc = foreignDocs.doc();
				if (queryResults.get(doc))
				{
					foreignDocs.skipTo(doc);
					if (term != null && term.text() != null)
						buffer.add(term.text());
				}
				// Use a buffer to avoid jumping around on disk too much.
				if (buffer.size() >= BUFFERSIZE)
				{
					emptyBuffer(buffer, bits, docs);
				}
			}
		}
	}

	if (!buffer.isEmpty())
	{
		emptyBuffer(buffer, bits, docs);
	}
}

The purpose of this code is to fill a bitset that is used as a filter.
The filter finds the documents in index a that have a linking key value
pointing to them in index b.

While resource intensive, this code path was quite fast when there are
millions of documents in index b pointing to millions of documents in
index a.

In other words, it creates a "join" between two queries on different
indexes.

For a live example, see
http://www.uniprot.org/uniprot/?query=citation%3A%28author%3Afink%29
This is a search for "fink" in the field "author" of the "citation" index.
For each document in the "citation" index that matches the term "fink" in
the field "author", retrieve the terms that contain a uniquely
identifying key value for documents in the "uniprot" index. Then generate
a bitset to use in filtering the documents in the "uniprot" index (done in
the emptyBuffer method).
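
The join described above can be sketched in plain Java. This is only an
illustration under stated assumptions, not the real SubQueryFilter code: it
uses in-memory maps in place of the Lucene TermEnum/TermDocs calls, it skips
the buffering step, and the names JoinFilterSketch, buildFilter,
foreignPostings and localKeyToDoc are hypothetical.

```java
import java.util.BitSet;
import java.util.Map;

/** Plain-Java stand-in for the two-index join; no Lucene classes are used. */
final class JoinFilterSketch {

    /**
     * @param foreignPostings key term -> ids of the foreign ("citation") documents that contain it
     * @param queryResults    foreign documents matched by the sub-query (e.g. author:fink)
     * @param localKeyToDoc   key term -> id of the local ("uniprot") document it identifies
     * @param localMaxDoc     number of documents in the local index
     * @return bitset with one bit set per local document that is linked from a matching foreign document
     */
    static BitSet buildFilter(Map<String, int[]> foreignPostings, BitSet queryResults,
                              Map<String, Integer> localKeyToDoc, int localMaxDoc) {
        BitSet bits = new BitSet(localMaxDoc);
        for (Map.Entry<String, int[]> entry : foreignPostings.entrySet()) {
            for (int foreignDoc : entry.getValue()) {
                if (queryResults.get(foreignDoc)) {
                    // The key occurs in a matching foreign document: mark its local document.
                    Integer localDoc = localKeyToDoc.get(entry.getKey());
                    if (localDoc != null) {
                        bits.set(localDoc);
                    }
                    break; // one matching foreign document is enough for this key
                }
            }
        }
        return bits;
    }
}
```

The real method collects the matching key terms in a buffer first and resolves
them in batches, to avoid jumping around on disk; the sketch resolves each key
immediately to keep the idea visible.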

Is this a bug? And does anyone have ideas for an effective (maybe
superior) workaround?

Regards and thanks for a great project!

Jerven

Re: Unsupported operation in TermDocs.next() when migrating from 2.4 to 2.9

Posted by Michael McCandless <lu...@mikemccandless.com>.

That is spooky.  It certainly sounds like a regression.

It's odd that your MultiTermDocs is pulling an AllTermDocs under the
hood -- this should only happen if you did a .seek(null) on it, but
your code seems to first check that term != null, so it should never
pass a null term.

Can you add a temporary assert to DirectoryReader.java, in 29x, around
line 1191?  It should go in this method:

    protected TermDocs termDocs(IndexReader reader)
      throws IOException {
      return term==null ? reader.termDocs(null) : reader.termDocs();
    }

Add an assert term != null, run your code w/ assertions on, and see
if it trips (the assert is not safe, in general, but should not trip
in how I think you are using it).  If it does trip, try to track
down how a null term got in there.
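
As a rough sketch of what such a temporary assert could look like: the class
below is a hypothetical stand-in that only mirrors the shape of the method
above, with a String in place of Term and strings in place of the TermDocs
instances. It is not the Lucene source. The assert only fires when the JVM is
started with -ea.

```java
/** Hypothetical stand-in mirroring the shape of termDocs(IndexReader); not Lucene source. */
final class AssertSketch {
    private String term; // stands in for the Term field of MultiTermDocs

    void seek(String term) {
        this.term = term;
    }

    /** Reports which branch is taken; trips the assert on a null term when run with -ea. */
    String termDocs() {
        assert term != null : "null term reached termDocs(); find the caller that passed it";
        return term == null ? "all-docs enumerator" : "per-term enumerator";
    }

    /** True when assertions are enabled for this class (JVM started with -ea). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // the assignment only runs when assertions are on
        return enabled;
    }
}
```

Without -ea the assert is skipped silently, so it is worth checking that
assertions are really enabled before concluding that the term is never null.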

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org