Posted to dev@lucene.apache.org by "Burton-West, Tom" <tb...@umich.edu> on 2010/04/14 20:14:41 UTC
Bug in contrib/misc/HighFreqTerms.java?
When I try to run HighFreqTerms.java in Lucene Revision: 933722, I get the exception appended below. I believe the line of code involved is a result of the flex indexing merge. Should I post this as a comment to LUCENE-2370 (Reintegrate flex branch into trunk)?
Or is there simply something wrong with my configuration?
Exception in thread "main" java.lang.UnsupportedOperationException: please use MultiFields.getFields if you really need a top level Fields (NOTE that it's usually better to work per segment instead)
at org.apache.lucene.index.DirectoryReader.fields(DirectoryReader.java:762)
at org.apache.lucene.misc.HighFreqTerms.main(HighFreqTerms.java:71)
Tom Burton-West
Re: Fix to contrib/misc/HighFreqTerms.java
Posted by Michael McCandless <lu...@mikemccandless.com>.
Thanks Tom!
On Mon, Apr 19, 2010 at 4:27 PM, Burton-West, Tom <tb...@umich.edu> wrote:
> Ok opened LUCENE-2403.
>
> I could make the change to make the two lines consistent, but to use a BytesRef directly, wouldn't Term.java need to use BytesRef instead of String? Or is there a new flex "Term" class that takes a BytesRef?
There is no Term class that takes a BytesRef... other things need
this, too (e.g. TermQuery needs to accept a BytesRef). But we are
considering deprecating Term entirely (it's not used in that many
further places).
> Otherwise, TermInfo could change to use the name of the field and a BytesRef instead of a term.
+1 -- I think we should take this approach?
Mike
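A minimal sketch of the approach agreed here (TermInfo holding the field name plus the raw term bytes, with no Term or String conversion). This is hypothetical illustration code, not the HighFreqTerms source: `TermInfo` and `TopTerms` are made-up names, `java.util.PriorityQueue` stands in for Lucene's own priority queue, and `insertWithOverflow` here only imitates the bounded-queue behaviour of the call quoted in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical TermInfo: field name plus raw term bytes, no Term/String conversion.
final class TermInfo {
    final String field;
    final byte[] termBytes; // arbitrary bytes, not assumed to be UTF-8
    final int docFreq;

    TermInfo(String field, byte[] termBytes, int docFreq) {
        this.field = field;
        this.termBytes = termBytes.clone(); // copy in case the caller reuses its buffer
        this.docFreq = docFreq;
    }
}

// Keeps the maxSize entries with the highest docFreq using a bounded min-heap.
final class TopTerms {
    private final int maxSize;
    private final PriorityQueue<TermInfo> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.docFreq, b.docFreq));

    TopTerms(int maxSize) { this.maxSize = maxSize; }

    // Imitates insertWithOverflow: add while there is room, otherwise
    // replace the current minimum only if the new entry beats it.
    void insertWithOverflow(TermInfo ti) {
        if (maxSize <= 0) return;
        if (heap.size() < maxSize) {
            heap.add(ti);
        } else if (ti.docFreq > heap.peek().docFreq) {
            heap.poll();
            heap.add(ti);
        }
    }

    // Returns a copy of the retained entries, highest docFreq first.
    List<TermInfo> sortedDescending() {
        List<TermInfo> out = new ArrayList<>(heap);
        out.sort((a, b) -> Integer.compare(b.docFreq, a.docFreq));
        return out;
    }
}
```

Keeping the smallest docFreq at the head of the min-heap means that, once the queue is full, each new term costs one comparison against `peek()`, which is the usual way to collect a top-N list in a single pass.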
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
RE: Fix to contrib/misc/HighFreqTerms.java
Posted by "Burton-West, Tom" <tb...@umich.edu>.
Ok opened LUCENE-2403.
I could make the change to make the two lines consistent, but to use a BytesRef directly, wouldn't Term.java need to use BytesRef instead of String? Or is there a new flex "Term" class that takes a BytesRef?
Otherwise, TermInfo could change to use the name of the field and a BytesRef instead of a term.
Tom
-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Saturday, April 17, 2010 11:43 AM
To: java-dev@lucene.apache.org
Subject: Re: Fix to contrib/misc/HighFreqTerms.java
Ahh you're right!
Though, really, we should not be converting to String (flex terms in
general are an arbitrary byte[], not necessarily utf8). We should
just use a BytesRef directly in the key.
Can you open an issue for this Tom? Thanks!
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: Fix to contrib/misc/HighFreqTerms.java
Posted by Michael McCandless <lu...@mikemccandless.com>.
Ahh you're right!
Though, really, we should not be converting to String (flex terms in
general are an arbitrary byte[], not necessarily utf8). We should
just use a BytesRef directly in the key.
Can you open an issue for this Tom? Thanks!
Mike
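Using the bytes directly as the key has one Java-specific catch: a raw `byte[]` inherits identity-based `equals` and `hashCode`, so two arrays with the same contents are different keys. A BytesRef-style wrapper avoids that by comparing contents. `TermKey` below is a hypothetical class written for illustration, not Lucene's `BytesRef`, and the unsigned byte ordering is an assumption about how terms should sort.

```java
import java.util.Arrays;

// Hypothetical BytesRef-style key: equality, hashing and ordering all use the contents.
final class TermKey implements Comparable<TermKey> {
    private final byte[] bytes;

    TermKey(byte[] bytes) { this.bytes = bytes.clone(); }

    @Override
    public boolean equals(Object o) {
        return o instanceof TermKey && Arrays.equals(bytes, ((TermKey) o).bytes);
    }

    @Override
    public int hashCode() { return Arrays.hashCode(bytes); }

    // Unsigned byte-by-byte comparison; a shorter prefix sorts first.
    // For valid UTF-8 this matches Unicode code point order.
    @Override
    public int compareTo(TermKey other) {
        int n = Math.min(bytes.length, other.bytes.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(bytes[i] & 0xff, other.bytes[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(bytes.length, other.bytes.length);
    }
}
```

A `TermKey` can then go into a `HashMap` or `TreeMap` and behave the way a String key would, without assuming the bytes are text.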
On Fri, Apr 16, 2010 at 2:41 PM, Burton-West, Tom <tb...@umich.edu> wrote:
> Hi Mike,
>
> Thanks for making the fix and changing the display from bytes to utf8. It needs a very minor change:
> The latest fix converts to utf8 if you give a field argument on the command line but still shows bytes if you don't.
>
> Line 89 should parallel line 70 and use term.utf8ToString() instead of term.toString().
>
> 70 tiq.insertWithOverflow(new TermInfo(new Term(field, term.utf8ToString()), termsEnum.docFreq()));
> 89 tiq.insertWithOverflow(new TermInfo(new Term(field, term.toString()), terms.docFreq()));
>
> Tom
RE: Fix to contrib/misc/HighFreqTerms.java
Posted by "Burton-West, Tom" <tb...@umich.edu>.
Hi Mike,
Thanks for making the fix and changing the display from bytes to utf8. It needs a very minor change:
The latest fix converts to utf8 if you give a field argument on the command line but still shows bytes if you don't.
Line 89 should parallel line 70 and use term.utf8ToString() instead of term.toString().
70 tiq.insertWithOverflow(new TermInfo(new Term(field, term.utf8ToString()), termsEnum.docFreq()));
89 tiq.insertWithOverflow(new TermInfo(new Term(field, term.toString()), terms.docFreq()));
Tom
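The mismatch Tom describes can be reproduced with a tiny stand-in for the term type. Judging from the `body:[3c 21 2d 2d]` output in this thread, `toString()` renders the bytes as bracketed hex while `utf8ToString()` decodes them, so a branch that calls the former prints hex. `TermBytes`, `HighFreqTermsSketch`, and `label` are hypothetical names; routing both branches through one helper is one way to keep lines 70 and 89 from drifting apart.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a BytesRef-like term, for illustration only.
final class TermBytes {
    private final byte[] bytes;

    TermBytes(byte[] bytes) { this.bytes = bytes.clone(); }

    // Debug form: bracketed lowercase hex, like the "body:[...]" output in this thread.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    // Readable form; only meaningful when the bytes really are UTF-8.
    String utf8ToString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

final class HighFreqTermsSketch {
    // Both branches (field given / all fields) call this, so they cannot diverge.
    static String label(String field, TermBytes term) {
        return field + ":" + term.utf8ToString();
    }
}
```

With the bytes `74 69 74 6c 65` from the sample output, `toString()` yields `[74 69 74 6c 65]`, while `label("body", term)` yields `body:title`.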
-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com]
Sent: Wednesday, April 14, 2010 3:50 PM
To: java-dev@lucene.apache.org
Subject: Re: Bug in contrib/misc/HighFreqTerms.java?
> OK, I committed the fix. I ran it on a flex Wikipedia index I had...
it produces output like this:
body:[3c 21 2d 2d] 509050
body:[73 68 6f 75 6c 64] 515495
body:[74 68 65 6e] 525176
body:[74 69 74 6c 65] 525361
body:[5b 5b 55 6e 69 74 65 64] 532586
body:[6b 6e 6f 77 6e] 533558
body:[75 6e 64 65 72] 536480
body:[55 6e 69 74 65 64] 543746
> Which is not very readable, but it does this because flex terms are
arbitrary byte[], not necessarily utf8... maybe we should fix it to
print both hex and String if we assume bytes are utf8?
Mike
Re: Bug in contrib/misc/HighFreqTerms.java?
Posted by Michael McCandless <lu...@mikemccandless.com>.
OK, I committed the fix. I ran it on a flex Wikipedia index I had...
it produces output like this:
body:[3c 21 2d 2d] 509050
body:[73 68 6f 75 6c 64] 515495
body:[74 68 65 6e] 525176
body:[74 69 74 6c 65] 525361
body:[5b 5b 55 6e 69 74 65 64] 532586
body:[6b 6e 6f 77 6e] 533558
body:[75 6e 64 65 72] 536480
body:[55 6e 69 74 65 64] 543746
Which is not very readable, but it does this because flex terms are
arbitrary byte[], not necessarily utf8... maybe we should fix it to
print both hex and String if we assume bytes are utf8?
Mike
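A sketch of the "print both hex and String" idea, assuming the tool should never print a mangled string for non-UTF-8 terms: the hex form is always shown, and the decoded text is added only when the bytes are valid UTF-8. `TermDisplay` is a hypothetical helper; the strict check relies on the JDK's `CharsetDecoder`, which by default reports malformed input rather than substituting U+FFFD as `new String(bytes, UTF_8)` does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: always show hex, add decoded text only for valid UTF-8.
final class TermDisplay {
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    // Returns the text, or null if the bytes are not well-formed UTF-8.
    static String strictUtf8(byte[] bytes) {
        try {
            // A fresh decoder from newDecoder() reports malformed input by default
            // instead of substituting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                .decode(ByteBuffer.wrap(bytes))
                .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    static String line(String field, byte[] bytes, int docFreq) {
        String text = strictUtf8(bytes);
        String term = (text == null) ? hex(bytes) : text + " " + hex(bytes);
        return field + ":" + term + " " + docFreq;
    }
}
```

For the first line of the sample this prints `body:<!-- [3c 21 2d 2d] 509050`; a term containing the byte `ff`, which never occurs in well-formed UTF-8, falls back to hex only.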
On Wed, Apr 14, 2010 at 3:25 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> Ugh, I'll fix this.
>
> With the new flex API, you can't ask a composite (Multi/DirReader) for
> its postings -- you have to go through the static methods on
> MultiFields. I'm trying to put some distance between IndexReader and
> composite readers... because I'd like to eventually deprecate them.
> I.e., the composite readers should "hold" an ordered collection of
> sub-readers, but should not themselves implement IndexReader's API, I
> think.
>
> Thanks for raising this, Tom,
>
> Mike
Re: Bug in contrib/misc/HighFreqTerms.java?
Posted by Michael McCandless <lu...@mikemccandless.com>.
Ugh, I'll fix this.
With the new flex API, you can't ask a composite (Multi/DirReader) for
its postings -- you have to go through the static methods on
MultiFields. I'm trying to put some distance between IndexReader and
composite readers... because I'd like to eventually deprecate them.
I.e., the composite readers should "hold" an ordered collection of
sub-readers, but should not themselves implement IndexReader's API, I
think.
Thanks for raising this, Tom,
Mike
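A toy model of the distinction between a composite reader and its sub-readers, assuming each segment is reduced to a sorted map from term to docFreq (String terms for brevity). `SegmentedIndex` and `mergedView` are hypothetical names; `mergedView` does roughly the work a merged top-level view such as MultiFields has to do (a sorted union with docFreq summed across segments), which is extra cost that per-segment code avoids.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model: a composite "reader" is an ordered list of per-segment
// term -> docFreq maps. String keys are used for brevity, so ordering is Java
// String order rather than byte order.
final class SegmentedIndex {
    final List<Map<String, Integer>> segments;

    SegmentedIndex(List<Map<String, Integer>> segments) {
        this.segments = segments;
    }

    // Per-segment style: each sub-reader is visited on its own and nothing is merged.
    // A term that occurs in k segments is counted k times here.
    int termEntriesAcrossSegments() {
        int total = 0;
        for (Map<String, Integer> seg : segments) {
            total += seg.size();
        }
        return total;
    }

    // Top-level style: build the sorted union of all terms, summing docFreq
    // across segments. This merging is the extra work a composite view must do.
    static TreeMap<String, Integer> mergedView(SegmentedIndex index) {
        TreeMap<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> seg : index.segments) {
            for (Map.Entry<String, Integer> e : seg.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }
}
```

In this model, a tool that only needs per-segment statistics can skip `mergedView` entirely, which is the sense in which working per segment is usually cheaper.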
On Wed, Apr 14, 2010 at 2:14 PM, Burton-West, Tom <tb...@umich.edu> wrote:
> When I try to run HighFreqTerms.java in Lucene Revision: 933722, I get the
> exception appended below. I believe the line of code involved is a
> result of the flex indexing merge. Should I post this as a comment to
> LUCENE-2370 (Reintegrate flex branch into trunk)?
>
> Or is there simply something wrong with my configuration?
>
> Exception in thread "main" java.lang.UnsupportedOperationException: please
> use MultiFields.getFields if you really need a top level Fields (NOTE that
> it's usually better to work per segment instead)
> at
> org.apache.lucene.index.DirectoryReader.fields(DirectoryReader.java:762)
> at org.apache.lucene.misc.HighFreqTerms.main(HighFreqTerms.java:71)
>
> Tom Burton-West
>