You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2009/04/23 03:33:36 UTC

Greetings and questions about patches

Hi all:

I've been participating in the user list for some time, and I'd like
to start helping maintain/enhance the code. So I thought I'd start
with something small, mostly to get the process down. Unit tests
sure fit the bill it seems to me, less chance of introducing errors
through ignorance but a fine way to extend *my* understanding
of Lucene.

I managed to check out the code and run the unit tests, which
was amazingly easy. I even managed to get the project into
IntelliJ and connect the codestyle.xml file. Kudos for whoever
set up the checkout/build process, I was dreading spending
days setting this up, fortunately I didn't have to.

So I, with Chris's help, found the code coverage report and
chose something pretty straightforward to test, BitUtil since it
was nice and self-contained. As I said, I'm looking at understanding
the process rather than adding much value the first time.

Alas, even something as simple as BitUtil generates questions
that I'm asking mostly to understand what approach the veterans
prefer. I'll argue with y'all next year sometime <G>.

So, according to the coverage report, there are two methods that
are never executed by the unit tests (actually 4, 2 that operate on
ints and 2 that operate on longs), isPowerOfTwo and
nextHighestPowerOfTwo. nextHighestPowerOfTwo is especially
clever, had to get out a paper and pencil to really understand it.

Issues:
1> none of these methods is ever called. I commented them out
     and ran all the unit tests and all is well. Additionally, commenting
     out one of the other methods produces compile-time errors so I'm
    fairly sure I didn't do something completely stupid that just *looked*
    like it was OK. I grepped recursively and they're nowhere in the
    *.java files.
  1a> What's the consensus about unused code? Take it out (my
         preference) along with leaving a comment on where it can
         be found (since it *is* clever code)? Leave it in because someone
         found some pretty neat algorithms that we may need sometime?
  1b> I'm not entirely sure about the contrib area, but the contrib jars
         are all new so I assume "ant clean test" compiles them as well.

2> I don't agree with the behavior of nextHighestPowerOfTwo. Should
     I make changes if we decide to keep it?
  2a> Why should it return the parameter passed in when it happens to be
        a perfect power of two? e.g. this passes:
       assertEquals(BitUtil.nextHighestPowerOfTwo(128L), 128);
       I'd expect this to actually return 256, given the name.
2b> Why should it ever return 0? There's no power of two that is
       zero. e.g. this passes:
       assertEquals(BitUtil.nextHighestPowerOfTwo(-1), 0);
       as does this: assertEquals(BitUtil.nextHighestPowerOfTwo(0), 0).
       *Assuming* that someone wants to use this sometime to, say, size
        an array they'd have to test against a return of 0.


I'm fully aware that these are trivial issues in the grand scheme of things,

and I *really* don't want to waste much time hashing them over. I'll provide

a patch either way and go on to something slightly more complicated for
my next trick.

Best
Erick

Re: Greetings and questions about patches

Posted by Erick Erickson <er...@gmail.com>.

Thanks all. Despite my aesthetic preference for removing unused code,
I'm *really* not in favor of causing extra work (for myself or others) to
satisfy it <G>.. Especially when there's reasonable expectations that the
code in question *will* be used in the foreseeable future.....

Ok, I'll leave the code in place as-is and provide a patch with unit tests
sometime real soon now.

Erick

On Thu, Apr 23, 2009 at 6:15 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Welcome Erick!
>
> Because nextHighestPowerOfTwo methods are public, I think we cannot
> change what they return, nor remove them.  At most we could deprecate
> them now (and remove in 3.0), though I think it's fine to simply keep
> them around even though nothing inside Lucene uses them today: since
> we are heavy users of BitSet/Vector/Array/etc., it seems possible
> we'll need them at some point.
>
> EG we are looking to make a better data structure to share
> nearly-identical deleted doc bitsets between near realtime readers,
> which could conceivably use advanced BitUtil methods.
>
> Keep hacking ;)
>
> Mike
>
> On Wed, Apr 22, 2009 at 9:33 PM, Erick Erickson <er...@gmail.com>
> wrote:
> > Hi all:
> >
> > I've been participating in the user list for some time, and I'd like
> > to start helping maintain/enhance the code. So I thought I'd start
> > with something small, mostly to get the process down. Unit tests
> > sure fit the bill it seems to me, less chance of introducing errors
> > through ignorance but a fine way to extend *my* understanding
> > of Lucene.
> >
> > I managed to check out the code and run the unit tests, which
> > was amazingly easy. I even managed to get the project into
> > IntelliJ and connect the codestyle.xml file. Kudos for whoever
> > set up the checkout/build process, I was dreading spending
> > days setting this up, fortunately I didn't have to.
> >
> > So I, with Chris's help, found the code coverage report and
> > chose something pretty straightforward to test, BitUtil since it
> > was nice and self-contained. As I said, I'm looking at understanding
> > the process rather than adding much value the first time.
> >
> > Alas, even something as simple as BitUtil generates questions
> > that I'm asking mostly to understand what approach the veterans
> > prefer. I'll argue with y'all next year sometime <G>.
> >
> > So, according to the coverage report, there are two methods that
> > are never executed by the unit tests (actually 4, 2 that operate on
> > ints and 2 that operate on longs), isPowerOfTwo and
> > nextHighestPowerOfTwo. nextHighestPowerOfTwo is especially
> > clever, had to get out a paper and pencil to really understand it.
> >
> > Issues:
> > 1> none of these methods is ever called. I commented them out
> >      and ran all the unit tests and all is well. Additionally, commenting
> >      out one of the other methods produces compile-time errors so I'm
> >     fairly sure I didn't do something completely stupid that just
> *looked*
> >     like it was OK. I grepped recursively and they're nowhere in the
> >     *.java files.
> >   1a> What's the consensus about unused code? Take it out (my
> >          preference) along with leaving a comment on where it can
> >          be found (since it *is* clever code)? Leave it in because
> someone
> >          found some pretty neat algorithms that we may need sometime?
> >   1b> I'm not entirely sure about the contrib area, but the contrib jars
> >          are all new so I assume "ant clean test" compiles them as well.
> >
> > 2> I don't agree with the behavior of nextHighestPowerOfTwo. Should
> >      I make changes if we decide to keep it?
> >   2a> Why should it return the parameter passed in when it happens to be
> >         a perfect power of two? e.g. this passes:
> >        assertEquals(BitUtil.nextHighestPowerOfTwo(128L), 128);
> >        I'd expect this to actually return 256, given the name.
> > 2b> Why should it ever return 0? There's no power of two that is
> >        zero. e.g. this passes:
> >        assertEquals(BitUtil.nextHighestPowerOfTwo(-1), 0);
> >        as does this: assertEquals(BitUtil.nextHighestPowerOfTwo(0), 0).
> >        *Assuming* that someone wants to use this sometime to, say, size
> >         an array they'd have to test against a return of 0.
> >
> >
> > I'm fully aware that these are trivial issues in the grand scheme of
> things,
> > and I *really* don't want to waste much time hashing them over. I'll
> provide
> > a patch either way and go on to something slightly more complicated for
> > my next trick.
> >
> > Best
> > Erick
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: Greetings and questions about patches

Posted by Michael McCandless <lu...@mikemccandless.com>.

Welcome Erick!

Because nextHighestPowerOfTwo methods are public, I think we cannot
change what they return, nor remove them.  At most we could deprecate
them now (and remove in 3.0), though I think it's fine to simply keep
them around even though nothing inside Lucene uses them today: since
we are heavy users of BitSet/Vector/Array/etc., it seems possible
we'll need them at some point.

EG we are looking to make a better data structure to share
nearly-identical deleted doc bitsets between near realtime readers,
which could conceivably use advanced BitUtil methods.

Keep hacking ;)

Mike

On Wed, Apr 22, 2009 at 9:33 PM, Erick Erickson <er...@gmail.com> wrote:
> Hi all:
>
> I've been participating in the user list for some time, and I'd like
> to start helping maintain/enhance the code. So I thought I'd start
> with something small, mostly to get the process down. Unit tests
> sure fit the bill it seems to me, less chance of introducing errors
> through ignorance but a fine way to extend *my* understanding
> of Lucene.
>
> I managed to check out the code and run the unit tests, which
> was amazingly easy. I even managed to get the project into
> IntelliJ and connect the codestyle.xml file. Kudos for whoever
> set up the checkout/build process, I was dreading spending
> days setting this up, fortunately I didn't have to.
>
> So I, with Chris's help, found the code coverage report and
> chose something pretty straightforward to test, BitUtil since it
> was nice and self-contained. As I said, I'm looking at understanding
> the process rather than adding much value the first time.
>
> Alas, even something as simple as BitUtil generates questions
> that I'm asking mostly to understand what approach the veterans
> prefer. I'll argue with y'all next year sometime <G>.
>
> So, according to the coverage report, there are two methods that
> are never executed by the unit tests (actually 4, 2 that operate on
> ints and 2 that operate on longs), isPowerOfTwo and
> nextHighestPowerOfTwo. nextHighestPowerOfTwo is especially
> clever, had to get out a paper and pencil to really understand it.
>
> Issues:
> 1> none of these methods is ever called. I commented them out
>      and ran all the unit tests and all is well. Additionally, commenting
>      out one of the other methods produces compile-time errors so I'm
>     fairly sure I didn't do something completely stupid that just *looked*
>     like it was OK. I grepped recursively and they're nowhere in the
>     *.java files.
>   1a> What's the consensus about unused code? Take it out (my
>          preference) along with leaving a comment on where it can
>          be found (since it *is* clever code)? Leave it in because someone
>          found some pretty neat algorithms that we may need sometime?
>   1b> I'm not entirely sure about the contrib area, but the contrib jars
>          are all new so I assume "ant clean test" compiles them as well.
>
> 2> I don't agree with the behavior of nextHighestPowerOfTwo. Should
>      I make changes if we decide to keep it?
>   2a> Why should it return the parameter passed in when it happens to be
>         a perfect power of two? e.g. this passes:
>        assertEquals(BitUtil.nextHighestPowerOfTwo(128L), 128);
>        I'd expect this to actually return 256, given the name.
> 2b> Why should it ever return 0? There's no power of two that is
>        zero. e.g. this passes:
>        assertEquals(BitUtil.nextHighestPowerOfTwo(-1), 0);
>        as does this: assertEquals(BitUtil.nextHighestPowerOfTwo(0), 0).
>        *Assuming* that someone wants to use this sometime to, say, size
>         an array they'd have to test against a return of 0.
>
>
> I'm fully aware that these are trivial issues in the grand scheme of things,
> and I *really* don't want to waste much time hashing them over. I'll provide
> a patch either way and go on to something slightly more complicated for
> my next trick.
>
> Best
> Erick
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Greetings and questions about patches

Posted by Chris Miller <ch...@kbcfp.com>.

> Issues:
> 1> none of these methods is ever called.

Note that Yonik's suggested patch for LUCENE-1607 contains the following 
code:

+  public SimpleStringInterner(int sz) {
+    cache = new String[BitUtil.nextHighestPowerOfTwo(sz)];
+  }

...so the int flavour of nextHighestPowerOfTwo() might be in use shortly! :-)





---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Greetings and questions about patches

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Wed, Apr 22, 2009 at 9:33 PM, Erick Erickson <er...@gmail.com> wrote:
> So, according to the coverage report, there are two methods that
> are never executed by the unit tests (actually 4, 2 that operate on
> ints and 2 that operate on longs), isPowerOfTwo and
> nextHighestPowerOfTwo. nextHighestPowerOfTwo is especially
> clever, had to get out a paper and pencil to really understand it.
>
> Issues:
> 1> none of these methods is ever called.

OpenBitset, BitUtil, etc, were developed in Solr and later moved to
Lucene.  Solr does use nextHighestPowerOfTwo().
isPowerOfTwo() I had used at some point as a short circuit to check if
something was already a power of two already... probably only worth it
for programmer constants that are supposed to be a power of two (i.e.
the short circuit would be worth it).

>   1a> What's the consensus about unused code? Take it out (my
>          preference)

That's fine with me... I don't mind moving stuff that's only used by
solr back to solr.

> 2> I don't agree with the behavior of nextHighestPowerOfTwo. Should
>      I make changes if we decide to keep it?
>   2a> Why should it return the parameter passed in when it happens to be
>         a perfect power of two? e.g. this passes:
>        assertEquals(BitUtil.nextHighestPowerOfTwo(128L), 128);
>        I'd expect this to actually return 256, given the name.

It's really to ensure that you have a power of two (for fast
hashtables).  Perhaps the name should be changed rather than the
functionality?

> 2b> Why should it ever return 0? There's no power of two that is
>        zero. e.g. this passes:
>        assertEquals(BitUtil.nextHighestPowerOfTwo(-1), 0);
>        as does this: assertEquals(BitUtil.nextHighestPowerOfTwo(0), 0).
>        *Assuming* that someone wants to use this sometime to, say, size
>         an array they'd have to test against a return of 0.

An array of size zero would actually work to store 0 elements though.
And it's a utility function that should be kept as fast as possible -
a zero check would add a branch.

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org