Posted to java-user@lucene.apache.org by David Linde <da...@gmail.com> on 2010/12/02 03:54:22 UTC

a proof that every word is indexing properly

Has anyone figured out a way to logically prove that Lucene indexes every
word properly?

Our company has done a lot of research into Lucene, and our whole IT
department is really impressed and excited about it, *except* for one of the
older search/indexing experts, who doesn't want to move to a new search
engine. Is there any way to logically prove that Lucene indexes every word
properly?

One idea we considered is attempting to rebuild the source from the index,
but it seems like doing that would take a huge effort.

Any ideas or thoughts would be very much appreciated.

Thanks in advance
David

Re: a proof that every word is indexing properly

Posted by Erick Erickson <er...@gmail.com>.
I'm really curious how your expert knows that the present system
"indexes every word properly". You can certainly test any scenario that
can be defined precisely via unit tests, as Lance suggests.

Ask for *concrete* examples he's concerned with. Write tests to show that
each example works. Ask for more. Do NOT accept a requirement for "proof"
that can't be demonstrated by tests; that way madness lies. Besides, it's
impossible.
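For illustration, a minimal sketch of such example-driven tests. It assumes a hypothetical toy analyzer (`analyze`: lower-case, then split on anything that is not a letter or digit) standing in for the real one; none of the names below are Lucene APIs. With Lucene you would run the same examples through your actual Analyzer and an IndexSearcher instead. Each example records a document, a query word, and whether the expert expects a hit.

```java
import java.util.*;

/** Hypothetical example-driven checks against a toy analyzer (not Lucene). */
class ConcreteExampleChecks {

    // Toy stand-in for the real analyzer: lower-case, split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // One concrete example supplied by the skeptic: should `query` find `document`?
    record Example(String document, String query, boolean shouldMatch) {}

    // The query goes through the same analysis as the document.
    static boolean matches(String document, String query) {
        return analyze(document).containsAll(analyze(query));
    }

    // Describes every example whose actual outcome differs from the expectation.
    static List<String> failures(List<Example> examples) {
        List<String> failed = new ArrayList<>();
        for (Example e : examples) {
            if (matches(e.document(), e.query()) != e.shouldMatch()) {
                failed.add("'" + e.query() + "' vs '" + e.document() + "'");
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<Example> examples = List.of(
            new Example("Invoice for ACME Corp.", "acme", true),
            new Example("Re-indexing finished", "indexing", true),
            new Example("Written in C++", "C", false)); // an edge case the expert might raise
        System.out.println(failures(examples)); // ['C' vs 'Written in C++']
    }
}
```

The third example fails on purpose: the toy analyzer reduces "C++" to "c", so a search for "C" finds it. A failure like that is the concrete, discussable outcome such tests are meant to produce, rather than an abstract "proof".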

Or just propose a pilot project. Underlying the whole discussion is that
using Lucene makes your business go faster/more profitably/leaner/whatever.
Find a pain point in the current system and volunteer to see if Lucene can
make that pain go away. Then enlist the business side of the company saying
"we can deliver on XYZ better than the old system, should we continue?"

Really, you have to demonstrate that Lucene delivers *business value* that
isn't being delivered currently. Otherwise this is just one of those endless
technical discussions that has no value to your company. And if you can't
deliver business value, you have no reason to use Lucene, or any other new
technology for that matter. Once you demonstrate business value, rational
decisions can be made about whether the development effort is worth the cost.
It's really, really effective to take something that has been "too hard to do"
with the old system and deliver it in, say, three weeks with new technology.
*That* demonstration takes most of the wind out of abstract "issues".

Pilot projects are especially attractive to management because they can
define the cost, as in "You have 4 weeks to demonstrate this with a 3-person
team". They also give you a much deeper insight into the technology, which
informs the discussion.

Of course you should pick a pain point that you have some hope of solving <G>.

Best
Erick

Re: a proof that every word is indexing properly

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Thu, 2010-12-02 at 03:54 +0100, David Linde wrote:
> Has anyone figured out a way to logically prove that lucene indexes ever
> word properly?

The "Precision and recall in lucene"-thread seems relevant here.

> Our company has done alot of research into lucene, all of our IT department
> is really impressed and excited about lucene *except* one of the older
> search/indexing experts.
> Who doesn't want to move to a new search engine, is there anyway to
> logically prove, that lucene indexes every word properly?

That is a straw man argument. As the precision-thread shows, it is
extremely hard to define what "properly" means in relation to a
non-trivial retrieval system like Lucene.

If your grouch is an old-school database man, he might equate "properly"
with "Every word that exists in the source should be indexed so that a
search for that word will return all documents that contain it and no
other documents (phew)". As David implies, this is a bad test: it
satisfies the guy but does not a proper search system make.
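That criterion can at least be checked mechanically. A minimal sketch, assuming a hypothetical toy inverted index (`buildIndex`) in place of Lucene: for every distinct word, the index lookup must equal a brute-force scan of the source. Both sides share the same toy tokenizer, so this only verifies the index bookkeeping, not whether the tokenizer's idea of a "word" matches anyone's expectations, which is the genuinely hard part.

```java
import java.util.*;

/** Hypothetical toy inverted index used to check the "every word" criterion. */
class EveryWordCheck {

    // Toy tokenizer: lower-case and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Inverted index: term -> ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : tokenize(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Words whose lookup differs from a brute-force scan of the source: the lookup
    // must return every document containing the word and no other documents.
    static List<String> violations(List<String> docs) {
        Map<String, Set<Integer>> index = buildIndex(docs);
        Set<String> vocabulary = new TreeSet<>();
        for (String doc : docs) vocabulary.addAll(tokenize(doc));
        List<String> bad = new ArrayList<>();
        for (String word : vocabulary) {
            Set<Integer> expected = new TreeSet<>();
            for (int id = 0; id < docs.size(); id++) {
                if (tokenize(docs.get(id)).contains(word)) expected.add(id);
            }
            Set<Integer> actual = index.getOrDefault(word, Set.of());
            if (!actual.equals(expected)) bad.add(word);
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "The quick brown fox.",
            "A lazy dog, and a quick cat.");
        System.out.println("violations: " + violations(docs)); // violations: []
        System.out.println(buildIndex(docs).get("quick"));     // [0, 1]
    }
}
```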

But I'm just guessing here, and it sounds like you're doing the same,
asking for ideas for proving Lucene functionality. Maybe you could turn
it around? Ask the guy what it would take for him to accept Lucene or
any other option for that matter. When you have that, you can discuss
whether his requirements are valid or not.

> One idea we considered is attempting to rebuild the source from the index,
> but it seems like doing that would take a huge effort.

It is also not possible in general. With special-purpose code, you could
just cheat massively and store everything, giving you an instant and 100%
correct rebuild.
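To make the point concrete, a minimal sketch assuming a hypothetical toy positional index (term to token positions), not Lucene's actual index format. Putting every term back at its recorded position recovers the token sequence, but case and punctuation are gone, because the toy analyzer discarded them before anything was indexed.

```java
import java.util.*;

/** Hypothetical toy positional index; not Lucene's index format. */
class ReconstructFromIndex {

    // Toy analysis: lower-case and split on anything that is not a letter or digit.
    // Records, for each term, the token positions where it occurs.
    static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        int pos = 0;
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (t.isEmpty()) continue;
            postings.computeIfAbsent(t, k -> new ArrayList<>()).add(pos++);
        }
        return postings;
    }

    // Puts every term back at its recorded position and joins with spaces.
    static String reconstruct(Map<String, List<Integer>> postings) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        postings.forEach((term, positions) ->
            positions.forEach(p -> byPosition.put(p, term)));
        return String.join(" ", byPosition.values());
    }

    public static void main(String[] args) {
        String original = "The Quick brown fox, and THE lazy dog!";
        String rebuilt = reconstruct(index(original));
        System.out.println(rebuilt);                  // the quick brown fox and the lazy dog
        System.out.println(rebuilt.equals(original)); // false
    }
}
```

Only storing the original text alongside the index (the "cheat" above) gives a byte-for-byte rebuild.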


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: a proof that every word is indexing properly

Posted by Lance Norskog <go...@gmail.com>.
This is what unit tests are for.

-- 
Lance Norskog
goksron@gmail.com



RE: a proof that every word is indexing properly

Posted by David Fertig <df...@cymfony.com>.
Stop words are never indexed; you may need to empty your stop list.

Luke (open source, with code available) can already browse indexes and
reconstruct documents from their terms. Compare that to the original to
see if you are satisfied.
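One quick way to see what a stop list costs you is to diff the source vocabulary against what actually got indexed. A minimal sketch, assuming a hypothetical toy analyzer (`indexedTerms`) and an illustrative stop list chosen for this example; with Lucene the stop set is whatever your Analyzer is configured with.

```java
import java.util.*;

/** Hypothetical stop-word audit against a toy analyzer (not Lucene). */
class StopWordAudit {

    // Toy analyzer: lower-case, split on non-alphanumerics, drop stop words.
    static Set<String> indexedTerms(String text, Set<String> stopWords) {
        Set<String> terms = new TreeSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) terms.add(t);
        }
        return terms;
    }

    // Source words (same tokenization, no stop list) that have no term in the index.
    static Set<String> missingWords(String text, Set<String> stopWords) {
        Set<String> missing = new TreeSet<>(indexedTerms(text, Set.of()));
        missing.removeAll(indexedTerms(text, stopWords));
        return missing;
    }

    public static void main(String[] args) {
        String doc = "To be or not to be, that is the question.";
        Set<String> stop = Set.of("to", "be", "or", "not", "that", "is", "the");
        System.out.println(missingWords(doc, stop));     // [be, is, not, or, that, the, to]
        System.out.println(missingWords(doc, Set.of())); // []
    }
}
```

With this illustrative stop list, every word of the line except "question" disappears from the index; with an empty stop list, nothing does.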

