Posted to solr-dev@lucene.apache.org by Yonik Seeley <yo...@lucidimagination.com> on 2009/05/27 22:58:39 UTC

update Lucene

I think we should upgrade Lucene again since the index file format has changed:
https://issues.apache.org/jira/browse/LUCENE-1654

This also contains a fix for unifying the FieldCache and
ExtendedFieldCache instances.

$ svn diff -r r776177 CHANGES.txt
Index: CHANGES.txt
===================================================================
--- CHANGES.txt	(revision 776177)
+++ CHANGES.txt	(working copy)
@@ -27,7 +27,11 @@
     implement Searchable or extend Searcher, you should change you
     code to implement this method.  If you already extend
     IndexSearcher, no further changes are needed to use Collector.
-    (Shai Erera via Mike McCandless)
+
+    Finally, the values Float.Nan, Float.NEGATIVE_INFINITY and
+    Float.POSITIVE_INFINITY are not valid scores.  Lucene uses these
+    values internally in certain places, so if you have hits with such
+    scores it will cause problems. (Shai Erera via Mike McCandless)

 Changes in runtime behavior

@@ -107,10 +111,10 @@
    that's visited.  All core collectors now use this API.  (Mark
    Miller, Mike McCandless)

-8. LUCENE-1546: Add IndexReader.flush(String commitUserData), allowing
-   you to record an opaque commitUserData into the commit written by
-   IndexReader.  This matches IndexWriter's commit methods.  (Jason
-   Rutherglen via Mike McCandless)
+8. LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
+   you to record an opaque commitUserData (maps String -> String) into
+   the commit written by IndexReader.  This matches IndexWriter's
+   commit methods.  (Jason Rutherglen via Mike McCandless)

 9. LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
    enable compressing & decompressing binary content, external to
@@ -135,6 +139,9 @@
     not make sense for all subclasses of MultiTermQuery. Check individual
     subclasses to see if they support #getTerm().  (Mark Miller)

+14. LUCENE-1636: Make TokenFilter.input final so it's set only
+    once. (Wouter Heijke, Uwe Schindler via Mike McCandless).
+
 Bug fixes

 1. LUCENE-1415: MultiPhraseQuery has incorrect hashCode() and equals()
@@ -176,6 +183,9 @@
    sort) by doc Id in a consistent manner (i.e., if Sort.FIELD_DOC was used vs.
    when it wasn't). (Shai Erera via Michael McCandless)

+10. LUCENE-1647: Fix case where IndexReader.undeleteAll would cause
+    the segment's deletion count to be incorrect. (Mike McCandless)
+
  New features

  1. LUCENE-1411: Added expert API to open an IndexWriter on a prior
@@ -186,10 +196,11 @@
     when building transactional support on top of Lucene.  (Mike
     McCandless)

- 2. LUCENE-1382: Add an optional arbitrary String "commitUserData" to
-    IndexWriter.commit(), which is stored in the segments file and is
-    then retrievable via IndexReader.getCommitUserData instance and
-    static methods.  (Shalin Shekhar Mangar via Mike McCandless)
+ 2. LUCENE-1382: Add an optional arbitrary Map (String -> String)
+    "commitUserData" to IndexWriter.commit(), which is stored in the
+    segments file and is then retrievable via
+    IndexReader.getCommitUserData instance and static methods.
+    (Shalin Shekhar Mangar via Mike McCandless)

  3. LUCENE-1406: Added Arabic analyzer.  (Robert Muir via Grant Ingersoll)

@@ -311,6 +322,10 @@
 25. LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take
     deletions into account when considering merges.  (Yasuhiro Matsuda
     via Mike McCandless)
+
+26. LUCENE-1550: Added new n-gram based String distance measure for spell checking.
+    See the Javadocs for NGramDistance.java for a reference paper on why this is helpful (Tom Morton via Grant Ingersoll)
+

 Optimizations
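
For code that computes its own scores, the restriction added in the first
hunk (NaN and the two infinities are not valid scores) could be checked with
something like the sketch below. It is plain Java with no Lucene types; the
class name ScoreGuard, both method names, and the idea of substituting a
fallback value are assumptions made for illustration, not anything taken
from Lucene.

```java
public class ScoreGuard {

    /** Returns true when the score is an ordinary finite float. */
    public static boolean isValidScore(float score) {
        return !Float.isNaN(score) && !Float.isInfinite(score);
    }

    /**
     * Hypothetical policy: swap an invalid score for a caller-chosen
     * fallback. Whether a fallback is appropriate depends on the scorer.
     */
    public static float sanitize(float score, float fallback) {
        return isValidScore(score) ? score : fallback;
    }

    public static void main(String[] args) {
        // 0.0f / 0.0f evaluates to NaN in Java floating-point arithmetic.
        System.out.println(isValidScore(1.5f));
        System.out.println(isValidScore(0.0f / 0.0f));
        System.out.println(sanitize(Float.NEGATIVE_INFINITY, 0.0f));
    }
}
```

A check of this kind would sit wherever a custom scorer produces its final
value, before the value reaches a collector.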


-Yonik
http://www.lucidimagination.com

Re: update Lucene

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Thu, May 28, 2009 at 2:28 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> I think we should upgrade Lucene again since the index file format has
> changed:
> https://issues.apache.org/jira/browse/LUCENE-1654
>
>
+1

-- 
Regards,
Shalin Shekhar Mangar.

Re: update Lucene

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Clearly I meant "...along with *Lucene* jars" :)

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Otis Gospodnetic <ot...@yahoo.com>
> To: solr-dev@lucene.apache.org
> Sent: Wednesday, May 27, 2009 11:59:18 PM
> Subject: Re: update Lucene
> 
> 
> I wonder if it would be useful to commit Lucene's CHANGES.txt into Solr along 
> with Solr jars.  It would then be very easy to tell what changed in Lucene since 
> the version Solr has and the current version of Lucene (or some newer released 
> version, if we were able to be behind).
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 

Re: update Lucene

Posted by Otis Gospodnetic <ot...@yahoo.com>.

I wonder if it would be useful to commit Lucene's CHANGES.txt into Solr along with Solr jars.  It would then be very easy to tell what changed in Lucene since the version Solr has and the current version of Lucene (or some newer released version, if we were able to be behind).
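
A minimal sketch of that comparison, assuming a copy of the CHANGES.txt that
matches the bundled jars sits next to a newer copy: it lists the LUCENE-NNNN
issue ids that appear only in the newer file. The class name ChangesDiff, the
method names, and the command-line arguments are all hypothetical. An entry
that was reworded under an existing issue id will not show up this way, so
svn diff remains the fuller view.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChangesDiff {

    // Issue ids as they appear in CHANGES.txt entries, e.g. "LUCENE-1647".
    private static final Pattern ISSUE = Pattern.compile("LUCENE-\\d+");

    /** Collects the issue ids mentioned in the text, in order of first appearance. */
    public static Set<String> issueIds(String text) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = ISSUE.matcher(text);
        while (m.find()) {
            ids.add(m.group());
        }
        return ids;
    }

    /** Returns the ids present in the newer text but absent from the bundled one. */
    public static Set<String> newIssues(String bundled, String current) {
        Set<String> added = issueIds(current);
        added.removeAll(issueIds(bundled));
        return added;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ChangesDiff <bundled CHANGES.txt> <newer CHANGES.txt>");
            return;
        }
        String bundled = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String current = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        for (String id : newIssues(bundled, current)) {
            System.out.println(id);
        }
    }
}
```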

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


