
Posted to java-user@lucene.apache.org by Luis A Lastras <la...@us.ibm.com> on 2015/01/25 03:34:29 UTC

Absolute term position in scoring


Is it possible to incorporate into Lucene's scoring function the position of
a matching term (say, as measured from the top of the document)? The
scenario is this: if a set of documents tends to talk about the most
important stuff at the beginning, then we would like to give
preference to documents that mention a term close to the top.
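
To make the scenario concrete, here is a minimal sketch of the kind of
position-aware score meant above. It is plain Java and does not use any
Lucene API: the class name PositionBoost, its methods, the halfLife
parameter, and the decay formula are all assumptions made up for this
illustration, not something Lucene provides.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a Lucene class): scales a base score by how early a term first appears. */
public final class PositionBoost {

    private PositionBoost() {}

    /**
     * Returns a multiplier in (0, 1]: 1.0 for a term at position 0,
     * 0.5 when firstPosition equals halfLife, and smaller values further down.
     * The hyperbolic decay is an arbitrary choice for illustration.
     */
    public static double boost(int firstPosition, double halfLife) {
        if (firstPosition < 0 || halfLife <= 0) {
            throw new IllegalArgumentException("firstPosition must be >= 0 and halfLife > 0");
        }
        return halfLife / (halfLife + firstPosition);
    }

    /** Token index of the first occurrence of term, or -1 if the term is absent. */
    public static int firstPosition(List<String> tokens, String term) {
        return tokens.indexOf(term);
    }

    /** Base score multiplied by the position boost; 0.0 when the term does not occur. */
    public static double score(List<String> tokens, String term, double baseScore, double halfLife) {
        int pos = firstPosition(tokens, term);
        return pos < 0 ? 0.0 : baseScore * boost(pos, halfLife);
    }

    public static void main(String[] args) {
        List<String> early = Arrays.asList("lucene", "scoring", "is", "flexible");
        List<String> late = Arrays.asList("this", "text", "mentions", "lucene");
        System.out.println(score(early, "lucene", 1.0, 3.0)); // term at position 0
        System.out.println(score(late, "lucene", 1.0, 3.0));  // term at position 3
    }
}
```

With the same base score, the document that mentions the term first ranks
above the one that mentions it last, which is the preference described.
How such a factor could be wired into Lucene's actual scoring is exactly
the open question.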

Thanks,

Luis

Luis A Lastras, Ph.D.
Research Staff Member & Manager, Concept Analytics, IBM Watson
Member of the IBM Academy of Technology
IBM Master Inventor
email: lastrasl@us.ibm.com | Tel: 914-945-3613 | Cell: 914-382-1879
address: 1101 Kitchawan Rd, Office 28-132, Yorktown Heights, NY, 10598

RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by Piotr Idzikowski <pi...@gmail.com>.

Hello.
My question was general, since the G1 garbage collector was discussed
in this thread.
The Lucene wiki says not to use it, but the Solr wiki says
that it is OK.
Yet Solr uses Lucene.
So the question was: who is right?

Regards
On 6 Feb 2015 18:12, "McKinley, James T" <ja...@cengage.com> wrote:

> Just to be clear in case there was any confusion about my previous message
> regarding G1GC, we do not use Solr, my team works on a proprietary
> Lucene-based search engine.  Consequently, I can't really give any advice
> regarding Solr with G1GC, but for our uses (so far anyway), G1GC seems to
> work well with Lucene.
>
> Jim
> ________________________________________
> From: Piotr Idzikowski [piotridzikowski@gmail.com]
> Sent: Friday, February 06, 2015 5:35 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>
> Hello.
> A slightly delayed question, but I recently found these articles:
> https://wiki.apache.org/solr/SolrPerformanceProblems
> https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
> Especially this part from the first URL:
> *Using the ConcurrentMarkSweep (CMS) collector with tuning parameters is a
> very good option for Solr, but with the latest Java 7 releases (7u72 at
> the time of this writing), G1 is looking like a better option, if the
> -XX:+ParallelRefProcEnabled option is used.*
>
> How does it play with *"Do not, under any circumstances, run Lucene with
> the G1 garbage collector."*
> from https://wiki.apache.org/lucene-java/JavaBugs?
>
> Regards
> Piotr
>
> On Tue, Jan 27, 2015 at 9:55 PM, McKinley, James T <
> james.mckinley@cengage.com> wrote:
>
> > Hi Uwe,
> >
> > OK, thanks for the info.  We'll see if we can download the Lucene test
> > suite and check it out.
> >
> > FWIW, we use G1GC in our production runtime (~70 12-16 core Cisco UCS and
> > HP Gen7/Gen8 nodes with 20+ GB heaps using Java 7 and Lucene 4.8.1 with
> > pairs of 30 index partitions with 15M-23M docs each) and have not
> > experienced any VM crashes (well, maybe a couple, but not directly
> > traceable to G1 to my knowledge).  We have found some undocumented pauses
> > in G1 due to very large object arrays and filed a bug report which was
> > confirmed and also affects CMS (we worked around this in our code using
> > memory mapping of some files whose contents we previously held all in
> > RAM).  I think the only index corruption we've ever seen was in our index
> > creation workflow (~30 HP Gen7 nodes with 27GB heaps) but this was using
> > Parallel GC since it is a batch system, so that corruption (which we've not
> > seen recently and never found a cause for) was definitely not due to G1GC.
> >
> > G1GC has bugs, as does CMS, but we've found it to work pretty well so far in
> > our runtime system.  Of course YMMV, thanks again for the info.
> >
> > Jim
> > ________________________________________
> > From: Uwe Schindler [uwe@thetaphi.de]
> > Sent: Tuesday, January 27, 2015 3:02 PM
> > To: java-user@lucene.apache.org
> > Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
> >
> > Hi,
> >
> > About G1GC: we consistently see problems when running the Lucene test suite
> > with G1GC enabled. The people from Elasticsearch concluded:
> >
> > "There is a newer GC called the Garbage First GC (G1GC). This newer GC is
> > designed to minimize pausing even more than CMS, and operate on large
> > heaps. It works by dividing the heap into regions and predicting which
> > regions contain the most reclaimable space. By collecting those regions
> > first (garbage first), it can minimize pauses and operate on very large
> > heaps.
> >
> > Sounds great! Unfortunately, G1GC is still new, and fresh bugs are found
> > routinely. These bugs are usually of the segfault variety, and will cause
> > hard crashes. The Lucene test suite is brutal on GC algorithms, and it
> > seems that G1GC hasn’t had the kinks worked out yet.
> >
> > We would like to recommend G1GC someday, but for now, it is simply not
> > stable enough to meet the demands of Elasticsearch and Lucene."
> > (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_don_8217_t_touch_these_settings.html)
> >
> > In fact, the problems with G1GC can sometimes lead to index corruption,
> > and are hard to reproduce. So it is better not to use it.
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> > > -----Original Message-----
> > > From: McKinley, James T [mailto:james.mckinley@cengage.com]
> > > Sent: Tuesday, January 27, 2015 8:58 PM
> > > To: java-user@lucene.apache.org
> > > Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
> > >
> > > Why do you say not to use G1GC?  We are using Java 7 & G1GC with Lucene
> > > 4.8.1 in production.  Thanks.
> > >
> > > Jim
> > > ________________________________________
> > > From: Uwe Schindler [uwe@thetaphi.de]
> > > Sent: Tuesday, January 27, 2015 2:49 PM
> > > To: java-user@lucene.apache.org; 'kiwi clive'
> > > Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
> > >
> > > Java 8 update 20 or later is also fine. At the current time, always use the
> > > latest update release and you will be fine with Java 7 and Java 8. Don't use
> > > older releases, and don't use the G1 garbage collector.
> > >
> > > -----
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > >
> > >
> > > > -----Original Message-----
> > > > From: kiwi clive [mailto:kiwi_clive@yahoo.com.INVALID]
> > > > Sent: Tuesday, January 27, 2015 8:03 PM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
> > > >
> > > > Hi Hoss,
> > > > Many thanks for the information. This looks very encouraging as the
> > > > Java 7 bug I remember was fixed and, as far as I know, we should not be
> > > > affected by the others.
> > > > I'll put a few tests together and put my toe in the water :-) Clive
> > > >
> > > >       From: Chris Hostetter <ho...@fucit.org>
> > > >  To: "java-user@lucene.apache.org" <ja...@lucene.apache.org>; kiwi clive <ki...@yahoo.com>
> > > >  Sent: Tuesday, January 27, 2015 4:01 PM
> > > >  Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
> > > >
> > > >
> > > >
> > > >
> > > > : I seem to remember reading that certain versions of lucene were
> > > > : incompatible with some java versions although I cannot find anything to
> > > > : verify this. As we have tens of thousands of large indexes, backwards
> > > > : compatibility without the need to reindex on an upgrade is of prime
> > > > : importance to us.
> > > >
> > > > All known JVM bugs affecting Lucene are listed here...
> > > >
> > > > https://wiki.apache.org/lucene-java/JavaBugs
> > > >
> > > >
> > > > -Hoss
> > > > http://www.lucidworks.com/
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org