Posted to java-user@lucene.apache.org by Luis A Lastras <la...@us.ibm.com> on 2015/01/25 03:34:29 UTC

Absolute term position in scoring


Is it possible to incorporate the position of a matching term (say, as
measured from the top of the document) into Lucene's scoring function? The
scenario is this: if the documents in the collection tend to talk about the
most important stuff at the beginning, then we would like to give
preference to documents that mention a term close to the top.

Thanks,

Luis

                                                                               
                                                                               
                                                                               
Luis A Lastras, Ph.D.
Research Staff Member & Manager, Concept Analytics, IBM Watson
Member of the IBM Academy of Technology
IBM Master Inventor
email: lastrasl@us.ibm.com | Tel: 914-945-3613 | Cell: 914-382-1879
address: 1101 Kitchawan Rd, Office 28-132, Yorktown Heights, NY, 10598
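One way to picture the idea: if the position of a term's first occurrence in a positions-enabled field is available at scoring time, the base relevance score could be multiplied by a factor that decays with that position. The sketch below is plain Java with no Lucene dependency; `positionFactor`, `rescore`, and the `halfWay` parameter are hypothetical names chosen for illustration, and the 1 / (1 + p / halfWay) curve is an assumed choice, not something Lucene provides. Within Lucene itself, `SpanFirstQuery` (which matches spans whose end position falls within the first `end` positions of a field) may be worth evaluating as an optional, boosted clause.

```java
/**
 * Hypothetical sketch: scale a document's base score by how early the
 * query term first appears (0-based token position). Not a Lucene API.
 */
public class PositionBoost {

    /**
     * Decay factor 1 / (1 + p / halfWay): equals 1.0 at position 0 and
     * 0.5 at position halfWay, approaching 0 for late positions.
     */
    static double positionFactor(int firstPosition, double halfWay) {
        if (firstPosition < 0 || halfWay <= 0) {
            throw new IllegalArgumentException("position must be >= 0 and halfWay > 0");
        }
        return 1.0 / (1.0 + firstPosition / halfWay);
    }

    /** Combines the base relevance score with the positional factor. */
    static double rescore(double baseScore, int firstPosition, double halfWay) {
        return baseScore * positionFactor(firstPosition, halfWay);
    }

    public static void main(String[] args) {
        double halfWay = 50.0; // assumed tuning parameter
        // Two documents with the same base score; the term first appears
        // at position 3 in one and at position 300 in the other.
        System.out.printf("early=%.3f late=%.3f%n",
                rescore(2.0, 3, halfWay), rescore(2.0, 300, halfWay));
    }
}
```

The `halfWay` knob controls how strongly early mentions are favored; a logarithmic or step-shaped curve would be an equally plausible choice.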
                                                                               
                                                                               
                                                                               
                                                                               
                                                                               


RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by Piotr Idzikowski <pi...@gmail.com>.
Hello.
My question was general, since the G1 garbage collector was discussed in
this thread. The Lucene wiki says not to use it, but on the other hand the
Solr wiki says that it is OK. And Solr uses Lucene.
So the question was: who is right?

Regards
On 6 Feb 2015 18:12, "McKinley, James T" <ja...@cengage.com> wrote:

> Just to be clear, in case there was any confusion about my previous message
> regarding G1GC: we do not use Solr; my team works on a proprietary
> Lucene-based search engine.  Consequently, I can't really give any advice
> regarding Solr with G1GC, but for our uses (so far anyway), G1GC seems to
> work well with Lucene.
>
> Jim
> ________________________________________
> From: Piotr Idzikowski [piotridzikowski@gmail.com]
> Sent: Friday, February 06, 2015 5:35 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>
> Hello.
> A somewhat belated question, but I recently found these articles:
> https://wiki.apache.org/solr/SolrPerformanceProblems
> https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
> Especially this part from first url:
> *Using the ConcurrentMarkSweep (CMS) collector with tuning parameters is a
> very good option for Solr, but with the latest Java 7 releases (7u72 at
> the time of this writing), G1 is looking like a better option, if the
> -XX:+ParallelRefProcEnabled option is used.*
>
> How does it play with *"Do not, under any circumstances, run Lucene with
> the G1 garbage collector."*
> from https://wiki.apache.org/lucene-java/JavaBugs?
>
> Regards
> Piotr
>
> On Tue, Jan 27, 2015 at 9:55 PM, McKinley, James T <james.mckinley@cengage.com> wrote:
>
> > Hi Uwe,
> >
> > OK, thanks for the info.  We'll see if we can download the Lucene test
> > suite and check it out.
> >
> > FWIW, we use G1GC in our production runtime (~70 12-16 core Cisco UCS and
> > HP Gen7/Gen8 nodes with 20+ GB heaps using Java 7 and Lucene 4.8.1 with
> > pairs of 30 index partitions with 15M-23M docs each) and have not
> > experienced any VM crashes (well, maybe a couple, but not directly
> > traceable to G1 to my knowledge).  We have found some undocumented pauses
> > in G1 due to very large object arrays and filed a bug report which was
> > confirmed and also affects CMS (we worked around this in our code using
> > memory mapping of some files whose contents we previously held all in
> > RAM).  I think the only index corruption we've ever seen was in our index
> > creation workflow (~30 HP Gen7 nodes with 27GB heaps) but this was using
> > Parallel GC since it is a batch system, so that corruption (which we've not
> > seen recently and never found a cause for) was definitely not due to G1GC.
> >
> > G1GC has bugs, as does CMS, but we've found it to work pretty well so far in
> > our runtime system.  Of course YMMV; thanks again for the info.
> >
> > Jim
> > ________________________________________
> > From: Uwe Schindler [uwe@thetaphi.de]
> > Sent: Tuesday, January 27, 2015 3:02 PM
> > To: java-user@lucene.apache.org
> > Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
> >
> > Hi,
> >
> > About G1GC: we consistently see problems when running the Lucene test suite
> > with G1GC enabled. The people from Elasticsearch concluded:
> >
> > "There is a newer GC called the Garbage First GC (G1GC). This newer GC is
> > designed to minimize pausing even more than CMS, and operate on large
> > heaps. It works by dividing the heap into regions and predicting which
> > regions contain the most reclaimable space. By collecting those regions
> > first (garbage first), it can minimize pauses and operate on very large
> > heaps.
> >
> > Sounds great! Unfortunately, G1GC is still new, and fresh bugs are found
> > routinely. These bugs are usually of the segfault variety, and will cause
> > hard crashes. The Lucene test suite is brutal on GC algorithms, and it
> > seems that G1GC hasn’t had the kinks worked out yet.
> >
> > We would like to recommend G1GC someday, but for now, it is simply not
> > stable enough to meet the demands of Elasticsearch and Lucene."
> > (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_don_8217_t_touch_these_settings.html)
> >
> > In fact, the problems with G1GC can sometimes lead to index corruption,
> > and they are hard to reproduce. So better not to use it...
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> > > -----Original Message-----
> > > From: McKinley, James T [mailto:james.mckinley@cengage.com]
> > > Sent: Tuesday, January 27, 2015 8:58 PM
> > > To: java-user@lucene.apache.org
> > > Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
> > >
> > > Why do you say not to use G1GC?  We are using Java 7 & G1GC with Lucene
> > > 4.8.1 in production.  Thanks.
> > >
> > > Jim
> > > ________________________________________
> > > From: Uwe Schindler [uwe@thetaphi.de]
> > > Sent: Tuesday, January 27, 2015 2:49 PM
> > > To: java-user@lucene.apache.org; 'kiwi clive'
> > > Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
> > >
> > > Java 8 update 20 or later is also fine. At the current time, always use the
> > > latest update release and you will be fine with Java 7 and Java 8. Don't use
> > > older releases, and don't use the G1 garbage collector.
> > >
> > > -----
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > >
> > >
> > > > -----Original Message-----
> > > > From: kiwi clive [mailto:kiwi_clive@yahoo.com.INVALID]
> > > > Sent: Tuesday, January 27, 2015 8:03 PM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
> > > >
> > > > Hi Hoss,
> > > > Many thanks for the information. This looks very encouraging as the
> > > > Java7 bug I remember was fixed and, as far as I know, we should not be
> > > > affected by the others.
> > > > I'll put a few tests together and put my toe in the water :-)
> > > > Clive
> > > >
> > > > From: Chris Hostetter <ho...@fucit.org>
> > > > To: "java-user@lucene.apache.org" <ja...@lucene.apache.org>; kiwi clive <ki...@yahoo.com>
> > > > Sent: Tuesday, January 27, 2015 4:01 PM
> > > > Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
> > > >
> > > >
> > > >
> > > >
> > > > : I seem to remember reading that certain versions of Lucene were
> > > > : incompatible with some Java versions, although I cannot find anything
> > > > : to verify this. As we have tens of thousands of large indexes,
> > > > : backwards compatibility without the need to reindex on an upgrade is
> > > > : of prime importance to us.
> > > >
> > > > All known JVM bugs affecting Lucene are listed here...
> > > >
> > > > https://wiki.apache.org/lucene-java/JavaBugs
> > > >
> > > >
> > > > -Hoss
> > > > http://www.lucidworks.com/
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by Robert Muir <rc...@gmail.com>.
On Thu, Feb 12, 2015 at 11:58 AM, McKinley, James T
<ja...@cengage.com> wrote:
> Hi Robert,
>
> Thanks for responding to my message.  Are you saying that you or others have encountered problems running Lucene 4.8+ on the 64-bit Java SE 1.7 JVM with G1 and was it on Windows or on Linux?  If so, where can I find out more?  I only looked into the one bug because that was the only bug I saw on the https://wiki.apache.org/lucene-java/JavaBugs page that was related to G1.  If there are other Lucene on Java 1.7 with G1 related bugs how can I find them?  Also, are these failures something that would be triggered by running the standard Lucene 4.8.1 test suite or are there other tests I should run in order to reproduce these bugs?

You can't reproduce them easily. That is the nature of such bugs. When
I see the crashes, I generally try to confirm it's not a Lucene bug.
E.g., I'll run it a thousand times with and without G1, and if only G1
fails, I move on with life. There just isn't time.

Occasionally G1 frustrates me enough that I'll go and open an issue,
like this one: https://issues.apache.org/jira/browse/LUCENE-6098

That's a perfect example of what these bugs look like: horribly scary
failures that can cause bad things, reproduce something like 1 in 1000
runs with G1, and are essentially impossible to debug. They happen quite
often on our various Jenkins servers, on both 32-bit and 64-bit, and even
with the most recent JVMs (e.g. 1.8.0_25 or 1.8.0_40-ea).
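The triage described above (run the same suite many times with and without G1, and only suspect the collector if the failures are G1-only) can be sketched as a small Java driver. It assumes the Lucene 4.x Ant build, where a command-line `-Dargs=...` overrides the `args` property that `common-build.xml` passes to the test JVMs; the class and method names are hypothetical. A thousand full `ant test` runs per configuration would take a very long time, so in practice one would likely narrow the command to a single failing test class.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical driver: repeat a command and count non-zero exits. */
public class RepeatRunner {

    /** Runs {@code command} {@code runs} times and returns how many runs failed. */
    static int countFailures(List<String> command, int runs)
            throws IOException, InterruptedException {
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)                       // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.INHERIT) // show output on our console
                    .start();
            if (p.waitFor() != 0) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        int runs = 1000;
        // Assumes the Lucene 4.x Ant build: -Dargs overrides the "args" property
        // in common-build.xml that is passed to the forked test JVMs.
        List<String> withG1 = Arrays.asList("ant", "test", "-Dargs=-XX:+UseG1GC");
        List<String> withoutG1 = Arrays.asList("ant", "test", "-Dargs=-XX:+UseParallelGC");
        int g1Failures = countFailures(withG1, runs);
        int parallelFailures = countFailures(withoutG1, runs);
        System.out.println("G1 failures: " + g1Failures + "/" + runs);
        System.out.println("ParallelGC failures: " + parallelFailures + "/" + runs);
        if (g1Failures > 0 && parallelFailures == 0) {
            System.out.println("Failures only with G1: points at the JVM/collector rather than Lucene.");
        }
    }
}
```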


RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by "McKinley, James T" <ja...@cengage.com>.
Hi Robert,

Thanks for responding to my message.  Are you saying that you or others have encountered problems running Lucene 4.8+ on the 64-bit Java SE 1.7 JVM with G1 and was it on Windows or on Linux?  If so, where can I find out more?  I only looked into the one bug because that was the only bug I saw on the https://wiki.apache.org/lucene-java/JavaBugs page that was related to G1.  If there are other Lucene on Java 1.7 with G1 related bugs how can I find them?  Also, are these failures something that would be triggered by running the standard Lucene 4.8.1 test suite or are there other tests I should run in order to reproduce these bugs?

We have been running the user facing runtime portion of our search engine using Java SE 1.7.0_04 with the G1 garbage collector for almost two years now and I was not aware of these JVM bugs with Lucene.  However, the indexing workflow portion of our system uses Parallel GC since it is a batch system and is not constrained by user facing response time requirements.  From what I understood from the JDK-8038348 bug comments, it is a compiler bug that can be tripped when using G1 and if the compiler is producing incorrect code I guess any behaviour is possible.  
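Since the runtime and the indexing workflow here deliberately use different collectors, it can be worth confirming at startup which collector a JVM actually ended up with, for example by logging the standard `GarbageCollectorMXBean` names. The sketch assumes HotSpot's naming convention, where G1's beans are named with a `G1` prefix (e.g. `G1 Young Generation`) and the parallel collector's are `PS Scavenge` and `PS MarkSweep`; `GcCheck` and `usesG1` are hypothetical names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical startup check: report which collector this JVM is running. */
public class GcCheck {

    /** Names of the garbage collector MXBeans registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** True if any collector name follows HotSpot's "G1 ..." naming (an assumption). */
    static boolean usesG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names);
        System.out.println("G1 in use: " + usesG1(names));
    }
}
```

On HotSpot, launching this with `-XX:+UseG1GC` should report G1 in use, while `-XX:+UseParallelGC -XX:+UseParallelOldGC` should not.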

We have experienced index corruption 3 times so far since upgrading to Lucene 4.8.1 from Lucene 4.4 (I don't recall any corruption prior to moving to 4.8) but as I said we are using Parallel GC (-XX:+UseParallelGC -XX:+UseParallelOldGC) in the indexing workflow that writes the indexes, we only use G1 in the runtime system that does no index writing.  We have twice encountered index corruption during the index creation workflow (the runtime system never opened the indexes) and once found the index to be corrupt when we restarted the runtime on it.  So this may just be JVM bugs that can be triggered regardless of which garbage collector is used (which is of course even worse).  We do have relatively large indexes (530M+ docs total across 30 partitions), so maybe we're more likely to see corruption even when using Parallel GC?  We haven't seen any corruption since the end of September 2014, but we have now added an index checking step to our workflow to ensure we don't ever point the runtime at a bad batch.  When we've encountered index corruption in the past we've just deleted the bad batch and re-ran the workflow and the subsequent runs have succeeded.  We've never figured out what caused the corruption.  Thanks for any further help.
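An index checking step like the one mentioned above could, for instance, be a gate that runs Lucene's `CheckIndex` tool over every partition of a new batch and only lets the batch through if every check succeeds. This sketch shells out to `java -cp <lucene-core jar> org.apache.lucene.index.CheckIndex <indexDir>`, assuming (as in Lucene 4.x) that the tool exits with status 0 only when the index is clean; the jar path, partition directories, and the `BatchGate`/`allPass` names are hypothetical placeholders.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical batch gate: only promote a batch if every check command succeeds. */
public class BatchGate {

    /** Runs each command in turn; returns false as soon as one exits non-zero. */
    static boolean allPass(List<List<String>> checks) throws IOException, InterruptedException {
        for (List<String> check : checks) {
            Process p = new ProcessBuilder(check).inheritIO().start();
            if (p.waitFor() != 0) {
                System.err.println("Check failed: " + check);
                return false;
            }
        }
        return true;
    }

    /** Builds one CheckIndex invocation per partition directory. */
    static List<List<String>> checkIndexCommands(String luceneCoreJar, List<String> partitionDirs) {
        List<List<String>> commands = new ArrayList<>();
        for (String dir : partitionDirs) {
            commands.add(Arrays.asList("java", "-cp", luceneCoreJar,
                    "org.apache.lucene.index.CheckIndex", dir));
        }
        return commands;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths: replace with the real jar and partition directories.
        String jar = "lib/lucene-core-4.8.1.jar";
        List<String> partitions = Arrays.asList("/indexes/batch-new/part-00", "/indexes/batch-new/part-01");
        if (allPass(checkIndexCommands(jar, partitions))) {
            System.out.println("Batch is clean; safe to point the runtime at it.");
        } else {
            System.out.println("Batch is corrupt; discard it and rerun the workflow.");
            System.exit(1);
        }
    }
}
```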

Jim
________________________________________
From: Robert Muir [rcmuir@gmail.com]
Sent: Wednesday, February 11, 2015 5:05 PM
To: java-user
Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

No, because you only looked into one bug. We have seen, and still do see,
many G1-related test failures, including the latest 1.8.0 update 40 early
access editions. These include things like corruption.

I added this message with *every intention* to scare away users,
because I don't want them having index corruption.

I am sick of people asking "but isn't it fine on the latest version?"
and so on. It is not.

On Wed, Feb 11, 2015 at 11:41 AM, McKinley, James T
<ja...@cengage.com> wrote:
> Hi,
>
> A couple mailing list members have brought the following paragraph from the https://wiki.apache.org/lucene-java/JavaBugs page to my attention:
>
> "Do not, under any circumstances, run Lucene with the G1 garbage collector. Lucene's test suite fails with the G1 garbage collector on a regular basis, including bugs that cause index corruption. There is no person on this planet that seems to understand such bugs (see https://bugs.openjdk.java.net/browse/JDK-8038348, open for over a year), so don't count on the situation changing soon. This information is not out of date, and don't think that the next oracle java release will fix the situation."
>
> Since we run Lucene 4.8.1 on Java(TM) SE Runtime Environment (build 1.7.0_04-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) using G1GC in production, I felt I should look into the issue and see if it is reproducible in our environment.  First I read the bug linked in the above paragraph as well as https://issues.apache.org/jira/browse/LUCENE-5168, and it appears quite a bit of work in trying to track down this bug has already been done by Dawid Weiss and Vladimir Kozlov, but it seems it is limited to the 32-bit JVM (maybe even only on Windows). To quote Dawid Weiss from the Jira bug:
>
> "My quest continues
>
> I thought it'd be interesting to see how far back I can trace this
> issue. I fetched the official binaries for jdk17 (windows, 32-bit) and
> did a binary search with the failing Lucene test command. The results
> show that, in short:
>
> ...
> jdk1.7.0_03: PASSES
> jdk1.7.0_04: FAILS
> ...
>
> and are consistent before and after. jdk1.7.0_04, 64-bit does *NOT*
> exhibit the issue (and neither does any version afterwards, it only
> happens on 32-bit; perhaps it's because of smaller number of available
> registers and the need to spill?).
>
> jdk1.7.0_04 was when G1GC was "officially" made supported but I don't
> think this plays a big difference. I'll see if I can bsearch on
> mercurial revisions to see which particular revision introduced the
> problem. Anyway, the problem has to be a long-standing issue and not a
> regression. Which makes it even more interesting I guess.
>
> Dawid"
>
> In addition, the second-to-last comment in the LUCENE-5168 bug is "I don't think this is closely related to G1GC. It looks more that G1GC happily triggers this bug in this special case."
>
> Just to make sure the bug wasn't reproducible in our specific environment, I checked out the tag for Lucene 4.8.1 (http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_8_1) and made the following change to common-build.xml:
>
> gada@C006129:~/workspace-java/lucene_solr_4_8_1/lucene$ svn diff common-build.xml
> Index: common-build.xml
> ===================================================================
> --- common-build.xml    (revision 1658458)
> +++ common-build.xml    (working copy)
> @@ -92,7 +92,7 @@
>    </path>
>
>    <!-- default arguments to pass to JVM executing tests -->
> -  <property name="args" value=""/>
> +  <property name="args" value="-XX:+UnlockDiagnosticVMOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=65 -XX:ParallelGCThreads=12 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/home/gada/tmp/lucene-test-gc.log -XX:LogFile=/home/gada/tmp/lucene-test-vmop.log -XX:+LogVMOutput -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"/>
>
>    <property name="tests.seed" value="" />
>
> I then ran the following script:
>
> #!/bin/bash
> count=0
> while ant test ; do
>         count=$((count + 1))
>         printf '\n\n\nrun %d completed without errors\n\n\n' "$count"
>         if [ "$count" -ge 100 ]; then
>                 break
>         fi
>         sleep 1
> done
>
> All tests ran successfully 100 times in a row on a dual 6-core CPU Intel Xeon Lenovo C30 ThinkStation with 64GB RAM running the Ubuntu 14.04 LTS Linux distribution.  I also successfully ran the test suite a few times on Java(TM) SE Runtime Environment (build 1.7.0_55-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) since I had it available.
>
> TL;DR:
>
> I think perhaps the sentence "Do not, under any circumstances, run Lucene with the G1 garbage collector." is a bit too strong.  Maybe a more balanced statement is in order?  For example, "we've found that the OpenJDK/Oracle 32-bit JVM (if only on Windows, say only on Windows) has a bug that, when used in combination with the G1 garbage collector, causes incorrect code to be produced, possibly resulting in index corruption", or something along those lines.  It seems a shame to possibly scare new Lucene users away from using G1GC with the 64-bit JVM, given that it has better performance on large heaps, which are becoming more common today.
>
> FWIW,
> Jim
> ________________________________________
> From: McKinley, James T [james.mckinley@cengage.com]
> Sent: Monday, February 09, 2015 11:00 AM
> To: java-user@lucene.apache.org
> Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>
> OK thanks Erick, I have put a story in our jira backlog to investigate the G1GC issues with the Lucene test suite.  I don't know if we'll be able to shed any light on the issue, but since we're using Lucene with Java 7 G1GC, I guess we better investigate it.
>
> Jim
> ________________________________________
> From: Erick Erickson [erickerickson@gmail.com]
> Sent: Saturday, February 07, 2015 2:22 PM
> To: java-user
> Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>
> The G1GC issue referenced by Robert Muir on the wiki page is at the
> Lucene level. Lucene, of course, is critically important to Solr, so
> from that perspective it is about Solr too.
>
> https://wiki.apache.org/lucene-java/JavaBugs
>
> And, I assume, it also applies to your custom app.
>
> FWIW,
> Erick


Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by Robert Muir <rc...@gmail.com>.
No, because you only looked into one bug. We have seen, and still do see,
many G1-related test failures, including the latest 1.8.0 update 40 early
access editions. These include things like corruption.

I added this message with *every intention* to scare away users,
because I don't want them having index corruption.

I am sick of people asking "but isn't it fine on the latest version?"
and so on. It is not.

On Wed, Feb 11, 2015 at 11:41 AM, McKinley, James T
<ja...@cengage.com> wrote:
> Hi,
>
> A couple mailing list members have brought the following paragraph from the https://wiki.apache.org/lucene-java/JavaBugs page to my attention:
>
> "Do not, under any circumstances, run Lucene with the G1 garbage collector. Lucene's test suite fails with the G1 garbage collector on a regular basis, including bugs that cause index corruption. There is no person on this planet that seems to understand such bugs (see https://bugs.openjdk.java.net/browse/JDK-8038348, open for over a year), so don't count on the situation changing soon. This information is not out of date, and don't think that the next oracle java release will fix the situation."
>
> Since we run Lucene 4.8.1 on Java(TM) SE Runtime Environment (build 1.7.0_04-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) using G1GC in production, I felt I should look into the issue and see if it is reproducible in our environment.  First I read the bug linked in the above paragraph as well as https://issues.apache.org/jira/browse/LUCENE-5168. Quite a bit of work has already been done by Dawid Weiss and Vladimir Kozlov to track down this bug, but it seems to be limited to the 32-bit JVM (maybe even only on Windows). To quote Dawid Weiss from the Jira issue:
>
> "My quest continues
>
> I thought it'd be interesting to see how far back I can trace this
> issue. I fetched the official binaries for jdk17 (windows, 32-bit) and
> did a binary search with the failing Lucene test command. The results
> show that, in short:
>
> ...
> jdk1.7.0_03: PASSES
> jdk1.7.0_04: FAILS
> ...
>
> and are consistent before and after. jdk1.7.0_04, 64-bit does *NOT*
> exhibit the issue (and neither does any version afterwards, it only
> happens on 32-bit; perhaps it's because of smaller number of available
> registers and the need to spill?).
>
> jdk1.7.0_04 was when G1GC was "officially" made supported but I don't
> think this plays a big difference. I'll see if I can bsearch on
> mercurial revisions to see which particular revision introduced the
> problem. Anyway, the problem has to be a long-standing issue and not a
> regression. Which makes it even more interesting I guess.
>
> Dawid"
>
> In addition, the second-to-last comment in the LUCENE-5168 issue is "I don't think this is closely related to G1GC. It looks more that G1GC happily triggers this bug in this special case."
>
> Just to make sure the bug wasn't reproducible in our specific environment, I checked out the tag for Lucene 4.8.1 (http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_8_1) and made the following change to common-build.xml:
>
> gada@C006129:~/workspace-java/lucene_solr_4_8_1/lucene$ svn diff common-build.xml
> Index: common-build.xml
> ===================================================================
> --- common-build.xml    (revision 1658458)
> +++ common-build.xml    (working copy)
> @@ -92,7 +92,7 @@
>    </path>
>
>    <!-- default arguments to pass to JVM executing tests -->
> -  <property name="args" value=""/>
> +  <property name="args" value="-XX:+UnlockDiagnosticVMOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=65 -XX:ParallelGCThreads=12 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/home/gada/tmp/lucene-test-gc.log -XX:LogFile=/home/gada/tmp/lucene-test-vmop.log -XX:+LogVMOutput -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"/>
>
>    <property name="tests.seed" value="" />
>
> I then ran the following script:
>
> #!/bin/bash
> count=0
> while ant test ; do
>         count=$[$count +1]
>         printf "\n\n\nrun $count completed without errors\n\n\n"
>         if [ "$count" -ge 100 ]; then
>                 break
>         fi
>         sleep 1
> done
>
> All tests ran successfully 100 times in a row on a dual 6-core CPU Intel Xeon Lenovo C30 ThinkStation with 64GB RAM running the Ubuntu 14.04 LTS Linux distribution.  I also successfully ran the test suite a few times on Java(TM) SE Runtime Environment (build 1.7.0_55-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) since I had it available.
>
> TL;DR:
>
> I think perhaps the sentence "Do not, under any circumstances, run Lucene with the G1 garbage collector." is a bit too strong.  Maybe a more balanced statement is in order?  For example, "we've found that the OpenJDK/Oracle 32-bit JVM (if only on Windows, say only on Windows) has a bug that, when used in combination with the G1 garbage collector, causes incorrect code to be produced, possibly resulting in index corruption", or something along those lines.  It seems a shame to possibly scare new Lucene users away from using G1GC with the 64-bit JVM, given that it performs better on large heaps, which are becoming more common today.
>
> FWIW,
> Jim
> ________________________________________
> From: McKinley, James T [james.mckinley@cengage.com]
> Sent: Monday, February 09, 2015 11:00 AM
> To: java-user@lucene.apache.org
> Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>
> OK thanks Erick, I have put a story in our jira backlog to investigate the G1GC issues with the Lucene test suite.  I don't know if we'll be able to shed any light on the issue, but since we're using Lucene with Java 7 G1GC, I guess we better investigate it.
>
> Jim
> ________________________________________
> From: Erick Erickson [erickerickson@gmail.com]
> Sent: Saturday, February 07, 2015 2:22 PM
> To: java-user
> Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>
> The G1GC issue referenced by Robert Muir on the wiki page is at the
> Lucene level. Lucene, of course, is critically important to Solr, so
> from that perspective it is about Solr too.
>
> https://wiki.apache.org/lucene-java/JavaBugs
>
> And, I assume, it also applies to your custom app.
>
> FWIW,
> Erick
>
> On Fri, Feb 6, 2015 at 12:10 PM, McKinley, James T
> <ja...@cengage.com> wrote:
>> Just to be clear in case there was any confusion about my previous message regarding G1GC, we do not use Solr, my team works on a proprietary Lucene-based search engine.  Consequently, I can't really give any advice regarding Solr with G1GC, but for our uses (so far anyway), G1GC seems to work well with Lucene.
>>
>> Jim
>> ________________________________________
>> From: Piotr Idzikowski [piotridzikowski@gmail.com]
>> Sent: Friday, February 06, 2015 5:35 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>>
>> Hello.
>> A somewhat belated question, but I recently found these articles:
>> https://wiki.apache.org/solr/SolrPerformanceProblems
>> https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>>
>> Especially this part from first url:
>> *Using the ConcurrentMarkSweep (CMS) collector with tuning parameters is a
>> very good option for Solr, but with the latest Java 7 releases (7u72 at
>> the time of this writing), G1 is looking like a better option, if the
>> -XX:+ParallelRefProcEnabled option is used.*
>>
>> How does it play with *"Do not, under any circumstances, run Lucene with
>> the G1 garbage collector."*
>> from https://wiki.apache.org/lucene-java/JavaBugs?
>>
>> Regards
>> Piotr
>>
>> On Tue, Jan 27, 2015 at 9:55 PM, McKinley, James T <
>> james.mckinley@cengage.com> wrote:
>>
>>> Hi Uwe,
>>>
>>> OK, thanks for the info.  We'll see if we can download the Lucene test
>>> suite and check it out.
>>>
>>> FWIW, we use G1GC in our production runtime (~70 12-16 core Cisco UCS and
>>> HP Gen7/Gen8 nodes with 20+ GB heaps using Java 7 and Lucene 4.8.1 with
>>> pairs of 30 index partitions with 15M-23M docs each) and have not
>>> experienced any VM crashes (well, maybe a couple, but not directly
>>> traceable to G1 to my knowledge).  We have found some undocumented pauses
>>> in G1 due to very large object arrays and filed a bug report which was
>>> confirmed and also affects CMS (we worked around this in our code using
>>> memory mapping of some files whose contents we previously held all in
>>> RAM).  I think the only index corruption we've ever seen was in our index
>>> creation workflow (~30 HP Gen7 nodes with 27GB heaps) but this was using
>>> Parallel GC since it is a batch system, so that corruption (which we've not
>>> seen recently and never found a cause for) was definitely not due to G1GC.
>>>
>>> G1GC has bugs as does CMS but we've found it to work pretty well so far in
>>> our runtime system.  Of course YMMV, thanks again for the info.
>>>
>>> Jim
>>> ________________________________________
>>> From: Uwe Schindler [uwe@thetaphi.de]
>>> Sent: Tuesday, January 27, 2015 3:02 PM
>>> To: java-user@lucene.apache.org
>>> Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>>>
>>> Hi,
>>>
>>> About G1GC: we consistently see problems when running the Lucene test suite
>>> with G1GC enabled. The people from Elasticsearch concluded:
>>>
>>> "There is a newer GC called the Garbage First GC (G1GC). This newer GC is
>>> designed to minimize pausing even more than CMS, and operate on large
>>> heaps. It works by dividing the heap into regions and predicting which
>>> regions contain the most reclaimable space. By collecting those regions
>>> first (garbage first), it can minimize pauses and operate on very large
>>> heaps.
>>>
>>> Sounds great! Unfortunately, G1GC is still new, and fresh bugs are found
>>> routinely. These bugs are usually of the segfault variety, and will cause
>>> hard crashes. The Lucene test suite is brutal on GC algorithms, and it
>>> seems that G1GC hasn’t had the kinks worked out yet.
>>>
>>> We would like to recommend G1GC someday, but for now, it is simply not
>>> stable enough to meet the demands of Elasticsearch and Lucene."
>>> (
>>> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_don_8217_t_touch_these_settings.html
>>> )
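The "garbage first" idea described in the quote above (collect the regions holding the most reclaimable space first, while keeping pauses short) can be illustrated with a toy greedy selection under a pause budget, the kind of goal the -XX:MaxGCPauseMillis flag in the diff above sets for the real collector. This is a hypothetical sketch only, not HotSpot's G1 implementation, which uses its own cost model; the `pick_regions` name and the "id, reclaimable MB, estimated cost in ms" input format are made up for illustration.

```bash
#!/usr/bin/env bash
# pick_regions BUDGET_MS
# Toy "garbage first" selection (NOT HotSpot's algorithm). Reads lines of
# "<region-id> <reclaimable-MB> <estimated-cost-ms>" on stdin, visits regions
# in order of most reclaimable space, and takes each region whose estimated
# cost still fits in the remaining pause budget. Prints chosen ids on one line.
pick_regions() {
  sort -k2,2nr | awk -v budget="$1" '
    spent + $3 <= budget { spent += $3; out = (out == "" ? $1 : out " " $1) }
    END { print out }'
}

printf 'r1 10 5\nr2 50 20\nr3 30 10\nr4 5 1\n' | pick_regions 30   # prints: r2 r3
```

Regions r1 and r4 are skipped in the example because adding either would push the total past the 30 ms budget, even though both still contain garbage.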
>>>
>>> In fact, the problems with G1GC can sometimes lead to index corruption,
>>> and they are hard to reproduce. So better not to use it...
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>>
>>> > -----Original Message-----
>>> > From: McKinley, James T [mailto:james.mckinley@cengage.com]
>>> > Sent: Tuesday, January 27, 2015 8:58 PM
>>> > To: java-user@lucene.apache.org
>>> > Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>>> >
>>> > Why do you say not to use G1GC?  We are using Java 7 & G1GC with Lucene
>>> > 4.8.1 in production.  Thanks.
>>> >
>>> > Jim
>>> > ________________________________________
>>> > From: Uwe Schindler [uwe@thetaphi.de]
>>> > Sent: Tuesday, January 27, 2015 2:49 PM
>>> > To: java-user@lucene.apache.org; 'kiwi clive'
>>> > Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>>> >
>>> > Java 8 update 20 or later is also fine. At the current time, always use the
>>> > latest update release and you will be fine with Java 7 and Java 8. Don't use
>>> > older releases and don't use the G1 garbage collector.
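The "update 20 or later" rule for Java 8 can be checked mechanically. A minimal sketch, assuming the pre-Java 9 `java -version` banner format (`java version "1.8.0_NN"` or `openjdk version "1.8.0_NN"`); the function name is hypothetical, and it simply fails for anything that is not a 1.8.0 build.

```bash
#!/usr/bin/env bash
# java8_update_at_least MIN: read `java -version` output on stdin and succeed
# only if it reports a "1.8.0_NN" build with NN >= MIN.
java8_update_at_least() {
  local min=$1 ver
  # Take the first quoted version string such as "1.8.0_40 (quote included).
  ver=$(grep -o '"1\.8\.0_[0-9][0-9]*' | head -n 1)
  # Empty means no 1.8.0 build was found; otherwise compare the update number.
  [ -n "$ver" ] && [ "${ver##*_}" -ge "$min" ]
}

# Example (java must be on PATH; the banner is printed on stderr):
#   java -version 2>&1 | java8_update_at_least 20 && echo "update 20 or later"
```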
>>> >
>>> > -----
>>> > Uwe Schindler
>>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>>> > http://www.thetaphi.de
>>> > eMail: uwe@thetaphi.de
>>> >
>>> >
>>> > > -----Original Message-----
>>> > > From: kiwi clive [mailto:kiwi_clive@yahoo.com.INVALID]
>>> > > Sent: Tuesday, January 27, 2015 8:03 PM
>>> > > To: java-user@lucene.apache.org
>>> > > Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>>> > >
>>> > > Hi Hoss,
>>> > > Many thanks for the information. This looks very encouraging as the
>>> > > Java 7 bug I remember was fixed and as far as I know, we should not be
>>> > > affected by the others.
>>> > > I'll put a few tests together and put my toe in the water :-)
>>> > > Clive
>>> > >
>>> > > From: Chris Hostetter <ho...@fucit.org>
>>> > > To: "java-user@lucene.apache.org" <ja...@lucene.apache.org>; kiwi clive <ki...@yahoo.com>
>>> > > Sent: Tuesday, January 27, 2015 4:01 PM
>>> > > Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>>> > >
>>> > > : I seem to remember reading that certain versions of Lucene were
>>> > > : incompatible with some Java versions, although I cannot find anything to
>>> > > : verify this. As we have tens of thousands of large indexes, backwards
>>> > > : compatibility without the need to reindex on an upgrade is of prime
>>> > > : importance to us.
>>> > >
>>> > > All known JVM bugs affecting Lucene are listed here...
>>> > >
>>> > > https://wiki.apache.org/lucene-java/JavaBugs
>>> > >
>>> > >
>>> > > -Hoss
>>> > > http://www.lucidworks.com/


RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by "McKinley, James T" <ja...@cengage.com>.
Hi,

A couple mailing list members have brought the following paragraph from the https://wiki.apache.org/lucene-java/JavaBugs page to my attention:

"Do not, under any circumstances, run Lucene with the G1 garbage collector. Lucene's test suite fails with the G1 garbage collector on a regular basis, including bugs that cause index corruption. There is no person on this planet that seems to understand such bugs (see https://bugs.openjdk.java.net/browse/JDK-8038348, open for over a year), so don't count on the situation changing soon. This information is not out of date, and don't think that the next oracle java release will fix the situation."

Since we run Lucene 4.8.1 on Java(TM) SE Runtime Environment (build 1.7.0_04-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) using G1GC in production, I felt I should look into the issue and see if it is reproducible in our environment.  First I read the bug linked in the above paragraph as well as https://issues.apache.org/jira/browse/LUCENE-5168. Quite a bit of work has already been done by Dawid Weiss and Vladimir Kozlov to track down this bug, but it seems to be limited to the 32-bit JVM (maybe even only on Windows). To quote Dawid Weiss from the Jira issue:

"My quest continues 

I thought it'd be interesting to see how far back I can trace this
issue. I fetched the official binaries for jdk17 (windows, 32-bit) and
did a binary search with the failing Lucene test command. The results
show that, in short:

...
jdk1.7.0_03: PASSES
jdk1.7.0_04: FAILS
...

and are consistent before and after. jdk1.7.0_04, 64-bit does *NOT*
exhibit the issue (and neither does any version afterwards, it only
happens on 32-bit; perhaps it's because of smaller number of available
registers and the need to spill?).

jdk1.7.0_04 was when G1GC was "officially" made supported but I don't
think this plays a big difference. I'll see if I can bsearch on
mercurial revisions to see which particular revision introduced the
problem. Anyway, the problem has to be a long-standing issue and not a
regression. Which makes it even more interesting I guess.

Dawid"
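The binary search over JDK releases described in the quote can be sketched as a small bash function. This is a minimal sketch, not the procedure Dawid actually used: it assumes the builds are listed oldest to newest, that the failure is monotonic (every build before the first failing one passes and every build from it onward fails, matching "consistent before and after"), and that you supply a predicate command that succeeds when the test passes on a given build. The `fake_passes` predicate and the build list are illustrative stand-ins; a real predicate would point the test run at the given JDK.

```bash
#!/usr/bin/env bash
# first_failing PRED BUILD...
# Binary search for the first build on which PRED fails, assuming the
# builds are ordered and the pass/fail boundary is monotonic.
first_failing() {
  local pred=$1; shift
  local builds=("$@")
  local lo=0 hi=${#builds[@]}   # invariant: builds[0..lo) pass, builds[hi..) fail
  while (( lo < hi )); do
    local mid=$(( (lo + hi) / 2 ))
    if "$pred" "${builds[mid]}"; then
      lo=$(( mid + 1 ))
    else
      hi=$mid
    fi
  done
  # Print nothing if every build passed.
  if (( lo < ${#builds[@]} )); then
    echo "${builds[lo]}"
  fi
}

# Hypothetical predicate reproducing the reported boundary (_03 passes, _04 fails).
fake_passes() {
  case "$1" in
    1.7.0_01|1.7.0_02|1.7.0_03) return 0 ;;
    *) return 1 ;;
  esac
}

first_failing fake_passes 1.7.0_01 1.7.0_02 1.7.0_03 1.7.0_04 1.7.0_05   # prints 1.7.0_04
```

With n builds this needs about log2(n) test runs instead of n, which matters when each run is a full, slow test command.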

In addition, the second-to-last comment in the LUCENE-5168 issue is "I don't think this is closely related to G1GC. It looks more that G1GC happily triggers this bug in this special case."

Just to make sure the bug wasn't reproducible in our specific environment, I checked out the tag for Lucene 4.8.1 (http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_8_1) and made the following change to common-build.xml:

gada@C006129:~/workspace-java/lucene_solr_4_8_1/lucene$ svn diff common-build.xml 
Index: common-build.xml
===================================================================
--- common-build.xml	(revision 1658458)
+++ common-build.xml	(working copy)
@@ -92,7 +92,7 @@
   </path>
 
   <!-- default arguments to pass to JVM executing tests -->
-  <property name="args" value=""/>
+  <property name="args" value="-XX:+UnlockDiagnosticVMOptions -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:InitiatingHeapOccupancyPercent=65 -XX:ParallelGCThreads=12 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/home/gada/tmp/lucene-test-gc.log -XX:LogFile=/home/gada/tmp/lucene-test-vmop.log -XX:+LogVMOutput -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"/>
 
   <property name="tests.seed" value="" />
 
I then ran the following script:

#!/bin/bash
count=0
while ant test ; do
	count=$[$count +1]
	printf "\n\n\nrun $count completed without errors\n\n\n"
	if [ "$count" -ge 100 ]; then
		break
	fi
	sleep 1
done
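The loop above (run until a failure or 100 clean passes) can also be written as a reusable function that reports how many runs passed. A sketch in the same bash style; `run_repeatedly` is a hypothetical name, and unlike the original loop it does not sleep between runs.

```bash
#!/usr/bin/env bash
# run_repeatedly N CMD [ARGS...]
# Runs CMD up to N times and stops at the first failing run. Prints the
# number of runs that passed; returns 1 if a run failed, 0 if all N passed.
run_repeatedly() {
  local n=$1 i
  shift
  for (( i = 1; i <= n; i++ )); do
    if ! "$@"; then
      echo "$(( i - 1 ))"
      return 1
    fi
  done
  echo "$n"
}
```

Because Ant gives properties set with `-D` on the command line precedence over `<property>` declarations in the build file, something like `run_repeatedly 100 ant test -Dargs="-XX:+UseG1GC"` should have the same effect as the common-build.xml edit, without modifying the checkout. This assumes the `args` property is what reaches the test JVM, as the build file's comment says; it has not been verified against the 4.8.1 build.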

All tests ran successfully 100 times in a row on a dual 6-core CPU Intel Xeon Lenovo C30 ThinkStation with 64GB RAM running the Ubuntu 14.04 LTS Linux distribution.  I also successfully ran the test suite a few times on Java(TM) SE Runtime Environment (build 1.7.0_55-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode) since I had it available.
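The distinction drawn here (64-bit vs. 32-bit JVM, G1 on or off) can be checked from the JVM's own output. A minimal sketch, assuming the HotSpot `-version` banner contains "64-Bit" on 64-bit VMs (as in the build strings quoted above) and that `-XX:+PrintFlagsFinal` prints one `type name = value` row per flag, with `:=` marking a value changed from its default in the builds I know of. Both helpers read text on stdin so they can also be fed saved output.

```bash
#!/usr/bin/env bash
# is_64bit_jvm: succeed if the `java -version` banner on stdin mentions a 64-Bit VM.
is_64bit_jvm() {
  grep -q '64-Bit'
}

# jvm_flag NAME: print the final value of flag NAME from
# `java -XX:+PrintFlagsFinal -version` output on stdin.
# Rows look like "bool UseG1GC := true {product}"; field 4 is the value.
jvm_flag() {
  awk -v name="$1" '$2 == name { print $4; exit }'
}

# Examples against a live JVM (java must be on PATH):
#   java -version 2>&1 | is_64bit_jvm && echo "64-bit VM"
#   java -XX:+UseG1GC -XX:+PrintFlagsFinal -version 2>/dev/null | jvm_flag UseG1GC
```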

TL;DR:

I think perhaps the sentence "Do not, under any circumstances, run Lucene with the G1 garbage collector." is a bit too strong.  Maybe a more balanced statement is in order?  For example, "we've found that the OpenJDK/Oracle 32-bit JVM (if only on Windows, say only on Windows) has a bug that, when used in combination with the G1 garbage collector, causes incorrect code to be produced, possibly resulting in index corruption", or something along those lines.  It seems a shame to possibly scare new Lucene users away from using G1GC with the 64-bit JVM, given that it performs better on large heaps, which are becoming more common today.

FWIW,
Jim



RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by "McKinley, James T" <ja...@cengage.com>.
OK thanks Erick, I have put a story in our jira backlog to investigate the G1GC issues with the Lucene test suite.  I don't know if we'll be able to shed any light on the issue, but since we're using Lucene with Java 7 G1GC, I guess we better investigate it.

Jim
________________________________________
From: Erick Erickson [erickerickson@gmail.com]
Sent: Saturday, February 07, 2015 2:22 PM
To: java-user
Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

The G1GC issue referenced by Robert Muir on the wiki page is at the
Lucene level. Lucene, of course, is critically important to Solr, so
from that perspective it is about Solr too.

https://wiki.apache.org/lucene-java/JavaBugs

And, I assume, it also applies to your custom app.

FWIW,
Erick

On Fri, Feb 6, 2015 at 12:10 PM, McKinley, James T
<ja...@cengage.com> wrote:
> Just to be clear in case there was any confusion about my previous message regarding G1GC, we do not use Solr, my team works on a proprietary Lucene-based search engine.  Consequently, I can't really give any advice regarding Solr with G1GC, but for our uses (so far anyway), G1GC seems to work well with Lucene.
>
> Jim
> ________________________________________
> From: Piotr Idzikowski [piotridzikowski@gmail.com]
> Sent: Friday, February 06, 2015 5:35 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>
> Hello.
> A somewhat belated question, but I recently found these articles:
> https://wiki.apache.org/solr/SolrPerformanceProblems
> https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
> Especially this part from first url:
> *Using the ConcurrentMarkSweep (CMS) collector with tuning parameters is a
> very good option for Solr, but with the latest Java 7 releases (7u72 at
> the time of this writing), G1 is looking like a better option, if the
> -XX:+ParallelRefProcEnabled option is used.*
>
> How does it play with *"Do not, under any circumstances, run Lucene with
> the G1 garbage collector."*
> from https://wiki.apache.org/lucene-java/JavaBugs?
>
> Regards
> Piotr
>
> On Tue, Jan 27, 2015 at 9:55 PM, McKinley, James T <
> james.mckinley@cengage.com> wrote:
>
>> Hi Uwe,
>>
>> OK, thanks for the info.  We'll see if we can download the Lucene test
>> suite and check it out.
>>
>> FWIW, we use G1GC in our production runtime (~70 12-16 core Cisco UCS and
>> HP Gen7/Gen8 nodes with 20+ GB heaps using Java 7 and Lucene 4.8.1 with
>> pairs of 30 index partitions with 15M-23M docs each) and have not
>> experienced any VM crashes (well, maybe a couple, but not directly
>> traceable to G1 to my knowledge).  We have found some undocumented pauses
>> in G1 due to very large object arrays and filed a bug report which was
>> confirmed and also affects CMS (we worked around this in our code using
>> memory mapping of some files whose contents we previously held all in
>> RAM).  I think the only index corruption we've ever seen was in our index
>> creation workflow (~30 HP Gen7 nodes with 27GB heaps) but this was using
>> Parallel GC since it is a batch system, so that corruption (which we've not
>> seen recently and never found a cause for) was definitely not due to G1GC.
>>
>> G1GC has bugs, as does CMS, but we've found it to work pretty well so far in
>> our runtime system. Of course, YMMV. Thanks again for the info.
>>
>> Jim
>> ________________________________________
>> From: Uwe Schindler [uwe@thetaphi.de]
>> Sent: Tuesday, January 27, 2015 3:02 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>>
>> Hi,
>>
>> About G1GC: we consistently see problems when running the Lucene test suite
>> with G1GC enabled. The people from Elasticsearch concluded:
>>
>> "There is a newer GC called the Garbage First GC (G1GC). This newer GC is
>> designed to minimize pausing even more than CMS, and operate on large
>> heaps. It works by dividing the heap into regions and predicting which
>> regions contain the most reclaimable space. By collecting those regions
>> first (garbage first), it can minimize pauses and operate on very large
>> heaps.
>>
>> Sounds great! Unfortunately, G1GC is still new, and fresh bugs are found
>> routinely. These bugs are usually of the segfault variety, and will cause
>> hard crashes. The Lucene test suite is brutal on GC algorithms, and it
>> seems that G1GC hasn’t had the kinks worked out yet.
>>
>> We would like to recommend G1GC someday, but for now, it is simply not
>> stable enough to meet the demands of Elasticsearch and Lucene."
>> (http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_don_8217_t_touch_these_settings.html)
>>
>> In fact, the problems with G1GC can sometimes lead to index corruption
>> and are hard to reproduce. So better not to use it.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>> > -----Original Message-----
>> > From: McKinley, James T [mailto:james.mckinley@cengage.com]
>> > Sent: Tuesday, January 27, 2015 8:58 PM
>> > To: java-user@lucene.apache.org
>> > Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>> >
>> > Why do you say not to use G1GC?  We are using Java 7 & G1GC with Lucene
>> > 4.8.1 in production.  Thanks.
>> >
>> > Jim
>> > ________________________________________
>> > From: Uwe Schindler [uwe@thetaphi.de]
>> > Sent: Tuesday, January 27, 2015 2:49 PM
>> > To: java-user@lucene.apache.org; 'kiwi clive'
>> > Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>> >
>> > Java 8 update 20 or later is also fine. At the current time, always use the latest update
>> > release and you will be fine with Java 7 and Java 8. Don't use older releases
>> > and don't use the G1 Garbage Collector.
>> >
>> > -----
>> > Uwe Schindler
>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > http://www.thetaphi.de
>> > eMail: uwe@thetaphi.de
>> >
>> >
>> > > -----Original Message-----
>> > > From: kiwi clive [mailto:kiwi_clive@yahoo.com.INVALID]
>> > > Sent: Tuesday, January 27, 2015 8:03 PM
>> > > To: java-user@lucene.apache.org
>> > > Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>> > >
>> > > Hi Hoss,
>> > > Many thanks for the information. This looks very encouraging, as the
>> > > Java 7 bug I remember was fixed and, as far as I know, we should not be
>> > > affected by the others.
>> > > I'll put a few tests together and put my toe in the water :-)
>> > > Clive
>> > >
>> > > From: Chris Hostetter <ho...@fucit.org>
>> > > To: "java-user@lucene.apache.org" <ja...@lucene.apache.org>; kiwi clive <ki...@yahoo.com>
>> > > Sent: Tuesday, January 27, 2015 4:01 PM
>> > > Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>> > >
>> > > : I seem to remember reading that certain versions of Lucene were
>> > > : incompatible with some Java versions, although I cannot find anything to
>> > > : verify this. As we have tens of thousands of large indexes, backwards
>> > > : compatibility without the need to reindex on an upgrade is of prime
>> > > : importance to us.
>> > >
>> > > All known JVM bugs affecting Lucene are listed here...
>> > >
>> > > https://wiki.apache.org/lucene-java/JavaBugs
>> > >
>> > >
>> > > -Hoss
>> > > http://www.lucidworks.com/
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by Erick Erickson <er...@gmail.com>.
The G1GC issue referenced by Robert Muir on the wiki page is at the
Lucene level. Lucene, of course, is critically important to Solr, so
from that perspective it is about Solr too.

https://wiki.apache.org/lucene-java/JavaBugs

And, I assume, it also applies to your custom app.

FWIW,
Erick
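Since the G1 warning applies to custom Lucene applications as well, such an application could check at startup which collector it is running under and log a warning if it is G1. The following is a minimal sketch, assuming a HotSpot-based JVM, which names G1's collector MXBeans with a "G1 " prefix (for example "G1 Young Generation"); the class `GcCheck` and its methods are hypothetical, not part of Lucene.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical startup guard: reports whether the running JVM uses G1. */
public class GcCheck {

    /** Returns the names of the garbage collector MXBeans registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Assumes HotSpot naming: G1 beans are called "G1 Young Generation", "G1 Old Generation", etc. */
    static boolean isG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1 ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Active collectors: " + names);
        if (isG1(names)) {
            // Point operators at the list of known JVM bugs affecting Lucene.
            System.err.println("WARNING: G1 is in use; see https://wiki.apache.org/lucene-java/JavaBugs");
        }
    }
}
```

Started with `java -XX:+UseG1GC GcCheck` it should print the warning; with `-XX:+UseParallelGC` it should not. Checking the MXBean names rather than parsing the command line also catches collectors chosen by JVM defaults.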



RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by "McKinley, James T" <ja...@cengage.com>.
Just to be clear, in case there was any confusion about my previous message regarding G1GC: we do not use Solr; my team works on a proprietary Lucene-based search engine. Consequently, I can't really give any advice regarding Solr with G1GC, but for our uses (so far anyway), G1GC seems to work well with Lucene.

Jim


Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by Piotr Idzikowski <pi...@gmail.com>.
Hello.
A somewhat belated question, but I recently found these articles:
https://wiki.apache.org/solr/SolrPerformanceProblems
https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Especially this part from first url:
*Using the ConcurrentMarkSweep (CMS) collector with tuning parameters is a
very good option for for Solr, but with the latest Java 7 releases (7u72 at
the time of this writing), G1 is looking like a better option, if the
-XX:+ParallelRefProcEnabled option is used.*

How does that square with *"Do not, under any circumstances, run Lucene with
the G1 garbage collector."*
from https://wiki.apache.org/lucene-java/JavaBugs?

Regards
Piotr
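To see which of the flags mentioned in the quoted Solr wiki passage are actually in effect in a running JVM, their values can be read through `com.sun.management.HotSpotDiagnosticMXBean`. This is a HotSpot-specific sketch; the class name `GcFlags` is hypothetical, and flags the JVM does not recognize (for example `UseConcMarkSweepGC` on JDKs where CMS has been removed) are reported as `null`.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical diagnostic: prints the GC-related HotSpot flags discussed in this thread. */
public class GcFlags {

    /** Returns the flag's current value ("true"/"false" for booleans), or null if unavailable. */
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return null; // not a HotSpot-based JVM
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return null; // this JVM does not know the flag
        }
    }

    public static void main(String[] args) {
        // Flags from the Solr wiki passage quoted above, plus the CMS switch for comparison.
        String[] flags = {"UseG1GC", "ParallelRefProcEnabled", "UseConcMarkSweepGC"};
        for (String flag : flags) {
            System.out.println(flag + " = " + flagValue(flag));
        }
    }
}
```

Started as `java -XX:+UseG1GC -XX:+ParallelRefProcEnabled GcFlags`, it should report both `UseG1GC` and `ParallelRefProcEnabled` as `true`.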


RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by "McKinley, James T" <ja...@cengage.com>.
Hi Uwe,

OK, thanks for the info.  We'll see if we can download the Lucene test suite and check it out.  

FWIW, we use G1GC in our production runtime (~70 12-16 core Cisco UCS and HP Gen7/Gen8 nodes with 20+ GB heaps using Java 7 and Lucene 4.8.1 with pairs of 30 index partitions with 15M-23M docs each) and have not experienced any VM crashes (well, maybe a couple, but not directly traceable to G1 to my knowledge).  We have found some undocumented pauses in G1 due to very large object arrays and filed a bug report which was confirmed and also affects CMS (we worked around this in our code using memory mapping of some files whose contents we previously held all in RAM).  I think the only index corruption we've ever seen was in our index creation workflow (~30 HP Gen7 nodes with 27GB heaps) but this was using Parallel GC since it is a batch system, so that corruption (which we've not seen recently and never found a cause for) was definitely not due to G1GC.

G1GC has bugs, as does CMS, but we've found it to work pretty well so far in our runtime system. Of course, YMMV. Thanks again for the info.

Jim
________________________________________
From: Uwe Schindler [uwe@thetaphi.de]
Sent: Tuesday, January 27, 2015 3:02 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Hi,

About G1GC: we consistently see problems when running the Lucene test suite with G1GC enabled. The people from Elasticsearch concluded:

"There is a newer GC called the Garbage First GC (G1GC). This newer GC is designed to minimize pausing even more than CMS, and operate on large heaps. It works by dividing the heap into regions and predicting which regions contain the most reclaimable space. By collecting those regions first (garbage first), it can minimize pauses and operate on very large heaps.

Sounds great! Unfortunately, G1GC is still new, and fresh bugs are found routinely. These bugs are usually of the segfault variety, and will cause hard crashes. The Lucene test suite is brutal on GC algorithms, and it seems that G1GC hasn’t had the kinks worked out yet.

We would like to recommend G1GC someday, but for now, it is simply not stable enough to meet the demands of Elasticsearch and Lucene."
(http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_don_8217_t_touch_these_settings.html)

In fact, the problems with G1GC can sometimes lead to index corruption and are hard to reproduce. So better not to use it.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: McKinley, James T [mailto:james.mckinley@cengage.com]
> Sent: Tuesday, January 27, 2015 8:58 PM
> To: java-user@lucene.apache.org
> Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>
> Why do you say not to use G1GC?  We are using Java 7 & G1GC with Lucene
> 4.8.1 in production.  Thanks.
>
> Jim
> ________________________________________
> From: Uwe Schindler [uwe@thetaphi.de]
> Sent: Tuesday, January 27, 2015 2:49 PM
> To: java-user@lucene.apache.org; 'kiwi clive'
> Subject: RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
>
> Java 8 update 20 or later is also fine. At current time, always use latest update
> release and you are be fine with Java 7 and Java 8. Don't use older releases
> and don't use G1 Garbage Collector.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: kiwi clive [mailto:kiwi_clive@yahoo.com.INVALID]
> > Sent: Tuesday, January 27, 2015 8:03 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)
> >
> > Hi Hoss,
> > Many thanks for the information. This looks very encouraging as the
> > Java7 bug I remember  was fixed and as far as I know, we should not be
> > affected by the others.
> > I'll put a few tests together and put my toe in the water :-) Clive
> >
> >       From: Chris Hostetter <ho...@fucit.org>
> >  To: "java-user@lucene.apache.org" <ja...@lucene.apache.org>; kiwi
> > clive <ki...@yahoo.com>
> >  Sent: Tuesday, January 27, 2015 4:01 PM
> >  Subject: Re: Lucene Version Upgrade (3->4) and Java JVM
> > Versions(6->8)
> >
> >
> >
> >
> > : I seem to remember reading that certain versions of lucene were
> > : incompatible with some java versions although I cannot find anything
> > to
> > : verify this. As we have tens of thousands of large indexes,
> > backwards
> > : compatibility without the need to reindex on an upgrade is of prime
> > : importance to us.
> >
> > All known JVM bugs affecting Lucene are listed here...
> >
> > https://wiki.apache.org/lucene-java/JavaBugs
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

About G1GC: we consistently see problems when running the Lucene test suite with G1GC enabled. The people from Elasticsearch concluded:

"There is a newer GC called the Garbage First GC (G1GC). This newer GC is designed to minimize pausing even more than CMS, and operate on large heaps. It works by dividing the heap into regions and predicting which regions contain the most reclaimable space. By collecting those regions first (garbage first), it can minimize pauses and operate on very large heaps.

Sounds great! Unfortunately, G1GC is still new, and fresh bugs are found routinely. These bugs are usually of the segfault variety, and will cause hard crashes. The Lucene test suite is brutal on GC algorithms, and it seems that G1GC hasn’t had the kinks worked out yet.

We would like to recommend G1GC someday, but for now, it is simply not stable enough to meet the demands of Elasticsearch and Lucene."
(http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_don_8217_t_touch_these_settings.html)

In fact, the problems with G1GC can sometimes lead to index corruption, and they are hard to reproduce. So better not to use it...

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
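Before following (or deliberately ignoring) this advice, it helps to confirm which collector a JVM is actually running. The sketch below lists the collector MXBeans via java.lang.management and flags G1; it assumes HotSpot's bean naming ("G1 Young Generation", "G1 Old Generation"), which other JVMs may not follow. On Java 7/8 HotSpot, CMS is selected with -XX:+UseConcMarkSweepGC and the parallel collector with -XX:+UseParallelGC, instead of -XX:+UseG1GC. The class name GcCheck is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: list the garbage collectors the running JVM reports and flag G1.
// Assumption: HotSpot names its G1 beans "G1 Young Generation", "G1 Old Generation"
// (and similar); other JVM vendors may use different names.
class GcCheck {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    static boolean usesG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1 ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Garbage collectors in use: " + names);
        if (usesG1(names)) {
            System.out.println("WARNING: this JVM is running G1GC");
        }
    }
}
```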




RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by "McKinley, James T" <ja...@cengage.com>.
Why do you say not to use G1GC?  We are using Java 7 & G1GC with Lucene 4.8.1 in production.  Thanks.

Jim


RE: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by Uwe Schindler <uw...@thetaphi.de>.
Java 8 update 20 or later is also fine. At the current time, always use the latest update release and you will be fine with Java 7 and Java 8. Don't use older releases, and don't use the G1 garbage collector.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
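A small sketch of checking the "Java 8 update 20 or later" rule at startup. It assumes the legacy java.version format of Java 6 to 8 (e.g. 1.8.0_20); Java 9 and later use a different scheme, which the helper simply treats as newer than 8u20 (those releases postdate this thread, so the advice above does not cover them). The separate "latest update" advice for Java 7 is not encoded. Class and method names are illustrative.

```java
// Sketch: check java.version against the "Java 8 update 20 or later" advice.
// Assumes the legacy "1.<major>.0_<update>" format of Java 6-8; Java 9+ version
// strings ("9", "11.0.2") are parsed with update = -1 and treated as newer.
class JavaVersionCheck {
    /** Returns {major, update}; update is -1 when the string carries none. */
    static int[] parse(String version) {
        // Drop pre-release/build suffixes such as "-ea" or "+36".
        String v = version.split("[-+]")[0];
        int update = -1;
        int underscore = v.indexOf('_');
        if (underscore >= 0) {
            update = Integer.parseInt(v.substring(underscore + 1));
            v = v.substring(0, underscore);
        }
        String[] parts = v.split("\\.");
        int major = Integer.parseInt(parts[0]);
        if (major == 1 && parts.length > 1) {
            major = Integer.parseInt(parts[1]); // legacy "1.8" means Java 8
        }
        return new int[] {major, update};
    }

    static boolean isJava8u20OrLater(String version) {
        int[] mu = parse(version);
        return mu[0] > 8 || (mu[0] == 8 && mu[1] >= 20);
    }

    public static void main(String[] args) {
        String v = System.getProperty("java.version");
        System.out.println(v + (isJava8u20OrLater(v) ? " meets" : " does not meet")
                + " the Java 8u20 baseline");
    }
}
```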




Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by kiwi clive <ki...@yahoo.com.INVALID>.
Hi Hoss,
Many thanks for the information. This looks very encouraging, as the Java 7 bug I remember was fixed and, as far as I know, we should not be affected by the others.
I'll put a few tests together and put my toe in the water :-)
Clive


Re: Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by Chris Hostetter <ho...@fucit.org>.
: I seem to remember reading that certain versions of lucene were 
: incompatible with some java versions although I cannot find anything to 
: verify this. As we have tens of thousands of large indexes, backwards 
: compatibility without the need to reindex on an upgrade is of prime 
: importance to us.

All known JVM bugs affecting Lucene are listed here...

https://wiki.apache.org/lucene-java/JavaBugs


-Hoss
http://www.lucidworks.com/



Lucene Version Upgrade (3->4) and Java JVM Versions(6->8)

Posted by kiwi clive <ki...@yahoo.com.INVALID>.
Hello guys,
We currently run with Lucene 3.6 and Java 6. In view of the fact that Java 7 is soon to be deprecated, we are keen to move to Java 8 and also to move to the latest version of Lucene. I understand Lucene 5 is coming, although we are happy to move to 4.x as there are lots of goodies there we can use.
I seem to remember reading that certain versions of Lucene were incompatible with some Java versions, although I cannot find anything to verify this. As we have tens of thousands of large indexes, backwards compatibility without the need to reindex on an upgrade is of prime importance to us.
Does anyone have any words of wisdom, or better still, pointers to some documentation that would be of use here? I can obviously run some tests, but incompatibilities can be insidious and it would be good to know from the outset if there are any gotchas before embarking along this road.
In an ideal world we would have Java 8 + Lucene 4.x reading a Lucene 3.6 index (that was created with Java 6).
Then we would write to the Lucene 3.6 index using Java 8 and Lucene 4.x.
Any suggestions would be most welcome!
Many thanks,
Clive
   

Re: Absolute term position in scoring

Posted by Michael McCandless <lu...@mikemccandless.com>.
A custom query could improve on the situation by not pulling multiple
docs/positions enums for a single term. E.g. the patch on
https://issues.apache.org/jira/browse/LUCENE-5288 (which never got
committed: too controversial) has such a query, letting you customize
how positions are scored for boolean term query matches. Maybe you
could start from it and see how performance compares vs. the
SpanFirstQuery approach...

Mike McCandless

http://blog.mikemccandless.com
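LUCENE-5288 itself is not reproduced here. As a rough, dependency-free illustration of the idea (reading each term's postings once and letting the position feed directly into the score, which is what Luis was hoping for), the toy model below scores each matching document by how early each query term first appears. The 1/(1 + position/scale) decay, the scale parameter and the class name are assumptions for this sketch, not anything from Lucene or the patch; term frequency and IDF are ignored for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (plain Java, not Lucene) of a position-aware OR query: each query
// term's postings are walked once, and a match contributes more the closer
// its first occurrence is to the start of the document.
class PositionDecayScorer {
    // term -> (docId -> ascending positions of that term in the doc)
    private final Map<String, Map<Integer, int[]>> postings = new HashMap<>();

    void add(int docId, String text) {
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        Map<String, List<Integer>> perDoc = new HashMap<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            perDoc.computeIfAbsent(tokens[pos], k -> new ArrayList<>()).add(pos);
        }
        for (Map.Entry<String, List<Integer>> e : perDoc.entrySet()) {
            int[] positions = e.getValue().stream().mapToInt(Integer::intValue).toArray();
            postings.computeIfAbsent(e.getKey(), k -> new HashMap<>()).put(docId, positions);
        }
    }

    /** 1.0 at position 0, 0.5 at position == scale, decaying toward 0 after that. */
    static double decay(int position, double scale) {
        return 1.0 / (1.0 + position / scale);
    }

    /** Returns docId -> summed position-decayed score over all query terms. */
    Map<Integer, Double> score(List<String> queryTerms, double scale) {
        Map<Integer, Double> scores = new HashMap<>();
        for (String term : queryTerms) {
            // One pass over this term's postings; positions come from the same entry.
            Map<Integer, int[]> docs = postings.getOrDefault(term, Collections.emptyMap());
            for (Map.Entry<Integer, int[]> e : docs.entrySet()) {
                int firstPosition = e.getValue()[0]; // positions were added in ascending order
                scores.merge(e.getKey(), decay(firstPosition, scale), Double::sum);
            }
        }
        return scores;
    }
}
```

For example, with scale = 5 a first occurrence at position 0 contributes 1.0 and one at position 5 contributes 0.5.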




RE: Absolute term position in scoring

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

It depends on the query structure. In fact, SpanFirstQuery is slow (all span queries are slow because they use positions; this may improve in the future).

Your question was about using multiple fields, i.e. querying for the same terms on multiple fields and/or with different query types. This is the standard approach to tuning relevance! But it always has a cost. In most cases you will not see a large difference (unless you use phrase or span queries). The Elasticsearch Guide has a very good explanation of what can be done with this: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-field-search.html

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
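One multi-field variant for the original position question, sketched here as an assumption rather than anything proposed in the thread: at index time, copy the first N tokens of each document into an extra "lead" field, then query the same terms against both the body and the lead field, boosting the lead clause. This moves the position check to index time, so the query itself needs no positions. In Lucene this would mean adding a second text field to each Document and combining per-field TermQuery clauses; the toy model below only mimics the scoring arithmetic in plain Java, and the names lead, leadLength and leadBoost are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (plain Java, no Lucene) of the "lead field" idea: the first
// leadLength tokens are copied into a second field at index time, and the
// same query terms are scored against both fields, with the lead field boosted.
class LeadFieldModel {
    final Set<String> body = new HashSet<>();
    final Set<String> lead = new HashSet<>();

    LeadFieldModel(String text, int leadLength) {
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        body.addAll(Arrays.asList(tokens));
        // Index-time copy of the document's opening tokens.
        lead.addAll(Arrays.asList(tokens).subList(0, Math.min(leadLength, tokens.length)));
    }

    /** Sum over query terms: 1 for a body match plus leadBoost for a lead match. */
    double score(List<String> queryTerms, double leadBoost) {
        double score = 0.0;
        for (String t : queryTerms) {
            if (body.contains(t)) score += 1.0;
            if (lead.contains(t)) score += leadBoost;
        }
        return score;
    }
}
```

The trade-off is a larger index and a lead length fixed at index time, in exchange for avoiding position-based queries at search time.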




Re: Absolute term position in scoring

Posted by Alexey Morozov <mo...@gmail.com>.
Hello!

I'd like to ask whether this approach (constructing a complex query from a
boosted "specialized" part and an unboosted "ordinary" part) necessarily
causes a significant performance degradation compared to a "custom query"
specialized for a particular need.

Thanks in advance,
Alexey Morozov



Re: Absolute term position in scoring

Posted by Michael McCandless <lu...@mikemccandless.com>.
Well, you could have ordinary term queries, and then a SHOULD SpanFirstQuery
clause with a boost, to give higher scores to those docs that also have the
term(s) close to the start of the document.


Mike McCandless

http://blog.mikemccandless.com
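A dependency-free toy model of the query shape suggested above: an ordinary term clause plus a boosted SHOULD clause that matches only when the term occurs within the first `end` positions, which is what SpanFirstQuery(new SpanTermQuery(term), end) checks for a single term in Lucene 4.x. Each matching clause contributes a constant here; real Lucene scores also involve tf-idf, norms and the coord factor, so treat this as an illustration of the ranking effect only. Class and method names are made up.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (plain Java, no Lucene): ordinary term clause + boosted SHOULD clause
// that only matches when the term occurs within the first `end` positions.
class SpanFirstBoostModel {
    // docId -> tokens of that document
    private final Map<Integer, String[]> docs = new HashMap<>();

    void add(int docId, String text) {
        docs.put(docId, text.trim().toLowerCase().split("\\s+"));
    }

    /** SpanFirstQuery semantics for one term: some occurrence ends at or before `end`. */
    static boolean matchesFirst(String[] tokens, String term, int end) {
        // A single-term span at position p ends at p + 1, so it matches iff p + 1 <= end.
        for (int p = 0; p < tokens.length && p + 1 <= end; p++) {
            if (tokens[p].equals(term)) return true;
        }
        return false;
    }

    /** Returns (docId, score) pairs for docs containing `term`, best first. */
    List<Map.Entry<Integer, Double>> search(String term, int end, double boost) {
        List<Map.Entry<Integer, Double>> hits = new ArrayList<>();
        for (Map.Entry<Integer, String[]> d : docs.entrySet()) {
            String[] tokens = d.getValue();
            // A doc without the term matches neither clause.
            if (!Arrays.asList(tokens).contains(term)) continue;
            double score = 1.0;                                  // ordinary term clause
            if (matchesFirst(tokens, term, end)) score += boost; // boosted SHOULD clause
            hits.add(new AbstractMap.SimpleEntry<>(d.getKey(), score));
        }
        hits.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return hits;
    }
}
```

Because the span clause is optional, documents that mention the term late still match; they simply rank below documents that mention it near the top.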


Re: Absolute term position in scoring

Posted by Luis A Lastras <la...@us.ibm.com>.
Thanks, I didn't know about SpanFirstQuery. I can likely get something going
with that. I was still hoping that we could affect the scoring formula with
the position itself, but maybe this is not feasible.

Luis
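The hope of feeding the position itself into the score can be pictured as a small function. The sketch below is plain Java, not a Lucene API: it assumes you already know the base relevance score and the position of the first match (for example from a custom scorer reading positions from the postings, which is not shown and is an assumption here). It multiplies the base score by a factor whose bonus halves every `halfLife` positions. The names `PositionDecay`, `factor`, `score`, and `halfLife` are hypothetical.

```java
// Hypothetical position-aware scoring; this is not a Lucene API.
// Assumes the caller already knows the base score and the first match position.
class PositionDecay {

    // Returns 2.0 at position 0; the bonus above 1.0 halves every halfLife positions.
    static double factor(int firstPos, double halfLife) {
        if (firstPos < 0 || halfLife <= 0) {
            throw new IllegalArgumentException("need firstPos >= 0 and halfLife > 0");
        }
        return 1.0 + Math.pow(0.5, firstPos / halfLife);
    }

    // Base relevance score scaled by how close the first match is to the top.
    static double score(double base, int firstPos, double halfLife) {
        return base * factor(firstPos, halfLife);
    }

    public static void main(String[] args) {
        System.out.println(score(2.0, 0, 10.0));         // 4.0
        System.out.println(score(2.0, 10, 10.0));        // 3.0
        System.out.println(score(2.0, 200, 10.0) > 2.0); // true: never drops below base
    }
}
```

Exponential decay is only one choice; a reciprocal such as 1 / (1 + pos) or a step function would plug into `factor` the same way.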

                                                                               
                                                                               
                                                                               

From:	Michael McCandless <lu...@mikemccandless.com>
To:	Lucene Users <ja...@lucene.apache.org>
Date:	01/25/2015 08:12 AM
Subject:	Re: Absolute term position in scoring



Maybe SpanFirstQuery?


Mike McCandless

http://blog.mikemccandless.com

On Sat, Jan 24, 2015 at 9:34 PM, Luis A Lastras <la...@us.ibm.com> wrote:

> Is it possible to incorporate in Lucene's scoring function the position of
> a matching term (say, as measured from the top of the document)? The
> scenario is: if the documents in the set tend to talk about the most important
> stuff at the beginning of the document, then we would like to give
> preference to documents that mention a term close to the top.
>
> Thanks,
>
> Luis
>

Re: Absolute term position in scoring

Posted by Michael McCandless <lu...@mikemccandless.com>.
Maybe SpanFirstQuery?


Mike McCandless

http://blog.mikemccandless.com
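SpanFirstQuery, built as `new SpanFirstQuery(SpanQuery match, int end)`, only accepts matches of `match` whose span ends at or before position `end`. For a single term at position p the span is [p, p + 1), so the term must occur at a position below `end`. The plain-Java model below imitates that boundary check; it is not Lucene code and ignores scoring entirely. The `windowsMatched` helper sketches one hypothetical way to approximate a position-sensitive boost: add several SpanFirstQuery clauses with growing `end` values as optional (SHOULD) clauses of a BooleanQuery, so earlier matches satisfy more clauses.

```java
import java.util.List;

// Plain-Java model of SpanFirstQuery's position check for a single term.
// A term at position p spans [p, p + 1); it is accepted when p + 1 <= end.
class SpanFirstModel {

    // True if any occurrence of term ends within the first 'end' positions.
    static boolean matches(List<String> tokens, String term, int end) {
        for (int p = 0; p < tokens.size(); p++) {
            if (tokens.get(p).equals(term) && p + 1 <= end) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical stacking: how many windows accept the term, as if each
    // window were a separate, equally weighted SHOULD clause.
    static int windowsMatched(List<String> tokens, String term, int... ends) {
        int count = 0;
        for (int end : ends) {
            if (matches(tokens, term, end)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("ibm", "watson", "concept", "analytics");
        System.out.println(matches(doc, "watson", 1));                    // false: span [1, 2) ends after 1
        System.out.println(matches(doc, "watson", 2));                    // true
        System.out.println(windowsMatched(doc, "analytics", 2, 10, 100)); // 2
    }
}
```

With real queries each clause also contributes its own similarity-based score, so the tiers only approximate a weight on position.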

On Sat, Jan 24, 2015 at 9:34 PM, Luis A Lastras <la...@us.ibm.com> wrote:

> Is it possible to incorporate in Lucene's scoring function the position of
> a matching term (say, as measured from the top of the document)? The
> scenario is: if the documents in the set tend to talk about the most important
> stuff at the beginning of the document, then we would like to give
> preference to documents that mention a term close to the top.
>
> Thanks,
>
> Luis

Re: Can we configure analyzers to not exclude specific characters

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
To do that, you need to create multiple filters: one char filter per pattern/replacement pair.

-Mike
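One filter per rule can be modeled in plain Java: each rule is its own (pattern, replacement) pair, applied to the text produced by the previous rule, much as stacked char filters each read the output of the one before. The two rules are the ones from the Solr config quoted in this thread (`\#` there is simply an escaped `#`). The `ReplaceChain` and `Rule` names are hypothetical, and the sketch models only the text rewrite, not Lucene's CharFilter offset correction.

```java
import java.util.List;
import java.util.regex.Pattern;

// Models a chain of pattern-replace char filters: one rule per "filter",
// each applied to the text produced by the previous rule.
class ReplaceChain {

    static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    // Applies every rule in order; $1 in a replacement refers to group 1.
    static String apply(String text, List<Rule> rules) {
        String out = text;
        for (Rule r : rules) {
            out = r.pattern.matcher(out).replaceAll(r.replacement);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule("([A-Za-z])\\+\\+", "$1plusplus"),
            new Rule("([A-Za-z])#", "$1sharp"));
        System.out.println(apply("I know C++ and C#", rules)); // I know Cplusplus and Csharp
    }
}
```

Because each rule sees the previous rule's output, order can matter when patterns overlap; with these two patterns it does not.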

On 01/29/2015 03:36 PM, Shivashankar Maddanimath wrote:
> Thanks Michael,
>
> I am using the Lucene library, so below is how I used your suggestion. It works, but if I need to add multiple patterns and replacements, it doesn't: it picks the last entry. Is there any way to add multiple patterns and replacements to PatternReplaceCharFilterFactory?
>
> TokenStream ts;
> Map<String, String> ruleExplained = new HashMap<>();
> ruleExplained.put("pattern", "([cC])\\+\\+");
> ruleExplained.put("replacement", "CPlusPlus");
> PatternReplaceCharFilterFactory myRules = new PatternReplaceCharFilterFactory(ruleExplained);
> Reader myreader = myRules.create(new BufferedReader(new InputStreamReader(new FileInputStream(TestFile), StandardCharsets.UTF_8)));
> ts = new UAX29URLEmailTokenizer(Version.LUCENE_48, myreader);
>
>
> Regards,
> Shiv


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Can we configure analyzers to not exclude specific characters

Posted by Shivashankar Maddanimath <sh...@yahoo.in>.
Thanks Michael,

I am using the Lucene library, so below is how I used your suggestion. It works, but if I need to add multiple patterns and replacements, it doesn't: it picks the last entry. Is there any way to add multiple patterns and replacements to PatternReplaceCharFilterFactory?

TokenStream ts;
Map<String, String> ruleExplained = new HashMap<>();
ruleExplained.put("pattern", "([cC])\\+\\+");
ruleExplained.put("replacement", "CPlusPlus");
PatternReplaceCharFilterFactory myRules = new PatternReplaceCharFilterFactory(ruleExplained);
Reader myreader = myRules.create(new BufferedReader(new InputStreamReader(new FileInputStream(TestFile), StandardCharsets.UTF_8)));
ts = new UAX29URLEmailTokenizer(Version.LUCENE_48, myreader);


Regards,
Shiv
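The "picks the last entry" behavior comes from `java.util.Map` itself: keys are unique, so if the extra rules were added with more `put("pattern", ...)` calls on the same map, each call replaced the previous value instead of adding another. One argument map can therefore describe only one pattern/replacement pair. The JDK-only sketch below shows the overwrite and builds one map per rule instead; handing each of those maps to its own `PatternReplaceCharFilterFactory` and chaining the resulting readers is assumed, not shown. The `RuleMaps` class and the C# rule are hypothetical additions for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RuleMaps {

    // Two rules squeezed into one map: the second put for each key wins.
    static Map<String, String> singleMap() {
        Map<String, String> args = new HashMap<>();
        args.put("pattern", "([cC])\\+\\+");
        args.put("replacement", "CPlusPlus");
        args.put("pattern", "([cC])#");    // replaces the C++ pattern
        args.put("replacement", "CSharp"); // replaces "CPlusPlus"
        return args;
    }

    // One mutable map per rule (mutable in case the consumer removes entries).
    static List<Map<String, String>> oneMapPerRule() {
        return List.of(
            new HashMap<>(Map.of("pattern", "([cC])\\+\\+", "replacement", "CPlusPlus")),
            new HashMap<>(Map.of("pattern", "([cC])#", "replacement", "CSharp")));
    }

    public static void main(String[] args) {
        System.out.println(singleMap().get("pattern")); // ([cC])#
        System.out.println(oneMapPerRule().size());     // 2
    }
}
```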

-----Original Message-----
From: "Michael Sokolov" <ms...@safaribooksonline.com>
Sent: 29-01-2015 01:32 AM
To: "java-user@lucene.apache.org" <ja...@lucene.apache.org>
Subject: Re: Can we configure analyzers to not exclude specific characters

It's a bit of a hack, but we do this:

         <charFilter class="solr.PatternReplaceCharFilterFactory"
                     pattern="([A-Za-z])\+\+" replacement="$1plusplus" />
         <charFilter class="solr.PatternReplaceCharFilterFactory"
                     pattern="([A-Za-z])\#" replacement="$1sharp" />


On 1/28/2015 2:00 AM, Shivashankar Maddanimath wrote:
> Hi,
>
> I am using Lucene's standard analyzer and UAX29URLEmailTokenizer. Both drop some characters, such as "+" (so I can't search for C++). Is there any way to configure the analyzers to keep specific characters while tokenizing?
>
> Regards,
> Shiv




Re: Can we configure analyzers to not exclude specific characters

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
It's a bit of a hack, but we do this:

         <charFilter class="solr.PatternReplaceCharFilterFactory"
                     pattern="([A-Za-z])\+\+" replacement="$1plusplus" />
         <charFilter class="solr.PatternReplaceCharFilterFactory"
                     pattern="([A-Za-z])\#" replacement="$1sharp" />
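The trick works because a char filter rewrites the raw text before the tokenizer runs, so "C++" reaches the tokenizer as a plain run of letters. To make that visible without Lucene, the sketch below uses a deliberately crude stand-in tokenizer that splits on anything that is not a letter or digit and lowercases. That tokenizer is an assumption for illustration, not StandardTokenizer's UAX #29 rules, but like the analyzers described in this thread it loses the "+" and "#". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CharFilterThenTokenize {

    // Crude stand-in tokenizer: split on runs of non-letter, non-digit
    // characters and lowercase. Not Lucene's StandardTokenizer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Same patterns as the charFilter config above, applied before tokenizing.
    static String charFilter(String text) {
        return text.replaceAll("([A-Za-z])\\+\\+", "$1plusplus")
                   .replaceAll("([A-Za-z])#", "$1sharp");
    }

    public static void main(String[] args) {
        String text = "C++ and C# jobs";
        System.out.println(tokenize(text));             // [c, and, c, jobs]
        System.out.println(tokenize(charFilter(text))); // [cplusplus, and, csharp, jobs]
    }
}
```

The same rewrite has to run at both index and query time so that a search for "C++" also becomes "cplusplus".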


On 1/28/2015 2:00 AM, Shivashankar Maddanimath wrote:
> Hi,
>
> I am using Lucene's standard analyzer and UAX29URLEmailTokenizer. Both drop some characters, such as "+" (so I can't search for C++). Is there any way to configure the analyzers to keep specific characters while tokenizing?
>
> Regards,
> Shiv




Can we configure analyzers to not exclude specific characters

Posted by Shivashankar Maddanimath <sh...@yahoo.in>.
Hi,

I am using Lucene's standard analyzer and UAX29URLEmailTokenizer. Both drop some characters, such as "+" (so I can't search for C++). Is there any way to configure the analyzers to keep specific characters while tokenizing?

Regards,
Shiv
