Posted to dev@lucene.apache.org by Simon Willnauer <si...@gmail.com> on 2013/04/20 07:17:27 UTC

"[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Here is the RC:
http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054

happy voting...

here is my +1

simon

Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Shai Erera <se...@gmail.com>.
>
> It's a random failure and happens pretty rarely. Can you just re-run
> the smoketester?
>

Hehe ... rarely ... I ran again and this time tripped another seed:

[junit4:junit4] Suite: org.apache.lucene.search.join.TestBlockJoin
> [junit4:junit4]   2> NOTE: reproduce with: ant test
> -Dtestcase=TestBlockJoin -Dtests.method=testEmptyChildFilter
> -Dtests.seed=222F35AAD454FA32 -Dtests.slow=true -Dtests.locale=es_BO
> -Dtests.timezone=Brazil/DeNoronha -Dtests.file.encoding=Cp1252
> [junit4:junit4] FAILURE 0.11s J1 | TestBlockJoin.testEmptyChildFilter <<<
> [junit4:junit4]    > Throwable #1: java.lang.AssertionError
> [junit4:junit4]    >    at
> __randomizedtesting.SeedInfo.seed([222F35AAD454FA32:13845E4D45F5DCD8]:0)
> [junit4:junit4]    >    at
> org.apache.lucene.search.join.ToParentBlockJoinQuery$BlockJoinScorer.nextDoc(ToParentBlockJoinQuery.java:289)
> [junit4:junit4]    >    at
> org.apache.lucene.search.ConjunctionScorer.nextDoc(ConjunctionScorer.java:99)
> [junit4:junit4]    >    at
> org.apache.lucene.index.FilterAtomicReader$FilterDocsEnum.nextDoc(FilterAtomicReader.java:240)
> [junit4:junit4]    >    at
> org.apache.lucene.index.AssertingAtomicReader$AssertingDocsEnum.nextDoc(AssertingAtomicReader.java:252)
> [junit4:junit4]    >    at
> org.apache.lucene.search.AssertingIndexSearcher$AssertingScorer.nextDoc(AssertingIndexSearcher.java:295)
> [junit4:junit4]    >    at
> org.apache.lucene.search.Scorer.score(Scorer.java:64)
> [junit4:junit4]    >    at
> org.apache.lucene.search.AssertingIndexSearcher$AssertingScorer.score(AssertingIndexSearcher.java:260)
> [junit4:junit4]    >    at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
> [junit4:junit4]    >    at
> org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:102)
> [junit4:junit4]    >    at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
> [junit4:junit4]    >    at
> org.apache.lucene.search.join.TestBlockJoin.testEmptyChildFilter(TestBlockJoin.java:106)
> [junit4:junit4]    >    at java.lang.Thread.run(Thread.java:662)
>

I'll run again!

Shai
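
Editorial sketch: the "reproduce with" NOTE lines in the junit4 output above carry everything needed to rerun a failing seed. The following is a minimal Python sketch, assuming the log excerpt has been pasted into a string; the helper names parse_reproduce_props and ant_command are hypothetical and are not part of the Lucene build.

```python
import re

# Matches one "-Dname=value" system property as printed in the junit4 NOTE line.
_PROP = re.compile(r"-D([\w.]+)=(\S+)")


def parse_reproduce_props(log_text):
    """Return the -D properties found in a (possibly wrapped or quoted) log excerpt."""
    return dict(_PROP.findall(log_text))


def ant_command(props):
    """Rebuild the 'ant test' command line as a list of arguments."""
    return ["ant", "test"] + ["-D%s=%s" % (key, value) for key, value in props.items()]


# Excerpt copied from the failure report above, including mail quote markers.
EXCERPT = """\
[junit4:junit4]   2> NOTE: reproduce with: ant test
> -Dtestcase=TestBlockJoin -Dtests.method=testEmptyChildFilter
> -Dtests.seed=222F35AAD454FA32 -Dtests.slow=true -Dtests.locale=es_BO
> -Dtests.timezone=Brazil/DeNoronha -Dtests.file.encoding=Cp1252
"""
```

Calling ant_command(parse_reproduce_props(EXCERPT)) gives an argument list that could be handed to subprocess.run from the checkout being tested. Line wrapping and quote markers do not matter because only the -D tokens are extracted.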


On Sat, Apr 20, 2013 at 9:29 AM, Simon Willnauer
<si...@gmail.com> wrote:

> I just ported this test-fix to the 4_3 branch in rev 1470112. Yet, I
> don't think we really need a new RC for this unless somebody objects.
>
> simon
>
> On Sat, Apr 20, 2013 at 7:31 AM, Simon Willnauer
> <si...@gmail.com> wrote:
> > that is a test-bug I fixed in trunk and branch_4x
> > (http://svn.apache.org/r1469402)
> >
> > this has not been ported to the rel branch
> >
> > simon
> >
> > On Sat, Apr 20, 2013 at 7:25 AM, Shai Erera <se...@gmail.com> wrote:
> >> I ran:
> >>
> >> python3.2 -u lucene_4x/dev-tools/scripts/smokeTestRelease.py
> >>
> >> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054/
> >> 4.3.0 smoke-lucene-4.3
> >>
> >> And hit:
> >>
> >> [junit4:junit4] Suite: org.apache.lucene.search.join.TestBlockJoin
> >> [junit4:junit4]   2> NOTE: reproduce with: ant test
> >> -Dtestcase=TestBlockJoin -Dtests.method=testEmptyChildFilter
> >> -Dtests.seed=3132D930DDF0D07C -Dtests.slow=true -Dtests.locale=da_DK
> >> -Dtests.timezone=Europe/Volgograd -Dtests.file.encoding=Cp1255
> >> [junit4:junit4] FAILURE 0.16s J1 | TestBlockJoin.testEmptyChildFilter <<<
> >> [junit4:junit4]    > Throwable #1: java.lang.AssertionError: expected:<1>
> >> but was:<13>
> >> [junit4:junit4]    >    at
> >> __randomizedtesting.SeedInfo.seed([3132D930DDF0D07C:99B2D74C51F696]:0)
> >> [junit4:junit4]    >    at
> >> org.apache.lucene.search.join.TestBlockJoin.testEmptyChildFilter(TestBlockJoin.java:109)
> >> [junit4:junit4]    >    at java.lang.Thread.run(Thread.java:722)
> >> [junit4:junit4]   2> NOTE: test params are: codec=Appending,
> >> sim=DefaultSimilarity, locale=da_DK, timezone=Europe/Volgograd
> >> [junit4:junit4]   2> NOTE: Windows 7 6.1 amd64/Oracle Corporation 1.7.0_13
> >> (64-bit)/cpus=8,threads=1,free=178780368,total=261947392
> >> [junit4:junit4]   2> NOTE: All tests run in this JVM: [TestBlockJoin]
> >> [junit4:junit4] Completed on J1 in 5.18s, 10 tests, 1 failure <<< FAILURES!
> >>
> >> However I failed to reproduce on trunk and 4x, so not sure what's wrong. I
> >> assume smokeTestRelease.py works only on the downloaded RC, and does not
> >> depend on my local checkout of 4x? Because I wasn't on the latest revision.
> >>
> >> I'll 'svn up' and run smokeTest again, but would be good if anyone else
> >> tries to reproduce.
> >>
> >> Shai
> >>
> >>
> >> On Sat, Apr 20, 2013 at 8:17 AM, Simon Willnauer <simon.willnauer@gmail.com>
> >> wrote:
> >>>
> >>>
> >>> Here is the RC:
> >>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
> >>>
> >>> happy voting...
> >>>
> >>> here is my +1
> >>>
> >>> simon
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Simon Willnauer <si...@gmail.com>.
I just ported this test-fix to the 4_3 branch in rev 1470112. Yet, I
don't think we really need a new RC for this unless somebody objects.

simon

On Sat, Apr 20, 2013 at 7:31 AM, Simon Willnauer
<si...@gmail.com> wrote:
> that is a test-bug I fixed in trunk and branch_4x
> (http://svn.apache.org/r1469402)
>
> this has not been ported to the rel branch
>
> simon
>
> On Sat, Apr 20, 2013 at 7:25 AM, Shai Erera <se...@gmail.com> wrote:
>> I ran:
>>
>> python3.2 -u lucene_4x/dev-tools/scripts/smokeTestRelease.py
>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054/
>> 4.3.0 smoke-lucene-4.3
>>
>> And hit:
>>
>> [junit4:junit4] Suite: org.apache.lucene.search.join.TestBlockJoin
>> [junit4:junit4]   2> NOTE: reproduce with: ant test
>> -Dtestcase=TestBlockJoin -Dtests.method=testEmptyChildFilter
>> -Dtests.seed=3132D930DDF0D07C -Dtests.slow=true -Dtests.locale=da_DK
>> -Dtests.timezone=Europe/Volgograd -Dtests.file.encoding=Cp1255
>> [junit4:junit4] FAILURE 0.16s J1 | TestBlockJoin.testEmptyChildFilter <<<
>> [junit4:junit4]    > Throwable #1: java.lang.AssertionError: expected:<1>
>> but was:<13>
>> [junit4:junit4]    >    at
>> __randomizedtesting.SeedInfo.seed([3132D930DDF0D07C:99B2D74C51F696]:0)
>> [junit4:junit4]    >    at
>> org.apache.lucene.search.join.TestBlockJoin.testEmptyChildFilter(TestBlockJoin.java:109)
>> [junit4:junit4]    >    at java.lang.Thread.run(Thread.java:722)
>> [junit4:junit4]   2> NOTE: test params are: codec=Appending,
>> sim=DefaultSimilarity, locale=da_DK, timezone=Europe/Volgograd
>> [junit4:junit4]   2> NOTE: Windows 7 6.1 amd64/Oracle Corporation 1.7.0_13
>> (64-bit)/cpus=8,threads=1,free=178780368,total=261947392
>> [junit4:junit4]   2> NOTE: All tests run in this JVM: [TestBlockJoin]
>> [junit4:junit4] Completed on J1 in 5.18s, 10 tests, 1 failure <<< FAILURES!
>>
>> However I failed to reproduce on trunk and 4x, so not sure what's wrong. I
>> assume smokeTestRelease.py works only on the downloaded RC, and does not
>> depend on my local checkout of 4x? Because I wasn't on the latest revision.
>>
>> I'll 'svn up' and run smokeTest again, but would be good if anyone else
>> tries to reproduce.
>>
>> Shai
>>
>>
>> On Sat, Apr 20, 2013 at 8:17 AM, Simon Willnauer <si...@gmail.com>
>> wrote:
>>>
>>>
>>> Here is the RC:
>>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>>>
>>> happy voting...
>>>
>>> here is my +1
>>>
>>> simon
>>
>>


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Simon Willnauer <si...@gmail.com>.
that is a test-bug I fixed in trunk and branch_4x
(http://svn.apache.org/r1469402)

this has not been ported to the rel branch

simon

On Sat, Apr 20, 2013 at 7:25 AM, Shai Erera <se...@gmail.com> wrote:
> I ran:
>
> python3.2 -u lucene_4x/dev-tools/scripts/smokeTestRelease.py
> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054/
> 4.3.0 smoke-lucene-4.3
>
> And hit:
>
> [junit4:junit4] Suite: org.apache.lucene.search.join.TestBlockJoin
> [junit4:junit4]   2> NOTE: reproduce with: ant test
> -Dtestcase=TestBlockJoin -Dtests.method=testEmptyChildFilter
> -Dtests.seed=3132D930DDF0D07C -Dtests.slow=true -Dtests.locale=da_DK
> -Dtests.timezone=Europe/Volgograd -Dtests.file.encoding=Cp1255
> [junit4:junit4] FAILURE 0.16s J1 | TestBlockJoin.testEmptyChildFilter <<<
> [junit4:junit4]    > Throwable #1: java.lang.AssertionError: expected:<1>
> but was:<13>
> [junit4:junit4]    >    at
> __randomizedtesting.SeedInfo.seed([3132D930DDF0D07C:99B2D74C51F696]:0)
> [junit4:junit4]    >    at
> org.apache.lucene.search.join.TestBlockJoin.testEmptyChildFilter(TestBlockJoin.java:109)
> [junit4:junit4]    >    at java.lang.Thread.run(Thread.java:722)
> [junit4:junit4]   2> NOTE: test params are: codec=Appending,
> sim=DefaultSimilarity, locale=da_DK, timezone=Europe/Volgograd
> [junit4:junit4]   2> NOTE: Windows 7 6.1 amd64/Oracle Corporation 1.7.0_13
> (64-bit)/cpus=8,threads=1,free=178780368,total=261947392
> [junit4:junit4]   2> NOTE: All tests run in this JVM: [TestBlockJoin]
> [junit4:junit4] Completed on J1 in 5.18s, 10 tests, 1 failure <<< FAILURES!
>
> However I failed to reproduce on trunk and 4x, so not sure what's wrong. I
> assume smokeTestRelease.py works only on the downloaded RC, and does not
> depend on my local checkout of 4x? Because I wasn't on the latest revision.
>
> I'll 'svn up' and run smokeTest again, but would be good if anyone else
> tries to reproduce.
>
> Shai
>
>
> On Sat, Apr 20, 2013 at 8:17 AM, Simon Willnauer <si...@gmail.com>
> wrote:
>>
>>
>> Here is the RC:
>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>>
>> happy voting...
>>
>> here is my +1
>>
>> simon
>
>


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Shai Erera <se...@gmail.com>.
I ran:

python3.2 -u lucene_4x/dev-tools/scripts/smokeTestRelease.py
http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054/
4.3.0 smoke-lucene-4.3

And hit:

[junit4:junit4] Suite: org.apache.lucene.search.join.TestBlockJoin
[junit4:junit4]   2> NOTE: reproduce with: ant test
-Dtestcase=TestBlockJoin -Dtests.method=testEmptyChildFilter
-Dtests.seed=3132D930DDF0D07C -Dtests.slow=true -Dtests.locale=da_DK
-Dtests.timezone=Europe/Volgograd -Dtests.file.encoding=Cp1255
[junit4:junit4] FAILURE 0.16s J1 | TestBlockJoin.testEmptyChildFilter <<<
[junit4:junit4]    > Throwable #1: java.lang.AssertionError: expected:<1>
but was:<13>
[junit4:junit4]    >    at
__randomizedtesting.SeedInfo.seed([3132D930DDF0D07C:99B2D74C51F696]:0)
[junit4:junit4]    >    at
org.apache.lucene.search.join.TestBlockJoin.testEmptyChildFilter(TestBlockJoin.java:109)
[junit4:junit4]    >    at java.lang.Thread.run(Thread.java:722)
[junit4:junit4]   2> NOTE: test params are: codec=Appending,
sim=DefaultSimilarity, locale=da_DK, timezone=Europe/Volgograd
[junit4:junit4]   2> NOTE: Windows 7 6.1 amd64/Oracle Corporation 1.7.0_13
(64-bit)/cpus=8,threads=1,free=178780368,total=261947392
[junit4:junit4]   2> NOTE: All tests run in this JVM: [TestBlockJoin]
[junit4:junit4] Completed on J1 in 5.18s, 10 tests, 1 failure <<< FAILURES!

However I failed to reproduce on trunk and 4x, so not sure what's wrong. I
assume smokeTestRelease.py works only on the downloaded RC, and does not
depend on my local checkout of 4x? Because I wasn't on the latest revision.

I'll 'svn up' and run smokeTest again, but would be good if anyone else
tries to reproduce.

Shai


On Sat, Apr 20, 2013 at 8:17 AM, Simon Willnauer
<si...@gmail.com> wrote:

>
> Here is the RC:
> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>
> happy voting...
>
> here is my +1
>
> simon
>

Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Michael McCandless <lu...@mikemccandless.com>.
+1, smoke tester is happy for me.

Mike McCandless

http://blog.mikemccandless.com

On Sat, Apr 20, 2013 at 2:50 AM, Steve Rowe <sa...@gmail.com> wrote:
> +1 - smoke tester passes for me.  I also ran a script checking that the svn revision in all to-be-released {j,w}ar manifests is the same as that in the base release URL: 1470054.
>
> Steve
>
> On Apr 20, 2013, at 1:17 AM, Simon Willnauer <si...@gmail.com> wrote:
>
>>
>> Here is the RC: http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>>
>> happy voting...
>>
>> here is my +1
>>
>> simon
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Steve Rowe <sa...@gmail.com>.
+1 - smoke tester passes for me.  I also ran a script checking that the svn revision in all to-be-released {j,w}ar manifests is the same as that in the base release URL: 1470054.

Steve
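
Editorial sketch: the manifest check Steve describes was not posted to the thread, so the following is only one way it might look. It assumes the RC artifacts have been downloaded and unpacked under a local directory, and that the svn revision appears somewhere in each archive's META-INF/MANIFEST.MF; the function name archives_missing_revision is hypothetical.

```python
import os
import re
import zipfile


def archives_missing_revision(root, revision):
    """Return sorted paths of .jar/.war files whose manifest lacks the revision."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith((".jar", ".war")):
                continue
            path = os.path.join(dirpath, name)
            with zipfile.ZipFile(path) as archive:
                try:
                    raw = archive.read("META-INF/MANIFEST.MF")
                except KeyError:
                    raw = b""  # no manifest at all counts as a mismatch
            # Manifest lines are folded with a newline followed by one space;
            # unfold them so a revision split across lines is still found.
            text = re.sub(r"\r?\n ", "", raw.decode("utf-8", "replace"))
            if revision not in text:
                missing.append(path)
    return sorted(missing)
```

An empty result for archives_missing_revision(unpacked_dir, "1470054") would correspond to the check passing. A plain substring match is deliberately loose; a stricter version would parse the specific manifest attribute that carries the revision.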

On Apr 20, 2013, at 1:17 AM, Simon Willnauer <si...@gmail.com> wrote:

> 
> Here is the RC: http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
> 
> happy voting...
> 
> here is my +1
> 
> simon



Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Robert Muir <rc...@gmail.com>.
+1

Thanks for driving this one!

On Sat, Apr 20, 2013 at 1:17 AM, Simon Willnauer
<si...@gmail.com> wrote:

>
> Here is the RC:
> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>
> happy voting...
>
> here is my +1
>
> simon
>

Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Karol Sikora <ka...@laboratorium.ee>.
I forgot the attachment...

On 21.04.2013 15:21, Karol Sikora wrote:
> hi,
>
> I extracted a minimal failing example; the Solr configs (schema,
> solrconfig.xml) and data are in the attached archive.
> I am trying to import two simple documents:
> [
>     {
>         "publisher": [
>             "T. Gl\u00fccksberg"
>         ],
>         "uid": "1000881"
>     },
>     {
>         "publisher": [
>       "Ala a kota"
>         ],
>         "uid": "1000894"
>     }
> ]
> The first fails on the copyField destination publisher_hl with an exception
> (trace: https://gist.github.com/anonymous/5429558); the second is added
> without any problems.
> schema.xml is here: https://gist.github.com/anonymous/5429562
>
> If you try to reproduce this behaviour, remember to copy
> the libs for the morfologik and icu filters.
>
> This extracted example works fine with Solr 4.0 - 4.2.1.
>
> Regards,
> Karol
>
>
>
> On 21.04.2013 09:03, Simon Willnauer wrote:
>> hey karol,
>>
>> can you reproduce this behaviour in a small test-case (curl command or
>> something like this) that we can reproduce?
>>
>> @solr guys any idea what this could be?
>>
>> simon
>>
>> On Sun, Apr 21, 2013 at 1:52 AM, Karol Sikora
>> <ka...@laboratorium.ee>  wrote:
>>> Hi all,
>>>
>>> I have a problem with Solr 4.3 RC2 on the test data for a search
>>> application which I'm developing.
>>> A lot of imported records fail with the exception
>>> "java.lang.IllegalArgumentException: first position increment must be > 0
>>> (got 0)". On versions from early 4.0 to 4.2.1 all documents were added
>>> successfully, so I think something is broken in the new release.
>>> I'll try to examine tomorrow what is broken.
>>>
>>>
>>> Regards,
>>> Karol
>>>
>>> On 20.04.2013 21:07, Andi Vajda wrote:
>>>
>>>> On Sat, 20 Apr 2013, Simon Willnauer wrote:
>>>>
>>>>> Here is the RC:
>>>>>
>>>>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>>>>>
>>>>> happy voting...
>>>>>
>>>>> here is my +1
>>>> PyLucene 4.3 builds and passes its tests.
>>>>
>>>> +1 !
>>>>
>>>> Andi..
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail:dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail:dev-help@lucene.apache.org
>>>>
>>>>
>>> --
>>>   Karol Sikora
>>> +48 781 493 788
>>>
>>> Laboratorium EE
>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>>> www.laboratorium.ee  |www.laboratorium.ee/facebook
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail:dev-help@lucene.apache.org
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail:dev-help@lucene.apache.org
>>
>>
>
> -- 
>   
> Karol Sikora
> IT Project Manager, CBN - Interfejs 2.0
> +48 781 493 788
>
> Laboratorium EE
> ul. Mokotowska 46A/23 | 00-543 Warszawa |
> www.laboratorium.ee  |www.laboratorium.ee/facebook

-- 
  
Karol Sikora
IT Project Manager, CBN - Interfejs 2.0
+48 781 493 788

Laboratorium EE
ul. Mokotowska 46A/23 | 00-543 Warszawa |
www.laboratorium.ee | www.laboratorium.ee/facebook
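
Editorial sketch: for anyone reproducing the report quoted above without the attachment, the two documents can be posted to a running Solr instance as JSON. This is a minimal Python sketch with assumptions: the base URL http://localhost:8983/solr and the /update path are guesses about a default single-core setup, not details from Karol's configuration, and the schema from the linked gist must already be in place.

```python
import json
import urllib.request

# The two documents from the report; the first one triggers the exception.
DOCS = [
    {"publisher": ["T. Gl\u00fccksberg"], "uid": "1000881"},
    {"publisher": ["Ala a kota"], "uid": "1000894"},
]


def build_update_request(docs, base_url="http://localhost:8983/solr"):
    """Build (but do not send) a JSON update request for the given documents."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        base_url + "/update?commit=true",  # assumed handler path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the HTTP status; needs a running Solr."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling send(build_update_request(DOCS)) against an affected build should surface the failure for uid 1000881 as an HTTP error, while posting only the second document should succeed.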


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Walter Underwood <wu...@wunderwood.org>.
I did the work on 4810, thanks for improving it. We've been running that code on 3.3 for a while now with no problems.

Even with a lemmatizer, you can get some odd results with edge ngrams: any internal vowel changes, for example, or a language more heavily inflected than English.

I distinguish between search helps for what you type (edge ngram, fuzzy) and those for what you "mean" (morphology, synonyms). Then there is phonetic matching, which doesn't fit into either of those.

Mixing the kinds of techniques can cause surprising results.

wunder

On Apr 21, 2013, at 11:07 PM, Steve Rowe wrote:

> I've reopened LUCENE-4810 and attached a patch with a test and fix for this problem. - Steve
> 
> On Apr 22, 2013, at 1:09 AM, Steve Rowe <sa...@gmail.com> wrote:
> 
>> Actually, Walter, I misspoke: Morfologik is a lemmatizer: it produces surface forms.  Not really so incompatible, I think.
>> 
>> Regardless of the choice to use this particular sequence of filters, EdgeNGramTokenFilter shouldn't produce a bad stream.
>> 
>> Steve
>> 
>> On Apr 21, 2013, at 8:34 PM, Walter Underwood <wu...@wunderwood.org> wrote:
>> 
>>> Don't use a stemmer with edge ngrams.
>>> 
>>> Edge ngrams are a tool for matching the surface word. Stemmers are a tool for matching the root. Those are logically incompatible transforms. 
>>> 
>>> wunder
>>> 
>>> On Apr 21, 2013, at 5:21 PM, Steve Rowe wrote:
>>> 
>>>> Karol has uncovered a bug introduced by LUCENE-4810 <https://issues.apache.org/jira/browse/LUCENE-4810>, included in Lucene/Solr 4.3.0.
>>>> 
>>>> The problem is an interaction between the Morfologik stemmer, which can produce multiple stems per input term, all but the first having a position increment of zero, and EdgeNGramTokenFilter, which only outputs ngrams for input terms that are at least as long as the minimum configured length, and passes through unchanged the position increment for the first ngram output for any given input term.
>>>> 
>>>> So what happens in Karol's case is that "T." has the period stripped by StandardTokenizer, then is stemmed by Morfologik to produce terms "to", "tom" and "tona".  The first term "to" has a position increment of 1, but is not output by EdgeNGramTokenFilter, because its length is below the configured minimum of 3.  The second term "tom" is given a position increment of 0 by Morfologik, and meets EdgeNGramTokenFilter's minimum length, so gets output, and since it's the first output term for the input term "tom", the input position increment is left as-is in the output term: 0.  That's how the first output term gets a position increment of 0.
>>>> 
>>>> Before LUCENE-4810 was committed and included in Lucene/Solr 4.3.0, EdgeNGramTokenFilter indiscriminately set all output terms' position increments to 1, so that explains why this behavior didn't occur with previously released versions.
>>>> 
>>>> I think the fix is a check in EdgeNGramTokenFilter when outputting the first term, that the position increment is greater than 0, and if it's not, then it should be set to 1.
>>>> 
>>>> Does anybody know if this could also be an issue for other filters?
>>>> 
>>>> I'll work on a patch for EdgeNGramTokenFilter.
>>>> 
>>>> Steve
>>>> 
>>>> On Apr 21, 2013, at 9:21 AM, Karol Sikora <ka...@laboratorium.ee> wrote:
>>>> 
>>>>> hi,
>>>>> 
>>>>> I extracted a minimal failing example; the Solr configs (schema, solrconfig.xml) and data are in the attached archive.
>>>>> I am trying to import two simple documents:
>>>>> [
>>>>>  {
>>>>>      "publisher": [
>>>>>          "T. Gl\u00fccksberg"
>>>>>      ],  
>>>>>      "uid": "1000881" 
>>>>>  }, 
>>>>>  {
>>>>>      "publisher": [
>>>>>    "Ala a kota"
>>>>>      ],
>>>>>      "uid": "1000894"
>>>>>  }
>>>>> ]
>>>>> The first fails on the copyField destination publisher_hl with an exception (trace: https://gist.github.com/anonymous/5429558); the second is added without any problems.
>>>>> schema.xml is here: https://gist.github.com/anonymous/5429562
>>>>> 
>>>>> If you try to reproduce this behaviour, remember to copy the libs for the morfologik and icu filters.
>>>>> 
>>>>> This extracted example works fine with Solr 4.0 - 4.2.1.
>>>>> 
>>>>> Regards,
>>>>> Karol
>>>>> 
>>>>> 
>>>>> 
>>>>> On 21.04.2013 09:03, Simon Willnauer wrote:
>>>>>> hey karol,
>>>>>> 
>>>>>> can you reproduce this behaviour in a small test-case (curl command or
>>>>>> something like this) that we can reproduce?
>>>>>> 
>>>>>> @solr guys any idea what this could be?
>>>>>> 
>>>>>> simon
>>>>>> 
>>>>>> On Sun, Apr 21, 2013 at 1:52 AM, Karol Sikora
>>>>>> 
>>>>>> <ka...@laboratorium.ee>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I have a problem with Solr 4.3 RC2 on the test data for a search
>>>>>>> application which I'm developing.
>>>>>>> A lot of imported records fail with the exception
>>>>>>> "java.lang.IllegalArgumentException: first position increment must be > 0
>>>>>>> (got 0)". On versions from early 4.0 to 4.2.1 all documents were added
>>>>>>> successfully, so I think something is broken in the new release.
>>>>>>> I'll try to examine tomorrow what is broken.
>>>>>>> 
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Karol
>>>>>>> 
>>>>>>> On 20.04.2013 21:07, Andi Vajda wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> On Sat, 20 Apr 2013, Simon Willnauer wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Here is the RC:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> happy voting...
>>>>>>>>> 
>>>>>>>>> here is my +1
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> PyLucene 4.3 builds and passes its tests.
>>>>>>>> 
>>>>>>>> +1 !
>>>>>>>> 
>>>>>>>> Andi..
>>>>>>>> 
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: 
>>>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>>>> 
>>>>>>>> For additional commands, e-mail: 
>>>>>>>> dev-help@lucene.apache.org
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> --
>>>>>>> Karol Sikora
>>>>>>> +48 781 493 788
>>>>>>> 
>>>>>>> Laboratorium EE
>>>>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>>>>>>> 
>>>>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: 
>>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>>> 
>>>>>>> For additional commands, e-mail: 
>>>>>>> dev-help@lucene.apache.org
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: 
>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>> 
>>>>>> For additional commands, e-mail: 
>>>>>> dev-help@lucene.apache.org
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> -- 
>>>>> 
>>>>> Karol Sikora
>>>>> IT Project Manager, CBN - Interfejs 2.0
>>>>> +48 781 493 788
>>>>> 
>>>>> Laboratorium EE
>>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>>>>> 
>>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: dev-help@lucene.apache.org
>>>> 
>>> 
>>> --
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> 
>>> 
>>> 
>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
> 

--
Walter Underwood
wunder@wunderwood.org




Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Walter Underwood <wu...@wunderwood.org>.
I would put this in 4.3. This is the first release with the position fix for edge ngrams, so it would make sense to fix it all the way, rather than have two different levels of fix in two different releases.

wunder

On Apr 22, 2013, at 6:17 AM, Simon Willnauer wrote:

> I think we can add this to 4.3; I can roll another RC for that.
> 
> simon
> 
> On Mon, Apr 22, 2013 at 3:11 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
>> Is this a fix to 4.3 (RC3?) or for a 4.3.1?
>> 
>> -- Jack Krupansky
>> 
>> -----Original Message----- From: Steve Rowe
>> Sent: Monday, April 22, 2013 2:07 AM
>> 
>> To: dev@lucene.apache.org
>> Subject: Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"
>> 
>> I've reopened LUCENE-4810 and attached a patch with a test and fix for this
>> problem. - Steve
>> 
>> On Apr 22, 2013, at 1:09 AM, Steve Rowe <sa...@gmail.com> wrote:
>> 
>>> Actually, Walter, I misspoke: Morfologik is a lemmatizer: it produces
>>> surface forms.  Not really so incompatible, I think.
>>> 
>>> Regardless of the choice to use this particular sequence of filters,
>>> EdgeNGramTokenFilter shouldn't produce a bad stream.
>>> 
>>> Steve
>>> 
>>> On Apr 21, 2013, at 8:34 PM, Walter Underwood <wu...@wunderwood.org>
>>> wrote:
>>> 
>>>> Don't use a stemmer with edge ngrams.
>>>> 
>>>> Edge ngrams are a tool for matching the surface word. Stemmers are a tool
>>>> for matching the root. Those are logically incompatible transforms.
>>>> 
>>>> wunder
>>>> 
>>>> On Apr 21, 2013, at 5:21 PM, Steve Rowe wrote:
>>>> 
>>>>> Karol has uncovered a bug introduced by LUCENE-4810
>>>>> <https://issues.apache.org/jira/browse/LUCENE-4810>, included in Lucene/Solr
>>>>> 4.3.0.
>>>>> 
>>>>> The problem is an interaction between the Morfologik stemmer, which can
>>>>> produce multiple stems per input term, all but the first having a position
>>>>> increment of zero, and EdgeNGramTokenFilter, which only outputs ngrams for
>>>>> input terms that are at least as long as the minimum configured length, and
>>>>> passes through unchanged the position increment for the first ngram output
>>>>> for any given input term.
>>>>> 
>>>>> So what happens in Karol's case is that "T." has the period stripped by
>>>>> StandardTokenizer, then is stemmed by Morfologik to produce terms "to",
>>>>> "tom" and "tona".  The first term "to" has a position increment of 1, but is
>>>>> not output by EdgeNGramTokenFilter, because its length is below the
>>>>> configured minimum of 3.  The second term "tom" is given a position
>>>>> increment of 0 by Morfologik, and meets EdgeNGramTokenFilter's minimum
>>>>> length, so gets output, and since it's the first output term for the input
>>>>> term "tom", the input position increment is left as-is in the output term:
>>>>> 0.  That's how the first output term gets a position increment of 0.
>>>>> 
>>>>> Before LUCENE-4810 was committed and included in Lucene/Solr 4.3.0,
>>>>> EdgeNGramTokenFilter indiscriminately set all output terms' position
>>>>> increments to 1, so that explains why this behavior didn't occur with
>>>>> previously released versions.
>>>>> 
>>>>> I think the fix is a check in EdgeNGramTokenFilter when outputting the
>>>>> first term, that the position increment is greater than 0, and if it's not,
>>>>> then it should be set to 1.
>>>>> 
>>>>> Does anybody know if this could also be an issue for other filters?
>>>>> 
>>>>> I'll work on a patch for EdgeNGramTokenFilter.
>>>>> 
>>>>> Steve
>>>>> 
>>>>> On Apr 21, 2013, at 9:21 AM, Karol Sikora <ka...@laboratorium.ee>
>>>>> wrote:
>>>>> 
>>>>>> hi,
>>>>>> 
>>>>>> I extracted a minimal failing example; the Solr configs (schema,
>>>>>> solrconfig.xml) and data are in the attached archive.
>>>>>> I am trying to import two simple documents:
>>>>>> [
>>>>>>  {
>>>>>>      "publisher": [
>>>>>>          "T. Gl\u00fccksberg"
>>>>>>      ],
>>>>>>      "uid": "1000881"
>>>>>>  },
>>>>>>  {
>>>>>>      "publisher": [
>>>>>>    "Ala a kota"
>>>>>>      ],
>>>>>>      "uid": "1000894"
>>>>>>  }
>>>>>> ]
>>>>>> The first fails on the copyField destination publisher_hl with an exception
>>>>>> (trace: https://gist.github.com/anonymous/5429558); the second is added without
>>>>>> any problems.
>>>>>> schema.xml is here: https://gist.github.com/anonymous/5429562
>>>>>> 
>>>>>> If you try to reproduce this behaviour, remember to copy
>>>>>> the libs for the morfologik and icu filters.
>>>>>> 
>>>>>> This extracted example works fine with Solr 4.0 - 4.2.1.
>>>>>> 
>>>>>> Regards,
>>>>>> Karol
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 21.04.2013 09:03, Simon Willnauer wrote:
>>>>>>> 
>>>>>>> hey karol,
>>>>>>> 
>>>>>>> can you reproduce this behaviour in a small test-case (curl command or
>>>>>>> something like this) that we can reproduce?
>>>>>>> 
>>>>>>> @solr guys any idea what this could be?
>>>>>>> 
>>>>>>> simon
>>>>>>> 
>>>>>>> On Sun, Apr 21, 2013 at 1:52 AM, Karol Sikora
>>>>>>> 
>>>>>>> <ka...@laboratorium.ee>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> I have a problem with Solr 4.3 RC2 on the test data for a search
>>>>>>>> application which I'm developing.
>>>>>>>> A lot of imported records fail with the exception
>>>>>>>> "java.lang.IllegalArgumentException: first position increment must be > 0
>>>>>>>> (got 0)". On versions from early 4.0 to 4.2.1 all documents were added
>>>>>>>> successfully, so I think something is broken in the new release.
>>>>>>>> I'll try to examine tomorrow what is broken.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Karol
>>>>>>>> 
>>>>>>>> On 20.04.2013 21:07, Andi Vajda wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Sat, 20 Apr 2013, Simon Willnauer wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> Here is the RC:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> happy voting...
>>>>>>>>>> 
>>>>>>>>>> here is my +1
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> PyLucene 4.3 builds and passes its tests.
>>>>>>>>> 
>>>>>>>>> +1 !
>>>>>>>>> 
>>>>>>>>> Andi..
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail:
>>>>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>>>>> 
>>>>>>>>> For additional commands, e-mail:
>>>>>>>>> dev-help@lucene.apache.org
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> --
>>>>>>>> Karol Sikora
>>>>>>>> +48 781 493 788
>>>>>>>> 
>>>>>>>> Laboratorium EE
>>>>>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>>>>>>>> 
>>>>>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> Karol Sikora
>>>>>> Kierownik Informatyczny Projektu CBN - Interfejs 2.0
>>>>>> +48 781 493 788
>>>>>> 
>>>>>> Laboratorium EE
>>>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>>>>>> 
>>>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> --
>>>> Walter Underwood
>>>> wunder@wunderwood.org
>>>> 
>>>> 
>>>> 

--
Walter Underwood
wunder@wunderwood.org




Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Simon Willnauer <si...@gmail.com>.
Yonik, can you fix this in the near future? I mean how much time do you need?

On Mon, Apr 22, 2013 at 3:23 PM, Yonik Seeley <yo...@lucidworks.com> wrote:
> On Mon, Apr 22, 2013 at 9:17 AM, Simon Willnauer
> <si...@gmail.com> wrote:
>> I think we can add this to 4.3 I can roll another RC for that.
>
> Someone just found a somewhat serious (but hopefully simple to fix)
> bug that I'd like to get in also:
> https://issues.apache.org/jira/browse/SOLR-4746
>
> -Yonik
> http://lucidworks.com
>


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Yonik Seeley <yo...@lucidworks.com>.
On Mon, Apr 22, 2013 at 9:17 AM, Simon Willnauer
<si...@gmail.com> wrote:
> I think we can add this to 4.3 I can roll another RC for that.

Someone just found a somewhat serious (but hopefully simple to fix)
bug that I'd like to get in also:
https://issues.apache.org/jira/browse/SOLR-4746

-Yonik
http://lucidworks.com



Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Simon Willnauer <si...@gmail.com>.
I hear ya robert!

On Mon, Apr 22, 2013 at 3:54 PM, Robert Muir <rc...@gmail.com> wrote:
>
>
> On Mon, Apr 22, 2013 at 9:52 AM, Simon Willnauer <si...@gmail.com>
> wrote:
>>
>>
>> my take on this more community oriented. I really want to encourage
>> folks to test our releases. Its a lot of work to upgrade existing apps
>> to run with an RC and if somebody does that and finds a bug I think
>> this is worth rolling a new RC. I don't have a rush here and quality
>> of the release is most important here. If this makes 1 more person
>> running our RC against their app to have a chance to catch a bug that
>> would prevent them to upgrade it's worth the effort.
>
>
> I'm just voicing my dissatisfaction with this filter: we have a test that
> would have found the bug, but its disabled.
>



Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Robert Muir <rc...@gmail.com>.
On Mon, Apr 22, 2013 at 9:52 AM, Simon Willnauer
<si...@gmail.com> wrote:

>
> my take on this more community oriented. I really want to encourage
> folks to test our releases. Its a lot of work to upgrade existing apps
> to run with an RC and if somebody does that and finds a bug I think
> this is worth rolling a new RC. I don't have a rush here and quality
> of the release is most important here. If this makes 1 more person
> running our RC against their app to have a chance to catch a bug that
> would prevent them to upgrade it's worth the effort.
>

I'm just voicing my dissatisfaction with this filter: we have a test that
would have found the bug, but it's disabled.

Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Chris Hostetter <ho...@fucit.org>.
: my take on this more community oriented. I really want to encourage
: folks to test our releases. Its a lot of work to upgrade existing apps
: to run with an RC and if somebody does that and finds a bug I think
: this is worth rolling a new RC. I don't have a rush here and quality
: of the release is most important here. If this makes 1 more person
: running our RC against their app to have a chance to catch a bug that
: would prevent them to upgrade it's worth the effort.

+1, thank you simon.


-Hoss



Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Simon Willnauer <si...@gmail.com>.
building an RC3 off SVN rev: 1470541 now

simon

On Mon, Apr 22, 2013 at 3:59 PM, Steve Rowe <sa...@gmail.com> wrote:
> On Apr 22, 2013, at 9:52 AM, Simon Willnauer <si...@gmail.com> wrote:
>> On Mon, Apr 22, 2013 at 3:42 PM, Steve Rowe <sa...@gmail.com> wrote:
>>> I just committed the edge-ngrams fix on the 4.3 release branch.
>>>
>>> I will not -1 RC2 for this, but if we're respinning anyway for SOLR-4746, including the edge-ngrams fix in the respin shouldn't be a problem.
>>>
>>> Steve
>>>
>>> On Apr 22, 2013, at 9:27 AM, Robert Muir <rc...@gmail.com> wrote:
>>>
>>>> If I was the RM, i would not respin for this edge-ngrams filter.
>>
>> my take on this more community oriented. I really want to encourage
>> folks to test our releases. Its a lot of work to upgrade existing apps
>> to run with an RC and if somebody does that and finds a bug I think
>> this is worth rolling a new RC. I don't have a rush here and quality
>> of the release is most important here. If this makes 1 more person
>> running our RC against their app to have a chance to catch a bug that
>> would prevent them to upgrade it's worth the effort.
>
> +1, I completely agree and would do the same.
>
>


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Steve Rowe <sa...@gmail.com>.
On Apr 22, 2013, at 9:52 AM, Simon Willnauer <si...@gmail.com> wrote:
> On Mon, Apr 22, 2013 at 3:42 PM, Steve Rowe <sa...@gmail.com> wrote:
>> I just committed the edge-ngrams fix on the 4.3 release branch.
>> 
>> I will not -1 RC2 for this, but if we're respinning anyway for SOLR-4746, including the edge-ngrams fix in the respin shouldn't be a problem.
>> 
>> Steve
>> 
>> On Apr 22, 2013, at 9:27 AM, Robert Muir <rc...@gmail.com> wrote:
>> 
>>> If I was the RM, i would not respin for this edge-ngrams filter.
> 
> my take on this more community oriented. I really want to encourage
> folks to test our releases. Its a lot of work to upgrade existing apps
> to run with an RC and if somebody does that and finds a bug I think
> this is worth rolling a new RC. I don't have a rush here and quality
> of the release is most important here. If this makes 1 more person
> running our RC against their app to have a chance to catch a bug that
> would prevent them to upgrade it's worth the effort.

+1, I completely agree and would do the same.




Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Simon Willnauer <si...@gmail.com>.
On Mon, Apr 22, 2013 at 3:42 PM, Steve Rowe <sa...@gmail.com> wrote:
> I just committed the edge-ngrams fix on the 4.3 release branch.
>
> I will not -1 RC2 for this, but if we're respinning anyway for SOLR-4746, including the edge-ngrams fix in the respin shouldn't be a problem.
>
> Steve
>
> On Apr 22, 2013, at 9:27 AM, Robert Muir <rc...@gmail.com> wrote:
>
>> If I was the RM, i would not respin for this edge-ngrams filter.

My take on this is more community oriented. I really want to encourage
folks to test our releases. It's a lot of work to upgrade existing apps
to run with an RC, and if somebody does that and finds a bug, I think
that is worth rolling a new RC. I'm not in a rush here, and the quality
of the release matters most. If this gets one more person to run our RC
against their app and catch a bug that would prevent them from
upgrading, it's worth the effort.

I will catch up with Yonik and see how long he needs for SOLR-4746.

simon


>>
>> We already have tests to find such bugs, but these tests are currently disabled (!) because the filter is basically rotting.
>>
>> So i can't see how something can be important enough to respin a release candidate for, but not important in the sense no one cares if its unit tests are really working.
>>
>> On Mon, Apr 22, 2013 at 9:17 AM, Simon Willnauer <si...@gmail.com> wrote:
>> I think we can add this to 4.3 I can roll another RC for that.
>>
>> simon
>>
>> On Mon, Apr 22, 2013 at 3:11 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
>> > Is this a fix to 4.3 (RC3?) or for a 4.3.1?
>> >
>> > -- Jack Krupansky
>> >
>> > -----Original Message----- From: Steve Rowe
>> > Sent: Monday, April 22, 2013 2:07 AM
>> >
>> > To: dev@lucene.apache.org
>> > Subject: Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"
>> >
>> > I've reopened LUCENE-4810 and attached a patch with a test and fix for this
>> > problem. - Steve
>> >
>> > On Apr 22, 2013, at 1:09 AM, Steve Rowe <sa...@gmail.com> wrote:
>> >
>> >> Actually, Walter, I misspoke: Morfologik is a lemmatizer: it produces
>> >> surface forms.  Not really so incompatible, I think.
>> >>
>> >> Regardless of the choice to use this particular sequence of filters,
>> >> EdgeNGramTokenFilter shouldn't produce a bad stream.
>> >>
>> >> Steve
>> >>
>> >> On Apr 21, 2013, at 8:34 PM, Walter Underwood <wu...@wunderwood.org>
>> >> wrote:
>> >>
>> >>> Don't use a stemmer with edge ngrams.
>> >>>
>> >>> Edge ngrams are a tool for matching the surface word. Stemmers are a tool
>> >>> for matching the root. Those are logically incompatible transforms.
>> >>>
>> >>> wunder
>> >>>
>> >>> On Apr 21, 2013, at 5:21 PM, Steve Rowe wrote:
>> >>>
>> >>>> Karol has uncovered a bug introduced by LUCENE-4810
>> >>>> <https://issues.apache.org/jira/browse/LUCENE-4810>, included in Lucene/Solr
>> >>>> 4.3.0.
>> >>>>
>> >>>> The problem is an interaction between the Morfologik stemmer, which can
>> >>>> produce multiple stems per input term, all but the first having a position
>> >>>> increment of zero, and EdgeNGramTokenFilter, which only outputs ngrams for
>> >>>> input terms that are at least as long as the minimum configured length, and
>> >>>> passes through unchanged the position increment for the first ngram output
>> >>>> for any given input term.
>> >>>>
>> >>>> So what happens in Karol's case is that "T." has the period stripped by
>> >>>> StandardTokenizer, then is stemmed by Morfologik to produce terms "to",
>> >>>> "tom" and "tona".  The first term "to" has a position increment of 1, but is
>> >>>> not output by EdgeNGramTokenFilter, because it's length is below the
>> >>>> configured minimum of 3.  The second term "tom" is given a position
>> >>>> increment of 0 by Morfologik, and meets EdgeNGramTokenFilter's minimum
>> >>>> length, so gets output, and since it's the first output term for the input
>> >>>> term "tom", the input position increment is left as-is in the output term:
>> >>>> 0.  That's how the first output term gets a position increment of 0.
>> >>>>
>> >>>> Before LUCENE-4810 was committed and included in Lucene/Solr 4.3.0,
>> >>>> EdgeNGramTokenFilter indiscriminately set all output terms' position
>> >>>> increments to 1, so that explains why this behavior didn't occur with
>> >>>> previously released versions.
>> >>>>
>> >>>> I think the fix is a check in EdgeNGramTokenFilter when outputting the
>> >>>> first term, that the position increment is greater than 0, and if it's not,
>> >>>> then it should be set it to 1.
>> >>>>
>> >>>> Does anybody know if this could also be an issue for other filters?
>> >>>>
>> >>>> I'll work on a patch for EdgeNGramTokenFilter.
>> >>>>
>> >>>> Steve
>> >>>>
>> >>>> On Apr 21, 2013, at 9:21 AM, Karol Sikora <ka...@laboratorium.ee>
>> >>>> wrote:
>> >>>>
>> >>>>> hi,
>> >>>>>
>> >>>>> I extracted minimal failing example, solr configs(schema,
>> >>>>> solrconfig.xml) and data are in attached archive.
>> >>>>> I try to import simple document:
>> >>>>> [
>> >>>>>   {
>> >>>>>       "publisher": [
>> >>>>>           "T. Gl\u00fccksberg"
>> >>>>>       ],
>> >>>>>       "uid": "1000881"
>> >>>>>   },
>> >>>>>   {
>> >>>>>       "publisher": [
>> >>>>>     "Ala a kota"
>> >>>>>       ],
>> >>>>>       "uid": "1000894"
>> >>>>>   }
>> >>>>> ]
>> >>>>> first fails on copyfield destination publisher_hl with exception
>> >>>>> (trace: https://gist.github.com/anonymous/5429558), second is added without
>> >>>>> any problems.
>> >>>>> schema.xml is here: https://gist.github.com/anonymous/5429562
>> >>>>>
>> >>>>> When someone will trying to reproduce this behaviour remember to copy
>> >>>>> libs related with morfologik and icu filters.
>> >>>>>
>> >>>>> This extracted example works fine with solr 4.0 - 4.2.1.
>> >>>>>
>> >>>>> Regards,
>> >>>>> Karol
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> W dniu 21.04.2013 09:03, Simon Willnauer pisze:
>> >>>>>>
>> >>>>>> hey karol,
>> >>>>>>
>> >>>>>> can you reproduce this behaviour in a small test-case (curl command or
>> >>>>>> something like this) that we can reproduce?
>> >>>>>>
>> >>>>>> @solr guys any idea what this could be?
>> >>>>>>
>> >>>>>> simon
>> >>>>>>
>> >>>>>> On Sun, Apr 21, 2013 at 1:52 AM, Karol Sikora
>> >>>>>>
>> >>>>>> <ka...@laboratorium.ee>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Hi all,
>> >>>>>>>
>> >>>>>>> I have problem with solr 4.3 RC2 on my testing data for searching
>> >>>>>>> application which i'm developing.
>> >>>>>>> A lot of importing records fails with exception
>> >>>>>>> "java.lang.IllegalArgumentException: first position increment must be > 0
>> >>>>>>> (got 0)". On versions from early 4.0 to 4.2.1 all documents was added
>> >>>>>>> successfully, so I'm thinking that something is broken in new
>> >>>>>>> release.
>> >>>>>>> I'll try examine tomorrow what is broken.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Regards,
>> >>>>>>> Karol
>> >>>>>>>
>> >>>>>>> W dniu 20.04.2013 21:07, Andi Vajda pisze:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> On Sat, 20 Apr 2013, Simon Willnauer wrote:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> Here is the RC:
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> happy voting...
>> >>>>>>>>>
>> >>>>>>>>> here is my +1
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>> PyLucene 4.3 builds and passes its tests.
>> >>>>>>>>
>> >>>>>>>> +1 !
>> >>>>>>>>
>> >>>>>>>> Andi..
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>> --
>> >>>>>>> Karol Sikora
>> >>>>>>> +48 781 493 788
>> >>>>>>>
>> >>>>>>> Laboratorium EE
>> >>>>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>> >>>>>>>
>> >>>>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>> --
>> >>>>>
>> >>>>> Karol Sikora
>> >>>>> Kierownik Informatyczny Projektu CBN - Interfejs 2.0
>> >>>>> +48 781 493 788
>> >>>>>
>> >>>>> Laboratorium EE
>> >>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>> >>>>>
>> >>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>> --
>> >>> Walter Underwood
>> >>> wunder@wunderwood.org
>> >>>
>> >>>
>> >>>


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Steve Rowe <sa...@gmail.com>.
I just committed the edge-ngrams fix on the 4.3 release branch.

I will not -1 RC2 for this, but if we're respinning anyway for SOLR-4746, including the edge-ngrams fix in the respin shouldn't be a problem.

Steve
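
For anyone following along without the Lucene sources at hand, the
interaction explained in the quoted analysis further down can be modeled
with a minimal, self-contained sketch. This is not the committed patch and
it does not use Lucene's TokenFilter API: the Token class, the edgeNGrams
method, the guardFirst flag and the maxGram value of 3 are all hypothetical
names and values chosen for illustration. Only the "to"/"tom"/"tona" stems,
the minimum length of 3 and the exception message come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

final class EdgeNGramPosIncSketch {

    /** A token reduced to the two attributes that matter here (hypothetical type). */
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    /**
     * Simplified model of an edge n-gram filter. Terms shorter than minGram are
     * dropped. For longer terms, the first n-gram inherits the input position
     * increment and the remaining n-grams get 0. With guardFirst set, a zero
     * increment on the very first emitted token is raised to 1.
     */
    static List<Token> edgeNGrams(List<Token> input, int minGram, int maxGram, boolean guardFirst) {
        List<Token> out = new ArrayList<>();
        for (Token in : input) {
            if (in.term.length() < minGram) {
                continue; // too short: dropped, and its position increment is lost with it
            }
            int upper = Math.min(maxGram, in.term.length());
            for (int n = minGram; n <= upper; n++) {
                int posInc = (n == minGram) ? in.posInc : 0;
                if (guardFirst && out.isEmpty() && posInc == 0) {
                    posInc = 1; // the proposed guard for the first output token
                }
                out.add(new Token(in.term.substring(0, n), posInc));
            }
        }
        return out;
    }

    /** Models the indexing-time check whose message is quoted in the thread. */
    static void checkFirstIncrement(List<Token> tokens) {
        if (!tokens.isEmpty() && tokens.get(0).posInc <= 0) {
            throw new IllegalArgumentException(
                "first position increment must be > 0 (got " + tokens.get(0).posInc + ")");
        }
    }

    public static void main(String[] args) {
        // Stems for "T" as described in the thread: only the first has increment 1.
        List<Token> stems = List.of(new Token("to", 1), new Token("tom", 0), new Token("tona", 0));

        System.out.println("unguarded: " + edgeNGrams(stems, 3, 3, false));
        System.out.println("guarded:   " + edgeNGrams(stems, 3, 3, true));

        // Passes only because the guard raised the first increment.
        checkFirstIncrement(edgeNGrams(stems, 3, 3, true));
    }
}
```

In this model the unguarded stream starts with a token whose increment is
0, which is what trips the check, while the guarded stream starts with an
increment of 1 and leaves later increments untouched.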

On Apr 22, 2013, at 9:27 AM, Robert Muir <rc...@gmail.com> wrote:

> If I was the RM, i would not respin for this edge-ngrams filter.
> 
> We already have tests to find such bugs, but these tests are currently disabled (!) because the filter is basically rotting.
> 
> So i can't see how something can be important enough to respin a release candidate for, but not important in the sense no one cares if its unit tests are really working.
> 
> On Mon, Apr 22, 2013 at 9:17 AM, Simon Willnauer <si...@gmail.com> wrote:
> I think we can add this to 4.3 I can roll another RC for that.
> 
> simon
> 
> On Mon, Apr 22, 2013 at 3:11 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
> > Is this a fix to 4.3 (RC3?) or for a 4.3.1?
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: Steve Rowe
> > Sent: Monday, April 22, 2013 2:07 AM
> >
> > To: dev@lucene.apache.org
> > Subject: Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"
> >
> > I've reopened LUCENE-4810 and attached a patch with a test and fix for this
> > problem. - Steve
> >
> > On Apr 22, 2013, at 1:09 AM, Steve Rowe <sa...@gmail.com> wrote:
> >
> >> Actually, Walter, I misspoke: Morfologik is a lemmatizer: it produces
> >> surface forms.  Not really so incompatible, I think.
> >>
> >> Regardless of the choice to use this particular sequence of filters,
> >> EdgeNGramTokenFilter shouldn't produce a bad stream.
> >>
> >> Steve
> >>
> >> On Apr 21, 2013, at 8:34 PM, Walter Underwood <wu...@wunderwood.org>
> >> wrote:
> >>
> >>> Don't use a stemmer with edge ngrams.
> >>>
> >>> Edge ngrams are a tool for matching the surface word. Stemmers are a tool
> >>> for matching the root. Those are logically incompatible transforms.
> >>>
> >>> wunder
> >>>
> >>> On Apr 21, 2013, at 5:21 PM, Steve Rowe wrote:
> >>>
> >>>> Karol has uncovered a bug introduced by LUCENE-4810
> >>>> <https://issues.apache.org/jira/browse/LUCENE-4810>, included in Lucene/Solr
> >>>> 4.3.0.
> >>>>
> >>>> The problem is an interaction between the Morfologik stemmer, which can
> >>>> produce multiple stems per input term, all but the first having a position
> >>>> increment of zero, and EdgeNGramTokenFilter, which only outputs ngrams for
> >>>> input terms that are at least as long as the minimum configured length, and
> >>>> passes through unchanged the position increment for the first ngram output
> >>>> for any given input term.
> >>>>
> >>>> So what happens in Karol's case is that "T." has the period stripped by
> >>>> StandardTokenizer, then is stemmed by Morfologik to produce terms "to",
> >>>> "tom" and "tona".  The first term "to" has a position increment of 1, but is
> >>>> not output by EdgeNGramTokenFilter, because it's length is below the
> >>>> configured minimum of 3.  The second term "tom" is given a position
> >>>> increment of 0 by Morfologik, and meets EdgeNGramTokenFilter's minimum
> >>>> length, so gets output, and since it's the first output term for the input
> >>>> term "tom", the input position increment is left as-is in the output term:
> >>>> 0.  That's how the first output term gets a position increment of 0.
> >>>>
> >>>> Before LUCENE-4810 was committed and included in Lucene/Solr 4.3.0,
> >>>> EdgeNGramTokenFilter indiscriminately set all output terms' position
> >>>> increments to 1, so that explains why this behavior didn't occur with
> >>>> previously released versions.
> >>>>
> >>>> I think the fix is a check in EdgeNGramTokenFilter when outputting the
> >>>> first term, that the position increment is greater than 0, and if it's not,
> >>>> then it should be set it to 1.
> >>>>
> >>>> Does anybody know if this could also be an issue for other filters?
> >>>>
> >>>> I'll work on a patch for EdgeNGramTokenFilter.
> >>>>
> >>>> Steve
> >>>>
> >>>> On Apr 21, 2013, at 9:21 AM, Karol Sikora <ka...@laboratorium.ee>
> >>>> wrote:
> >>>>
> >>>>> hi,
> >>>>>
> >>>>> I extracted minimal failing example, solr configs(schema,
> >>>>> solrconfig.xml) and data are in attached archive.
> >>>>> I try to import simple document:
> >>>>> [
> >>>>>   {
> >>>>>       "publisher": [
> >>>>>           "T. Gl\u00fccksberg"
> >>>>>       ],
> >>>>>       "uid": "1000881"
> >>>>>   },
> >>>>>   {
> >>>>>       "publisher": [
> >>>>>     "Ala a kota"
> >>>>>       ],
> >>>>>       "uid": "1000894"
> >>>>>   }
> >>>>> ]
> >>>>> first fails on copyfield destination publisher_hl with exception
> >>>>> (trace: https://gist.github.com/anonymous/5429558), second is added without
> >>>>> any problems.
> >>>>> schema.xml is here: https://gist.github.com/anonymous/5429562
> >>>>>
> >>>>> When someone will trying to reproduce this behaviour remember to copy
> >>>>> libs related with morfologik and icu filters.
> >>>>>
> >>>>> This extracted example works fine with solr 4.0 - 4.2.1.
> >>>>>
> >>>>> Regards,
> >>>>> Karol
> >>>>>
> >>>>>
> >>>>>
> >>>>> W dniu 21.04.2013 09:03, Simon Willnauer pisze:
> >>>>>>
> >>>>>> hey karol,
> >>>>>>
> >>>>>> can you reproduce this behaviour in a small test-case (curl command or
> >>>>>> something like this) that we can reproduce?
> >>>>>>
> >>>>>> @solr guys any idea what this could be?
> >>>>>>
> >>>>>> simon
> >>>>>>
> >>>>>> On Sun, Apr 21, 2013 at 1:52 AM, Karol Sikora
> >>>>>>
> >>>>>> <ka...@laboratorium.ee>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I have problem with solr 4.3 RC2 on my testing data for searching
> >>>>>>> application which i'm developing.
> >>>>>>> A lot of importing records fails with exception
> >>>>>>> "java.lang.IllegalArgumentException: first position increment must be > 0
> >>>>>>> (got 0)". On versions from early 4.0 to 4.2.1 all documents was added
> >>>>>>> successfully, so I'm thinking that something is broken in new
> >>>>>>> release.
> >>>>>>> I'll try examine tomorrow what is broken.
> >>>>>>>
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Karol
> >>>>>>>
> >>>>>>> W dniu 20.04.2013 21:07, Andi Vajda pisze:
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Sat, 20 Apr 2013, Simon Willnauer wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Here is the RC:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> happy voting...
> >>>>>>>>>
> >>>>>>>>> here is my +1
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> PyLucene 4.3 builds and passes its tests.
> >>>>>>>>
> >>>>>>>> +1 !
> >>>>>>>>
> >>>>>>>> Andi..
> >>>>>>>>
> >>>>>>>>
> >>>>>>> --
> >>>>>>> Karol Sikora
> >>>>>>> +48 781 493 788
> >>>>>>>
> >>>>>>> Laboratorium EE
> >>>>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
> >>>>>>>
> >>>>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Karol Sikora
> >>>>> Kierownik Informatyczny Projektu CBN - Interfejs 2.0
> >>>>> +48 781 493 788
> >>>>>
> >>>>> Laboratorium EE
> >>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
> >>>>>
> >>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
> >>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> Walter Underwood
> >>> wunder@wunderwood.org
> >>>
> >>>
> >>>


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Robert Muir <rc...@gmail.com>.
If I were the RM, I would not respin for this edge-ngrams filter.

We already have tests to find such bugs, but these tests are currently
disabled (!) because the filter is basically rotting.

So I can't see how something can be important enough to respin a release
candidate for, yet not important enough for anyone to care whether its
unit tests are really working.

On Mon, Apr 22, 2013 at 9:17 AM, Simon Willnauer
<si...@gmail.com> wrote:

> I think we can add this to 4.3 I can roll another RC for that.
>
> simon
>
> On Mon, Apr 22, 2013 at 3:11 PM, Jack Krupansky <ja...@basetechnology.com>
> wrote:
> > Is this a fix to 4.3 (RC3?) or for a 4.3.1?
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: Steve Rowe
> > Sent: Monday, April 22, 2013 2:07 AM
> >
> > To: dev@lucene.apache.org
> > Subject: Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"
> >
> > I've reopened LUCENE-4810 and attached a patch with a test and fix for
> this
> > problem. - Steve
> >

Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Simon Willnauer <si...@gmail.com>.
I think we can add this to 4.3; I can roll another RC for that.

simon

On Mon, Apr 22, 2013 at 3:11 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
> Is this a fix to 4.3 (RC3?) or for a 4.3.1?
>
> -- Jack Krupansky
>
> -----Original Message----- From: Steve Rowe
> Sent: Monday, April 22, 2013 2:07 AM
>
> To: dev@lucene.apache.org
> Subject: Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"
>
> I've reopened LUCENE-4810 and attached a patch with a test and fix for this
> problem. - Steve
>


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Jack Krupansky <ja...@basetechnology.com>.
Is this a fix to 4.3 (RC3?) or for a 4.3.1?

-- Jack Krupansky

-----Original Message----- 
From: Steve Rowe
Sent: Monday, April 22, 2013 2:07 AM
To: dev@lucene.apache.org
Subject: Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

I've reopened LUCENE-4810 and attached a patch with a test and fix for this 
problem. - Steve



Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Steve Rowe <sa...@gmail.com>.
I've reopened LUCENE-4810 and attached a patch with a test and fix for this problem. - Steve
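
For anyone following along without the patch open, here is a rough, Lucene-free sketch of the failure and of the kind of guard discussed in this thread. It is not the attached patch: the class EdgeNGramPositionSketch, its Token type, the maxGram value of 4, and the choice to give later grams of a term a position increment of 0 are all assumptions made for illustration. Only the terms "to", "tom" and "tona", the minimum length of 3, and the exception message come from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EdgeNGramPositionSketch {

    /** Hypothetical token model: just the term text and its position increment. */
    public static final class Token {
        public final String term;
        public final int posInc;

        public Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    /**
     * Toy edge n-gram filter. Terms shorter than minGram produce no output.
     * The first gram of each input term inherits that term's position
     * increment; later grams get 0 (an assumption of this sketch).
     * With guardFirst=true, the very first token emitted for the stream is
     * forced to a position increment of 1 if it would otherwise be 0.
     */
    public static List<Token> edgeNGrams(List<Token> in, int minGram, int maxGram,
                                         boolean guardFirst) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            if (t.term.length() < minGram) {
                continue; // too short: dropped, and its increment is lost
            }
            int longest = Math.min(maxGram, t.term.length());
            for (int len = minGram; len <= longest; len++) {
                int inc = (len == minGram) ? t.posInc : 0;
                if (guardFirst && out.isEmpty() && inc == 0) {
                    inc = 1; // the guard: first output token must advance the position
                }
                out.add(new Token(t.term.substring(0, len), inc));
            }
        }
        return out;
    }

    /** Stand-in for the indexing-time check whose message Karol reported. */
    public static void checkFirstIncrement(List<Token> stream) {
        if (!stream.isEmpty() && stream.get(0).posInc <= 0) {
            throw new IllegalArgumentException(
                "first position increment must be > 0 (got " + stream.get(0).posInc + ")");
        }
    }

    public static void main(String[] args) {
        // Stems reported for "T": only the first one advances the position.
        List<Token> stems = Arrays.asList(
            new Token("to", 1), new Token("tom", 0), new Token("tona", 0));

        List<Token> unguarded = edgeNGrams(stems, 3, 4, false);
        System.out.println("unguarded: " + unguarded);
        try {
            checkFirstIncrement(unguarded);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected:  " + e.getMessage());
        }

        List<Token> guarded = edgeNGrams(stems, 3, 4, true);
        checkFirstIncrement(guarded); // does not throw
        System.out.println("guarded:   " + guarded);
    }
}
```

Running main prints the unguarded stream, the message the check raises for it, and the guarded stream, whose first token carries an increment of 1.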



Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Steve Rowe <sa...@gmail.com>.
Actually, Walter, I misspoke: Morfologik is a lemmatizer: it produces surface forms.  Not really so incompatible, I think.

Regardless of the choice to use this particular sequence of filters, EdgeNGramTokenFilter shouldn't produce a bad stream.

Steve

On Apr 21, 2013, at 8:34 PM, Walter Underwood <wu...@wunderwood.org> wrote:

> Don't use a stemmer with edge ngrams.
> 
> Edge ngrams are a tool for matching the surface word. Stemmers are a tool for matching the root. Those are logically incompatible transforms. 
> 
> wunder
> 
> On Apr 21, 2013, at 5:21 PM, Steve Rowe wrote:
> 
>> Karol has uncovered a bug introduced by LUCENE-4810 <https://issues.apache.org/jira/browse/LUCENE-4810>, included in Lucene/Solr 4.3.0.
>> 
>> The problem is an interaction between the Morfologik stemmer, which can produce multiple stems per input term, all but the first having a position increment of zero, and EdgeNGramTokenFilter, which only outputs ngrams for input terms that are at least as long as the minimum configured length, and passes through unchanged the position increment for the first ngram output for any given input term.
>> 
>> So what happens in Karol's case is that "T." has the period stripped by StandardTokenizer, then is stemmed by Morfologik to produce terms "to", "tom" and "tona".  The first term "to" has a position increment of 1, but is not output by EdgeNGramTokenFilter, because its length is below the configured minimum of 3.  The second term "tom" is given a position increment of 0 by Morfologik, and meets EdgeNGramTokenFilter's minimum length, so gets output, and since it's the first output term for the input term "tom", the input position increment is left as-is in the output term: 0.  That's how the first output term gets a position increment of 0.
>> 
>> Before LUCENE-4810 was committed and included in Lucene/Solr 4.3.0, EdgeNGramTokenFilter indiscriminately set all output terms' position increments to 1, so that explains why this behavior didn't occur with previously released versions.
>> 
>> I think the fix is a check in EdgeNGramTokenFilter when outputting the first term, that the position increment is greater than 0, and if it's not, then it should be set to 1.
>> 
>> Does anybody know if this could also be an issue for other filters?
>> 
>> I'll work on a patch for EdgeNGramTokenFilter.
>> 
>> Steve
>> 
>> On Apr 21, 2013, at 9:21 AM, Karol Sikora <ka...@laboratorium.ee> wrote:
>> 
>>> hi,
>>> 
>>> I extracted a minimal failing example; the Solr configs (schema, solrconfig.xml) and data are in the attached archive.
>>> I am trying to import these simple documents:
>>> [
>>>    {
>>>        "publisher": [
>>>            "T. Gl\u00fccksberg"
>>>        ],
>>>        "uid": "1000881"
>>>    },
>>>    {
>>>        "publisher": [
>>>            "Ala a kota"
>>>        ],
>>>        "uid": "1000894"
>>>    }
>>> ]
>>> The first document fails on the copyField destination publisher_hl with an exception (trace: https://gist.github.com/anonymous/5429558); the second is added without any problems.
>>> schema.xml is here: https://gist.github.com/anonymous/5429562
>>> 
>>> Anyone trying to reproduce this behaviour should remember to copy the libs for the Morfologik and ICU filters.
>>> 
>>> This extracted example works fine with Solr 4.0 - 4.2.1.
>>> 
>>> Regards,
>>> Karol
>>> 
>>> 
>>> 
>>> On 21.04.2013 09:03, Simon Willnauer wrote:
>>>> hey karol,
>>>> 
>>>> can you isolate this behaviour in a small test case (a curl command or
>>>> something like this) that we can reproduce?
>>>> 
>>>> @solr guys any idea what this could be?
>>>> 
>>>> simon
>>>> 
>>>> On Sun, Apr 21, 2013 at 1:52 AM, Karol Sikora <ka...@laboratorium.ee> wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I have a problem with Solr 4.3 RC2 on the test data for a search
>>>>> application I'm developing.
>>>>> Many records fail to import with the exception
>>>>> "java.lang.IllegalArgumentException: first position increment must be > 0
>>>>> (got 0)". On versions from early 4.0 to 4.2.1 all documents were added
>>>>> successfully, so I think something is broken in the new release.
>>>>> I'll try to examine what is broken tomorrow.
>>>>> 
>>>>> 
>>>>> Regards,
>>>>> Karol
>>>>> 
>>>>> On 20.04.2013 21:07, Andi Vajda wrote:
>>>>> 
>>>>> 
>>>>>> On Sat, 20 Apr 2013, Simon Willnauer wrote:
>>>>>> 
>>>>>> 
>>>>>>> Here is the RC:
>>>>>>> 
>>>>>>> 
>>>>>>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>>>>>>> 
>>>>>>> 
>>>>>>> happy voting...
>>>>>>> 
>>>>>>> here is my +1
>>>>>>> 
>>>>>> 
>>>>>> PyLucene 4.3 builds and passes its tests.
>>>>>> 
>>>>>> +1 !
>>>>>> 
>>>>>> Andi..
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: 
>>>>>> dev-unsubscribe@lucene.apache.org
>>>>>> 
>>>>>> For additional commands, e-mail: 
>>>>>> dev-help@lucene.apache.org
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> --
>>>>> Karol Sikora
>>>>> +48 781 493 788
>>>>> 
>>>>> Laboratorium EE
>>>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>>>>> 
>>>>> www.laboratorium.ee | www.laboratorium.ee/facebook
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: 
>>>>> dev-unsubscribe@lucene.apache.org
>>>>> 
>>>>> For additional commands, e-mail: 
>>>>> dev-help@lucene.apache.org
>>>>> 
>>>>> 
>>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: 
>>>> dev-unsubscribe@lucene.apache.org
>>>> 
>>>> For additional commands, e-mail: 
>>>> dev-help@lucene.apache.org
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> -- 
>>> 
>>> Karol Sikora
>>> Kierownik Informatyczny Projektu CBN - Interfejs 2.0
>>> +48 781 493 788
>>> 
>>> Laboratorium EE
>>> ul. Mokotowska 46A/23 | 00-543 Warszawa |
>>> 
>>> www.laboratorium.ee | www.laboratorium.ee/facebook
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>> 
> 
> --
> Walter Underwood
> wunder@wunderwood.org
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Walter Underwood <wu...@wunderwood.org>.
Don't use a stemmer with edge ngrams.

Edge ngrams are a tool for matching the surface word. Stemmers are a tool for matching the root. Those are logically incompatible transforms. 
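
To make the "surface word" point concrete, here is a minimal Python sketch (not Lucene code) of what an edge n-gram transform produces. The function name `edge_ngrams` and the 3..5 gram range are assumptions chosen only for illustration.

```python
def edge_ngrams(term, min_gram, max_gram):
    """Return the leading-edge prefixes of term with lengths min_gram..max_gram."""
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]

# Prefixes of the surface form, usable for matching partially typed words.
print(edge_ngrams("publisher", 3, 5))
# A term shorter than min_gram yields no grams at all.
print(edge_ngrams("to", 3, 5))
```

The grams are prefixes of whatever term reaches the filter, so if a stemmer runs first, the prefixes are taken from the stem rather than from the word as typed.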

wunder

On Apr 21, 2013, at 5:21 PM, Steve Rowe wrote:

> Karol has uncovered a bug introduced by LUCENE-4810 <https://issues.apache.org/jira/browse/LUCENE-4810>, included in Lucene/Solr 4.3.0.
> 
> The problem is an interaction between the Morfologik stemmer, which can produce multiple stems per input term, all but the first having a position increment of zero, and EdgeNGramTokenFilter, which only outputs ngrams for input terms that are at least as long as the minimum configured length, and passes through unchanged the position increment for the first ngram output for any given input term.
> 
> So what happens in Karol's case is that "T." has the period stripped by StandardTokenizer, then is stemmed by Morfologik to produce terms "to", "tom" and "tona".  The first term "to" has a position increment of 1, but is not output by EdgeNGramTokenFilter, because its length is below the configured minimum of 3.  The second term "tom" is given a position increment of 0 by Morfologik, and meets EdgeNGramTokenFilter's minimum length, so gets output, and since it's the first output term for the input term "tom", the input position increment is left as-is in the output term: 0.  That's how the first output term gets a position increment of 0.
> 
> Before LUCENE-4810 was committed and included in Lucene/Solr 4.3.0, EdgeNGramTokenFilter indiscriminately set all output terms' position increments to 1, so that explains why this behavior didn't occur with previously released versions.
> 
> I think the fix is a check in EdgeNGramTokenFilter, when outputting the first term, that the position increment is greater than 0; if it's not, it should be set to 1.
> 
> Does anybody know if this could also be an issue for other filters?
> 
> I'll work on a patch for EdgeNGramTokenFilter.
> 
> Steve
> 

--
Walter Underwood
wunder@wunderwood.org




Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Karol Sikora <ka...@laboratorium.ee>.
Steve, thanks for investigating and fixing this problem. Your patch 
attached to the issue fixes my problem.
So here is my little (and probably meaningless ;) ) vote: +1 :)

Walter, as Steve says, Morfologik is a lemmatizer. It isn't really 
incompatible with edge ngrams, and it solves one of the client's 
requirements: highlighting not only fully matched phrases but also 
the matched parts of them.


On 22.04.2013 at 02:21, Steve Rowe wrote:
> Karol has uncovered a bug introduced by LUCENE-4810 <https://issues.apache.org/jira/browse/LUCENE-4810>, included in Lucene/Solr 4.3.0.
>
> The problem is an interaction between the Morfologik stemmer, which can produce multiple stems per input term, all but the first having a position increment of zero, and EdgeNGramTokenFilter, which only outputs ngrams for input terms that are at least as long as the minimum configured length, and passes through unchanged the position increment for the first ngram output for any given input term.
>
> So what happens in Karol's case is that "T." has the period stripped by StandardTokenizer, then is stemmed by Morfologik to produce terms "to", "tom" and "tona".  The first term "to" has a position increment of 1, but is not output by EdgeNGramTokenFilter, because its length is below the configured minimum of 3.  The second term "tom" is given a position increment of 0 by Morfologik, and meets EdgeNGramTokenFilter's minimum length, so gets output, and since it's the first output term for the input term "tom", the input position increment is left as-is in the output term: 0.  That's how the first output term gets a position increment of 0.
>
> Before LUCENE-4810 was committed and included in Lucene/Solr 4.3.0, EdgeNGramTokenFilter indiscriminately set all output terms' position increments to 1, so that explains why this behavior didn't occur with previously released versions.
>
> I think the fix is a check in EdgeNGramTokenFilter, when outputting the first term, that the position increment is greater than 0; if it's not, it should be set to 1.
>
> Does anybody know if this could also be an issue for other filters?
>
> I'll work on a patch for EdgeNGramTokenFilter.
>
> Steve
>

-- 
  
Karol Sikora
Project IT Manager, CBN - Interfejs 2.0
+48 781 493 788

Laboratorium EE
ul. Mokotowska 46A/23 | 00-543 Warszawa |
www.laboratorium.ee | www.laboratorium.ee/facebook




Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Steve Rowe <sa...@gmail.com>.
Karol has uncovered a bug introduced by LUCENE-4810 <https://issues.apache.org/jira/browse/LUCENE-4810>, included in Lucene/Solr 4.3.0.

The problem is an interaction between the Morfologik stemmer, which can produce multiple stems per input term, all but the first having a position increment of zero, and EdgeNGramTokenFilter, which only outputs ngrams for input terms that are at least as long as the minimum configured length, and passes through unchanged the position increment for the first ngram output for any given input term.

So what happens in Karol's case is that "T." has the period stripped by StandardTokenizer, then is stemmed by Morfologik to produce terms "to", "tom" and "tona".  The first term "to" has a position increment of 1, but is not output by EdgeNGramTokenFilter, because its length is below the configured minimum of 3.  The second term "tom" is given a position increment of 0 by Morfologik, and meets EdgeNGramTokenFilter's minimum length, so gets output, and since it's the first output term for the input term "tom", the input position increment is left as-is in the output term: 0.  That's how the first output term gets a position increment of 0.
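
The sequence above can be modelled without Lucene. The following Python sketch is only a simulation under stated assumptions: `Token`, `edge_ngram_filter` and `check_first_increment` are hypothetical names, the filter emits a single minimum-length gram per term for brevity, and the final check merely mimics the "first position increment must be > 0" error reported in this thread.

```python
from collections import namedtuple

# Hypothetical stand-in for a Lucene token: a term plus its position increment.
Token = namedtuple("Token", ["term", "pos_inc"])

def edge_ngram_filter(tokens, min_gram):
    """Model of the described 4.3.0 behaviour: drop short terms, pass pos_inc through."""
    for tok in tokens:
        if len(tok.term) >= min_gram:
            # The first (here: only) gram inherits the input position increment.
            yield Token(tok.term[:min_gram], tok.pos_inc)

def check_first_increment(tokens):
    """Mimic the rejection of a leading token whose position increment is 0."""
    tokens = list(tokens)
    if tokens and tokens[0].pos_inc <= 0:
        raise ValueError(
            "first position increment must be > 0 (got %d)" % tokens[0].pos_inc)
    return tokens

# Stems reported for "T": the first has increment 1, the others 0.
stems = [Token("to", 1), Token("tom", 0), Token("tona", 0)]

try:
    check_first_increment(edge_ngram_filter(stems, min_gram=3))
except ValueError as err:
    print(err)
```

Because "to" is dropped, the token carrying the increment of 1 never reaches the output, and the first surviving token arrives with an increment of 0.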

Before LUCENE-4810 was committed and included in Lucene/Solr 4.3.0, EdgeNGramTokenFilter indiscriminately set all output terms' position increments to 1, so that explains why this behavior didn't occur with previously released versions.

I think the fix is a check in EdgeNGramTokenFilter, when outputting the first term, that the position increment is greater than 0; if it's not, it should be set to 1.
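
A minimal sketch of such a check, written as a Lucene-free Python model rather than the actual patch. `Token` and `clamp_first_increment` are hypothetical names, and reading "the first term" as the first token of the whole output stream is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a Lucene token: a term plus its position increment.
Token = namedtuple("Token", ["term", "pos_inc"])

def clamp_first_increment(tokens):
    """Yield tokens, raising the first one's position increment to 1 if it is 0."""
    first = True
    for tok in tokens:
        if first and tok.pos_inc <= 0:
            tok = tok._replace(pos_inc=1)
        first = False
        yield tok

# Modelled filter output for the stems of "T" with a minimum gram size of 3:
# "to" was dropped, so "tom" arrives first with an increment of 0.
filtered = [Token("tom", 0), Token("ton", 0)]
fixed = list(clamp_first_increment(filtered))
print(fixed)
```

Only the leading token is touched; later tokens keep their increment of 0, so stacked stems still share a position.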

Does anybody know if this could also be an issue for other filters?

I'll work on a patch for EdgeNGramTokenFilter.

Steve

On Apr 21, 2013, at 9:21 AM, Karol Sikora <ka...@laboratorium.ee> wrote:

> hi,
> 
> I extracted a minimal failing example; the Solr configs (schema, solrconfig.xml) and the data are in the attached archive.
> I am trying to import these simple documents:
> [
>     {
>         "publisher": [
>             "T. Gl\u00fccksberg"
>         ],
>         "uid": "1000881"
>     },
>     {
>         "publisher": [
>             "Ala a kota"
>         ],
>         "uid": "1000894"
>     }
> ]
> The first document fails on the copyField destination publisher_hl with an exception (trace: https://gist.github.com/anonymous/5429558); the second is added without any problems.
> schema.xml is here: https://gist.github.com/anonymous/5429562
> 
> Anyone trying to reproduce this behaviour should remember to copy the libs for the morfologik and icu filters.
> 
> This extracted example works fine with Solr 4.0 - 4.2.1.
> 
> Regards,
> Karol
> 
> 
> 


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Karol Sikora <ka...@laboratorium.ee>.
hi,

I extracted a minimal failing example; the Solr configs (schema, 
solrconfig.xml) and the data are in the attached archive.
I am trying to import these simple documents:
[
    {
        "publisher": [
            "T. Gl\u00fccksberg"
        ],
        "uid": "1000881"
    },
    {
        "publisher": [
            "Ala a kota"
        ],
        "uid": "1000894"
    }
]
The first document fails on the copyField destination publisher_hl with 
an exception (trace: https://gist.github.com/anonymous/5429558); the 
second is added without any problems.
schema.xml is here: https://gist.github.com/anonymous/5429562

Anyone trying to reproduce this behaviour should remember to copy the 
libs for the morfologik and icu filters.

This extracted example works fine with Solr 4.0 - 4.2.1.
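
For anyone who prefers a script to the archive, the import could be driven roughly as follows. This is a sketch under assumptions: the URL (host, port, handler path and the commit parameter) describes a default local Solr install and is not taken from the attached configs, and `build_update_request` is a hypothetical helper. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed local Solr update URL; adjust host, port and core to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"

# The two documents from this message.
docs = [
    {"publisher": ["T. Gl\u00fccksberg"], "uid": "1000881"},
    {"publisher": ["Ala a kota"], "uid": "1000894"},
]

def build_update_request(url, documents):
    """Build (but do not send) a JSON POST request for the given documents."""
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_update_request(SOLR_UPDATE_URL, docs)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

With the schema from the gist in place, the first document is the one expected to trigger the exception on 4.3 RC2.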

Regards,
Karol



On 21.04.2013 at 09:03, Simon Willnauer wrote:
> hey karol,
>
> can you reproduce this behaviour in a small test-case (curl command or
> something like this) that we can reproduce?
>
> @solr guys any idea what this could be?
>
> simon
>

-- 
  
Karol Sikora
Project IT Manager, CBN - Interfejs 2.0
+48 781 493 788

Laboratorium EE
ul. Mokotowska 46A/23 | 00-543 Warszawa |
www.laboratorium.ee | www.laboratorium.ee/facebook


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Simon Willnauer <si...@gmail.com>.
hey karol,

can you reproduce this behaviour in a small test-case (curl command or
something like this) that we can reproduce?

@solr guys any idea what this could be?

simon

On Sun, Apr 21, 2013 at 1:52 AM, Karol Sikora
<ka...@laboratorium.ee> wrote:
> Hi all,
>
> I have a problem with Solr 4.3 RC2 on the test data for a search
> application that I'm developing.
> Many records fail to import with the exception
> "java.lang.IllegalArgumentException: first position increment must be > 0
> (got 0)". On versions from early 4.0 to 4.2.1 all documents were added
> successfully, so I think something is broken in the new release.
> I'll try to examine tomorrow what is broken.
>
>
> Regards,
> Karol
>


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Shai Erera <se...@gmail.com>.
Smoke tester finished successfully. +1 !

Shai


On Sun, Apr 21, 2013 at 2:52 AM, Karol Sikora
<ka...@laboratorium.ee> wrote:

> Hi all,
>
> I have a problem with Solr 4.3 RC2 on the test data for a search
> application that I'm developing.
> Many records fail to import with the exception "java.lang.IllegalArgumentException:
> first position increment must be > 0 (got 0)". On versions from early 4.0
> to 4.2.1 all documents were added successfully, so I think
> something is broken in the new release.
> I'll try to examine tomorrow what is broken.
>
>
> Regards,
> Karol
>

Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Karol Sikora <ka...@laboratorium.ee>.
Hi,

Here are my field type definitions: http://pastebin.com/ZdR2Hfgu
I'll try to extract a minimal reproducible case, but it 
will take some time.

Regards,
Karol


On 21.04.2013 at 09:22, Uwe Schindler wrote:
> Hi,
>
> Can you provide the analysis chain (the field type definitions from schema.xml) that is used while indexing? Such errors are generally caused by broken Tokenizers/TokenFilters.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>

-- 
  
Karol Sikora
+48 781 493 788

Laboratorium EE
ul. Mokotowska 46A/23 | 00-543 Warszawa |
www.laboratorium.ee | www.laboratorium.ee/facebook


RE: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

Can you provide the analysis chain (the field type definitions from schema.xml) that is used while indexing? Such errors are generally caused by broken Tokenizers/TokenFilters.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Karol Sikora [mailto:karol.sikora@laboratorium.ee]
> Sent: Sunday, April 21, 2013 1:53 AM
> To: dev@lucene.apache.org
> Subject: Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"
> 
> Hi all,
> 
> I have a problem with Solr 4.3 RC2 on the test data for a search application
> that I'm developing.
> Many records fail to import with the exception
> "java.lang.IllegalArgumentException: first position increment must be >
> 0 (got 0)". On versions from early 4.0 to 4.2.1 all documents were added
> successfully, so I think something is broken in the new release.
> I'll try to examine tomorrow what is broken.
> 
> 
> Regards,
> Karol
> 


Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Karol Sikora <ka...@laboratorium.ee>.
Hi all,

I have a problem with Solr 4.3 RC2 on the test data for a search 
application that I'm developing.
Many records fail to import with the exception 
"java.lang.IllegalArgumentException: first position increment must be > 
0 (got 0)". On versions from early 4.0 to 4.2.1 all documents were added 
successfully, so I think something is broken in the new release.
I'll try to examine tomorrow what is broken.


Regards,
Karol

On 20.04.2013 at 21:07, Andi Vajda wrote:
>
> On Sat, 20 Apr 2013, Simon Willnauer wrote:
>
>> Here is the RC:
>> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054 
>>
>>
>> happy voting...
>>
>> here is my +1
>
> PyLucene 4.3 builds and passes its tests.
>
> +1 !
>
> Andi..
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

-- 
  
Karol Sikora
+48 781 493 788

Laboratorium EE
ul. Mokotowska 46A/23 | 00-543 Warszawa |
www.laboratorium.ee | www.laboratorium.ee/facebook




Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Andi Vajda <va...@osafoundation.org>.
On Sat, 20 Apr 2013, Simon Willnauer wrote:

> Here is the RC:
> http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
>
> happy voting...
>
> here is my +1

PyLucene 4.3 builds and passes its tests.

+1 !

Andi..



Re: "[VOTE] Lucene/Solr 4.3 Take 2 (RC2)"

Posted by Mark Miller <ma...@gmail.com>.
+1

- Mark

On Apr 20, 2013, at 1:17 AM, Simon Willnauer <si...@gmail.com> wrote:

> 
> Here is the RC: http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC2-rev1470054
> 
> happy voting...
> 
> here is my +1
> 
> simon

