Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/05/12 14:03:12 UTC
[GitHub] [lucene] janhoy opened a new pull request #136: LUCENE-9589 Swedish Minimal Stemmer
janhoy opened a new pull request #136:
URL: https://github.com/apache/lucene/pull/136
https://issues.apache.org/jira/browse/LUCENE-9589 moved to the new repo; see also the old PR at https://github.com/apache/lucene-solr/pull/2062
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] janhoy merged pull request #136: LUCENE-9589 Swedish Minimal Stemmer
Posted by GitBox <gi...@apache.org>.
janhoy merged pull request #136:
URL: https://github.com/apache/lucene/pull/136
[GitHub] [lucene] karlwettin edited a comment on pull request #136: LUCENE-9589 Swedish Minimal Stemmer
Posted by GitBox <gi...@apache.org>.
karlwettin edited a comment on pull request #136:
URL: https://github.com/apache/lucene/pull/136#issuecomment-843614395
I gave the stemmer a spin on [SAOL](https://en.wikipedia.org/wiki/Svenska_Akademiens_ordlista) 13 (2006). I have to stay within the bounds of fair use and can't publish the complete results.
Generally speaking I think it does a remarkable job with such a small decision tree. Given what it's meant to do, I would merge it.
A few notes that apply more to a not-so-minimal implementation:
The plural suffix-s rule has ~5300 exceptions, where a word ending in s is the nominative singular form.
It's also missing the rules defined in LUCENE-1515, especially the 'an' and 'ans' suffixes. Back then I came to the conclusion that 8% of the Swedish language can be inflected that way, but there is a list of ~200 words that need to be set up as exceptions to those rules.
Two standard examples of an/ans suffixes:
| Stemmed | Original |
| ------------- |:-------------:|
| ättiksgurk | ättiksgurka |
| ättiksgurka | ättiksgurkan |
| ättiksgurka | ättiksgurkans |
| ättiksgurk | ättiksgurkas |
| ättiksgurk | ättiksgurkor |
| ättiksgurk | ättiksgurkorna |
| ättiksgurk | ättiksgurkornas |
| ättiksgurk | ättiksgurkors |

| Stemmed | Original |
| ------------- |:-------------:|
| ättestup | ättestupa |
| ättestupa | ättestupan |
| ättestupa | ättestupans |
| ättestup | ättestupas |
| ättestup | ättestupor |
| ättestup | ättestuporna |
| ättestup | ättestupornas |
| ättestup | ättestupors |
There are probably more complete and better examples of this in LUCENE-1515.
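As a rough illustration of what an an/ans rule with an exception list could look like, here is a minimal sketch. It is not the LUCENE-1515 rule set and not code from this PR: the class name, the three exception words and the minimum stem length are all assumptions made up for the example, and the shorter `gurka` stands in for `ättiksgurka` to keep the source ASCII-only.

```java
import java.util.Set;

final class AnSuffixSketch {
  // Illustrative exceptions only (base forms that merely end in "an");
  // these are NOT taken from the ~200-word list mentioned above.
  private static final Set<String> EXCEPTIONS = Set.of("organ", "orkan", "roman");

  // Hypothetical minimum stem length, so short words such as "man" are left alone.
  private static final int MIN_STEM_LENGTH = 3;

  /** Strips a trailing "ans" or "an" unless the word is a listed exception. */
  static String stripAnSuffix(String word) {
    if (EXCEPTIONS.contains(word)) {
      return word;
    }
    // Try the longer suffix first so "gurkans" loses "ans", not just "s"-less "an".
    for (String suffix : new String[] {"ans", "an"}) {
      if (word.endsWith(suffix) && word.length() - suffix.length() >= MIN_STEM_LENGTH) {
        return word.substring(0, word.length() - suffix.length());
      }
    }
    return word;
  }

  public static void main(String[] args) {
    for (String w : new String[] {"gurkan", "gurkans", "organ", "man"}) {
      System.out.println(w + " -> " + stripAnSuffix(w));
    }
  }
}
```

Under these assumptions `gurkan` and `gurkans` both reduce to `gurk`, mirroring the `ättiksgurk` stem in the first table, while `organ` is left untouched. A real exception list would also have to cover inflected forms such as `organs`, which this sketch does not handle.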
And if I have to go looking for problems, I see these:
| Stemmed | Original |
| ------------- |:-------------:|
| höstmörk | höstmörker |
| höstmörk | höstmörkers |
| höstmörkr | höstmörkret |
| höstmörkr | höstmörkrets |

| Stemmed | Original |
| ------------- |:-------------:|
| höstkollektio | höstkollektion |
| höstkollektion | höstkollektionen |
| höstkollektion | höstkollektionens |
| höstkollektion | höstkollektioner |
| höstkollektion | höstkollektionerna |
| höstkollektion | höstkollektionernas |
| höstkollektion | höstkollektioners |
| höstkollektio | höstkollektions |
In this one, a number of different words with very different meanings end up completely mixed up (not all of them nouns, though):
| Stemmed | Original |
| ------------- |:-------------:|
| hölj | hölj |
| hölj | hölja |
| hölja | höljan |
| höljand | höljande |
| hölja | höljans |
| hölj | höljas |
| höljd | höljd |
| höljd | höljda |
| höljd | höljde |
| höljd | höljdes |
| hölj | hölje |
| hölj | höljen |
| höljen | höljena |
| höljen | höljenas |
| hölj | höljens |
| hölj | höljer |
| hölj | höljes |
| hölj | höljet |
| hölj | höljets |
| hölj | höljor |
| hölj | höljorna |
| hölj | höljornas |
| hölj | höljors |
| hölj | höljs |
| höljt | höljt |
| höljt | höljts |
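
Tables like the one above can be produced by grouping the original forms by the stem they were reduced to. The following is only a sketch of that bookkeeping, not the tooling actually used for the SAOL run; the class name and the `stem=original` command-line format are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StemGroupingSketch {
  /** Groups original word forms by stem; each row is {stem, original}. */
  static Map<String, List<String>> groupByStem(List<String[]> rows) {
    // TreeMap keeps the stems sorted, which makes the report easier to scan.
    Map<String, List<String>> groups = new TreeMap<>();
    for (String[] row : rows) {
      groups.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
    }
    return groups;
  }

  // Usage: java StemGroupingSketch stem=original [stem=original ...]
  public static void main(String[] args) {
    List<String[]> rows = new ArrayList<>();
    for (String arg : args) {
      String[] pair = arg.split("=", 2);
      if (pair.length == 2) {
        rows.add(pair); // silently skip arguments that are not stem=original
      }
    }
    groupByStem(rows)
        .forEach((stem, originals) -> System.out.println(stem + " <- " + originals));
  }
}
```

Given rows such as `hölj=hölja`, `hölj=hölje` and `höljd=höljde`, the first two land under the same key `hölj`, which is the kind of mix-up shown in the table. Deciding whether a group is a legitimate paradigm or a conflation of different words still needs a lexicon or a human reader.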
I'm afraid it isn't possible to extract stemmer rules and exception lists from SAOL due to copyright issues (unless we find a digital copy that's at least 20 years old), but perhaps an alternative and more global route would be to mine [Wikidata:Lexicographical data](https://www.wikidata.org/wiki/Wikidata:Lexicographical_data)?
https://www.wikidata.org/wiki/Lexeme:L38829
[GitHub] [lucene] janhoy commented on a change in pull request #136: LUCENE-9589 Swedish Minimal Stemmer
Posted by GitBox <gi...@apache.org>.
janhoy commented on a change in pull request #136:
URL: https://github.com/apache/lucene/pull/136#discussion_r631081371
##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/sv/SwedishMinimalStemmer.java
##########
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.sv;
+
+/*
+ * This algorithm is updated based on code located at:
+ * http://members.unine.ch/jacques.savoy/clef/
+ *
+ * Full copyright for that code follows:
+ */
+
+/*
+ * Copyright (c) 2005, Jacques Savoy
Review comment:
Could these copyright notices be moved to LICENSE.txt or NOTICE.txt?
[GitHub] [lucene] janhoy commented on a change in pull request #136: LUCENE-9589 Swedish Minimal Stemmer
Posted by GitBox <gi...@apache.org>.
janhoy commented on a change in pull request #136:
URL: https://github.com/apache/lucene/pull/136#discussion_r639745969
##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/sv/SwedishMinimalStemmer.java
##########
Review comment:
Hmm, I see a ton of other stemmers with exactly the same headers, so I'll leave them as-is in this PR and do a separate copyright cleanup instead.
[GitHub] [lucene] HoustonPutman commented on a change in pull request #136: LUCENE-9589 Swedish Minimal Stemmer
Posted by GitBox <gi...@apache.org>.
HoustonPutman commented on a change in pull request #136:
URL: https://github.com/apache/lucene/pull/136#discussion_r631156602
##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/sv/SwedishMinimalStemmer.java
##########
Review comment:
I believe they should be moved to NOTICE.txt.
[GitHub] [lucene] janhoy commented on pull request #136: LUCENE-9589 Swedish Minimal Stemmer
Posted by GitBox <gi...@apache.org>.
janhoy commented on pull request #136:
URL: https://github.com/apache/lucene/pull/136#issuecomment-839810656
@rmuir you already reviewed this PR over at the lucene-solr repo. I am still trying to get feedback from a native Swede, but otherwise I'm ready to merge this, aiming for 8.9.
[GitHub] [lucene] janhoy commented on pull request #136: LUCENE-9589 Swedish Minimal Stemmer
Posted by GitBox <gi...@apache.org>.
janhoy commented on pull request #136:
URL: https://github.com/apache/lucene/pull/136#issuecomment-848803076
Since release 8.9 is in feature freeze, I am now targeting this at 9.0.0. I moved the CHANGES entry and the @since tags. Will commit later this week.