Posted to solr-user@lucene.apache.org by servus01 <fr...@sportcast.de> on 2019/10/23 13:43:40 UTC

WordDelimiter in extended way.

Hello,

maybe somebody can help me out. We have a lot of datasets that are always
built according to the same scheme:

Expression - Expression

as an example:

"CCF *HD - 2nd* BL 2019-2020 1st matchday VfL Osnabrück vs. 1st FC
Heidenheim 1846 | 1st HZ without WZ"

or 

"Scouting Feed *mp4 - 2.* BL 2019-2020 1st matchday SV Wehen Wiesbaden vs.
Karlsruher SC"

Solr currently behaves in a way that causes two problems: hyphens that have
a blank before and after them are not indexed, and a search for
blank - blank returns no results.
With the WordDelimiter filter I have already covered cases like 2019-2020,
but for blank - blank I am running out of ideas. Ideally it would tokenize
the word before the hyphen, the hyphen with its surrounding blanks, and the
word after the hyphen as one token.
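
[Editor's note: one possible approach, sketched here as an illustration and
not taken from the thread; the fieldType name is made up. A
WhitespaceTokenizer keeps a standalone "-" as its own token, so a phrase
query like "HD - 2nd" can match, while WordDelimiterGraphFilter still
handles cases like 2019-2020.]

```xml
<!-- Hypothetical fieldType: WhitespaceTokenizer preserves a lone "-"
     as its own token instead of discarding it. -->
<fieldType name="text_hyphen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Split 2019-2020 into 2019 and 2020, keeping the original too -->
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            preserveOriginal="1"/>
    <!-- Required after a graph filter at index time -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```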

Best

Francois



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: WordDelimiter in extended way.

Posted by servus01 <fr...@sportcast.de>.
got it, thank you




Re: WordDelimiter in extended way.

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/23/2019 9:41 AM, servus01 wrote:
> Hey,
> 
> thank you for helping me:
> 
> Thanks in advance for any help, really appreciate it.
> 
> <https://lucene.472066.n3.nabble.com/file/t494058/screenshot.jpg>
> <https://lucene.472066.n3.nabble.com/file/t494058/screenshot3.jpg>

It is not the WordDelimiter filter that is affecting your punctuation. 
It is the StandardTokenizer, which is the first analysis component that 
runs.  You can see this in the first screenshot, where that tokenizer 
outputs terms of "CCF" "HD" and "2nd".

That filter is capable of affecting punctuation, depending on its 
settings, but in this case, no punctuation is left by the time the 
analysis hits that filter.
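
[Editor's note: Shawn's point can be illustrated with a rough Python
approximation of the tokenizer's behavior — this is not Lucene's actual
implementation, just a sketch showing that the lone hyphen disappears
before any filter runs.]

```python
import re

def standard_like_tokenize(text):
    # Very rough stand-in for Lucene's StandardTokenizer: emit runs of
    # letters/digits and drop standalone punctuation such as " - ".
    return re.findall(r"[A-Za-z0-9]+", text)

tokens = standard_like_tokenize("CCF HD - 2nd BL 2019-2020")
print(tokens)  # the standalone hyphen never reaches WordDelimiter
```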

Thanks,
Shawn

Re: WordDelimiter in extended way.

Posted by servus01 <fr...@sportcast.de>.
Hey,

thank you for helping me:

Thanks in advance for any help, really appreciate it.

<https://lucene.472066.n3.nabble.com/file/t494058/screenshot.jpg> 
<https://lucene.472066.n3.nabble.com/file/t494058/screenshot3.jpg> 




Re: WordDelimiter in extended way.

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/23/2019 7:43 AM, servus01 wrote:
> Solr currently behaves in a way that causes two problems: hyphens that
> have a blank before and after them are not indexed, and a search for
> blank - blank returns no results.
> With the WordDelimiter filter I have already covered cases like
> 2019-2020, but for blank - blank I am running out of ideas. Ideally it
> would tokenize the word before the hyphen, the hyphen with its
> surrounding blanks, and the word after the hyphen as one token.

To figure out what's happening, we will need to see the entire analysis 
chain, both index and query.  In order to see those, we will need the 
field definition as well as the referenced fieldType definition from 
your schema.  Additional details needed:  Exact Solr version and the 
schema version.  The schema version is at the top of the schema.

Thanks,
Shawn