Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2020/03/04 14:56:26 UTC

[GitHub] [couchdb] natcohen opened a new issue #2635: Best practice for N-gram and set Lucene param with Clouseau

natcohen opened a new issue #2635: Best practice for N-gram and set Lucene param with Clouseau
URL: https://github.com/apache/couchdb/issues/2635
 
 
   CouchDB/Clouseau indexing allows analyzers, but what about n-gram tokenization? What is the best practice for n-grams? Should we generate the n-grams ourselves inside the JavaScript index function, or can we take advantage of Lucene's n-gram support?
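
To make the first option concrete, here is a minimal sketch of building character n-grams inside a search index function. It is not an official recipe: `ngrams`, `indexDoc`, and the field names `name` and `name_ngram` are hypothetical, and the `index` function below is only a stand-in for the one CouchDB provides to index functions, so that the sketch runs in plain Node.js. It also assumes the analyzer configured for the field splits on whitespace.

```javascript
// Hypothetical helper: all character n-grams of length n, per word, lowercased.
function ngrams(text, n) {
  var words = String(text).toLowerCase().split(/\s+/);
  var out = [];
  words.forEach(function (w) {
    for (var i = 0; i + n <= w.length; i++) {
      out.push(w.substring(i, i + n));
    }
  });
  return out;
}

// Stand-in for the index() function available inside a CouchDB search
// index function. Here it only records its arguments.
var indexed = [];
function index(name, value, options) {
  indexed.push({ name: name, value: value, options: options });
}

// Sketch of an index function body. Field names are assumptions.
function indexDoc(doc) {
  if (typeof doc.name !== "string") {
    return;
  }
  // Index the original value as usual.
  index("name", doc.name, { store: true });
  // Index the trigrams as one space-separated string in a second field.
  var grams = ngrams(doc.name, 3);
  if (grams.length > 0) {
    index("name_ngram", grams.join(" "));
  }
}

indexDoc({ _id: "a", name: "Couch" });
console.log(indexed);
```

With this approach the search term would need the same treatment at query time (for example, querying the hypothetical `name_ngram` field for `ouc`), and the index grows with every extra gram, so the gram length is a trade-off between index size and how short a partial match can be.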
   
   Also, how can we set Lucene parameters, such as allowing a leading wildcard (https://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#setAllowLeadingWildcard(boolean))?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [couchdb] natcohen commented on issue #2635: Best practice for N-gram and set Lucene param with Clouseau

Posted by GitBox <gi...@apache.org>.
natcohen commented on issue #2635: Best practice for N-gram and set Lucene param with Clouseau
URL: https://github.com/apache/couchdb/issues/2635#issuecomment-598212487
 
 
   @rnewson I'd love to contribute and add the n-gram analyzer. Unfortunately, I don't know Erlang, and working on Clouseau is a bit overwhelming since the project seems quite complex and has very little documentation... I'm also not an expert in Java, so that doesn't help either!
   
   Regarding the leading wildcard parameter, it was just an example! I don't plan to use it but wanted to know if there was a way to use all the parameters Lucene offers.


[GitHub] [couchdb] rnewson commented on issue #2635: Best practice for N-gram and set Lucene param with Clouseau

Posted by GitBox <gi...@apache.org>.
rnewson commented on issue #2635: Best practice for N-gram and set Lucene param with Clouseau
URL: https://github.com/apache/couchdb/issues/2635#issuecomment-606497843
 
 
   Hi @natcohen, sorry for the silence.
   
   I appreciate the desire to help, but things move forward in this project when folks contribute. It's useful to highlight a desire for this feature, though. If someone works on it to a reviewable standard, I'm sure someone will have time to help it through the last few steps.


[GitHub] [couchdb] natcohen commented on issue #2635: Best practice for N-gram and set Lucene param with Clouseau

Posted by GitBox <gi...@apache.org>.
natcohen commented on issue #2635: Best practice for N-gram and set Lucene param with Clouseau
URL: https://github.com/apache/couchdb/issues/2635#issuecomment-601375396
 
 
   @rnewson Partial search is widely used, especially for auto-complete. Any chance someone can help expose the n-gram analyzer? I have posted an issue to get some guidance [here](https://github.com/cloudant-labs/clouseau/issues/29), but Clouseau doesn't seem very active!
   


[GitHub] [couchdb] rnewson commented on issue #2635: Best practice for N-gram and set Lucene param with Clouseau

Posted by GitBox <gi...@apache.org>.
rnewson commented on issue #2635: Best practice for N-gram and set Lucene param with Clouseau
URL: https://github.com/apache/couchdb/issues/2635#issuecomment-598057020
 
 
   We don't expose the NGram analyzers in Clouseau today, but we'd consider merging a pull request if you want to add them.
   
   We don't support setting that parameter either, and I don't think we'd accept a patch to allow it, given its bad performance implications.


[GitHub] [couchdb] natcohen edited a comment on issue #2635: Best practice for N-gram and set Lucene param with Clouseau

Posted by GitBox <gi...@apache.org>.
natcohen edited a comment on issue #2635: Best practice for N-gram and set Lucene param with Clouseau
URL: https://github.com/apache/couchdb/issues/2635#issuecomment-601375396
 
 
   @rnewson Partial search is widely used, especially for auto-complete. Any chance someone can help expose the n-gram analyzer? I have posted an issue to get some guidance [here](https://github.com/cloudant-labs/clouseau/issues/29), but Clouseau doesn't seem very active!
   
   PS: There are other useful analyzers that would be great to expose, such as edge n-gram...
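
For readers unfamiliar with the term, edge n-grams are the prefixes of each word, which is why they suit auto-complete. The following is only an illustrative sketch of the idea in plain JavaScript, not Lucene's implementation; `edgeNgrams` and its parameters are hypothetical names.

```javascript
// Hypothetical helper: prefixes of each word, from minLen to maxLen
// characters, lowercased. Words shorter than minLen yield nothing.
function edgeNgrams(text, minLen, maxLen) {
  var out = [];
  String(text).toLowerCase().split(/\s+/).forEach(function (w) {
    var top = Math.min(maxLen, w.length);
    for (var len = minLen; len <= top; len++) {
      out.push(w.substring(0, len));
    }
  });
  return out;
}

console.log(edgeNgrams("Clouseau", 2, 4)); // [ 'cl', 'clo', 'clou' ]
```

Because only prefixes are produced, the number of terms per word is bounded by `maxLen - minLen + 1`, which is much smaller than the full set of n-grams for the same word.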
