Posted to commits@pinot.apache.org by "jackluo923 (via GitHub)" <gi...@apache.org> on 2023/06/14 12:47:38 UTC

[GitHub] [pinot] jackluo923 commented on issue #10865: text_match operator fails to execute query containing stop words

jackluo923 commented on issue #10865:
URL: https://github.com/apache/pinot/issues/10865#issuecomment-1591126575

   The correct behavior is as follows:
   1. If stop words are specified during ingestion, remove them from the query at query time.
   2. If all stop words are excluded, do not remove any stop words from the query.
   3. If a customized list of stop words is excluded, remove only that customized list of stop words from the query.
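
The three cases above can be read as one rule: at query time, drop exactly the words that the index dropped at ingestion. Below is a minimal Python sketch of that reading. It is not Pinot's implementation; `strip_stop_words` and `DEFAULT_STOP_WORDS` are hypothetical names, and the word list is an illustrative subset rather than Pinot's actual default list.

```python
# Illustrative subset only; NOT Pinot's actual default stop-word list.
DEFAULT_STOP_WORDS = frozenset({"a", "an", "and", "for", "of", "the", "to"})


def strip_stop_words(phrase, index_stop_words):
    """Drop from `phrase` exactly the words the index dropped at ingestion.

    `index_stop_words` is the (hypothetical) set of stop words the text
    index was built with: the defaults, an empty set, or a custom list.
    """
    kept = [w for w in phrase.split() if w.lower() not in index_stop_words]
    return " ".join(kept)


# Case 1: index built with the default stop words.
print(strip_stop_words("query engines for analytics", DEFAULT_STOP_WORDS))
# -> query engines analytics

# Case 2: index built with no stop words removed, so the query is untouched.
print(strip_stop_words("query engines for analytics", frozenset()))
# -> query engines for analytics

# Case 3: index built with a custom list, so only those words are removed.
print(strip_stop_words("building and deploying", frozenset({"and"})))
# -> building deploying
```

The sketch operates on the text inside a quoted phrase only. Boolean operators such as the `AND` between phrases belong to the query syntax and would have to be left alone.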
   
   To give you a concrete example, let's use the input example provided in Pinot's [documentation](https://docs.pinot.apache.org/basics/indexing/text-search-support#resume-text) with default text-index ingestion configs:
   > Distributed systems, Java, C++, Go, distributed query engines for analytics and data warehouses, Machine learning, spark, Kubernetes, transaction processing, Java, Python, C++, Machine learning, building and deploying large scale... CUDA, GPU processing, Tensor flow ...
   
   With the above input, the following query would return a match: 
   ```
   SELECT SKILLS_COL 
   FROM MyTable 
   WHERE TEXT_MATCH(SKILLS_COL, '"Machine learning" AND "gpu processing"')
   ```
   
   However, the following query would not return any match for the same input, because the query contains the stop words `for` and `and`:
   ```
   SELECT SKILLS_COL 
   FROM MyTable 
   WHERE TEXT_MATCH(SKILLS_COL, '"query engines for analytics" AND "building and deploying"')
   ```
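
To illustrate the mismatch, here is a toy Python model of a text index that drops stop words at ingestion and matches a phrase only as a contiguous run of indexed tokens. This is a simplification for illustration: it does not reproduce Lucene's analyzer or its position handling, and the actual failure mechanism in Pinot may differ. All names (`tokenize`, `index_tokens`, `phrase_matches`) and the two-word stop list are hypothetical, and `DOC` is an abbreviated excerpt of the input above.

```python
import re

# Hypothetical stop-word list, reduced to the two words relevant here.
STOP_WORDS = frozenset({"and", "for"})

# Abbreviated excerpt of the example input.
DOC = ("distributed query engines for analytics and data warehouses, "
       "building and deploying large scale")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_tokens(text, stop_words):
    """Toy ingestion step: tokenize and drop stop words."""
    return [t for t in tokenize(text) if t not in stop_words]


def phrase_matches(indexed, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run in indexed."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(indexed[i:i + n] == phrase_tokens
               for i in range(len(indexed) - n + 1))


indexed = index_tokens(DOC, STOP_WORDS)

# Query phrase used as-is: "for" was never indexed, so nothing matches.
raw = tokenize("query engines for analytics")
print(phrase_matches(indexed, raw))      # False

# Same phrase with the index's stop words removed first: it matches.
cleaned = [t for t in raw if t not in STOP_WORDS]
print(phrase_matches(indexed, cleaned))  # True
```

In this model the phrase matches only once the query is filtered with the same stop-word set the index was built with.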
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-help@pinot.apache.org