Posted to dev@lucenenet.apache.org by GitBox <gi...@apache.org> on 2020/04/28 15:22:48 UTC

[GitHub] [lucenenet] roysurles opened a new issue #246: Custom StopWord Analyzer - Exception Cannot read from a closed TextReader.

roysurles opened a new issue #246:
URL: https://github.com/apache/lucenenet/issues/246


   Hello, 
   We are trying to upgrade from v3.0.3 to v4.8.0-beta00007 on .NET Framework 4.5.
   
   We previously had a custom stop-word analyzer that inherited from Analyzer. After upgrading, there is an abstract method that needs to be implemented:
   TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
   
   Following the documentation at https://lucenenet.apache.org/download/version-4.html to implement this method, we get the exception: "Cannot read from a closed TextReader."
   
   Here is our implementation:
   <code>
           protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
           {
               Analyzer analyzer = new StandardAnalyzer(_luceneVersion, reader);
               TokenStream ts = analyzer.GetTokenStream(fieldName, reader);
               var tokenizer = new StandardTokenizer(_luceneVersion, reader);
   
               try
               {
                   ts.Reset(); // Resets this stream to the beginning. (Required)
                   while (ts.IncrementToken())
                   {
                   }
                   ts.End();   // Perform end-of-stream operations, e.g. set the final offset.
               }
               catch (Exception ex)
               {
                   _ = ex.Message;
                   throw;
               }
               finally
               {
                   ts.Dispose();
               }
               return new TokenStreamComponents(tokenizer, ts);
           }
   </code>
   
   The exception occurs on ts.IncrementToken().
   
   Thanks
   Roy
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [lucenenet] NightOwl888 closed issue #246: Custom StopWord Analyzer - Exception Cannot read from a closed TextReader.

Posted by GitBox <gi...@apache.org>.

NightOwl888 closed issue #246:
URL: https://github.com/apache/lucenenet/issues/246


   



[GitHub] [lucenenet] NightOwl888 commented on issue #246: Custom StopWord Analyzer - Exception Cannot read from a closed TextReader.

Posted by GitBox <gi...@apache.org>.

NightOwl888 commented on issue #246:
URL: https://github.com/apache/lucenenet/issues/246#issuecomment-620808822


   `CreateComponents()` is a factory method (a creational operation), so only short-lived dependencies should be disposed there. Because your code disposes the stream before returning it, the stream is not in a state that the caller of `CreateComponents()` can use.
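   
   To illustrate where the `Reset()` / `IncrementToken()` / `End()` loop belongs, here is a minimal sketch of consuming a token stream from the calling side rather than inside `CreateComponents()`. The `TokenStreamDemo` class, the `Tokenize` helper, and the `"body"` field name are hypothetical names used only for illustration, and the namespaces are assumed to match 4.8.0-beta00007. It may also be worth checking that the reader passed to `CreateComponents()` is not handed to a constructor that treats it as a stop-word source, since such a constructor would read the reader before the tokenizer does.
   
   ```c#
   using System;
   using System.Collections.Generic;
   using System.IO;
   using Lucene.Net.Analysis;
   using Lucene.Net.Analysis.Standard;
   using Lucene.Net.Analysis.TokenAttributes;
   using Lucene.Net.Util;
   
   public static class TokenStreamDemo
   {
       // Hypothetical helper: the caller (not the analyzer) consumes and disposes the stream.
       public static IList<string> Tokenize(Analyzer analyzer, string fieldName, string text)
       {
           var terms = new List<string>();
           using (TextReader reader = new StringReader(text))
           using (TokenStream ts = analyzer.GetTokenStream(fieldName, reader))
           {
               ICharTermAttribute termAtt = ts.AddAttribute<ICharTermAttribute>();
               ts.Reset();                 // required before the first IncrementToken()
               while (ts.IncrementToken())
               {
                   terms.Add(termAtt.ToString());
               }
               ts.End();                   // end-of-stream operations, e.g. final offset
           }
           return terms;
       }
   
       public static void Main()
       {
           // StandardAnalyzer is used here only so the sketch is self-contained.
           using (Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
           {
               foreach (string term in Tokenize(analyzer, "body", "The quick brown fox"))
               {
                   Console.WriteLine(term);
               }
           }
       }
   }
   ```
   
   With this split, `CreateComponents()` only wires the tokenizer and filters together, and each caller owns the lifetime of the stream it requested.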
   
   To make a customized standard analyzer, the best approach would be to model your new class after the [existing StandardAnalyzer class](https://github.com/apache/lucenenet/blob/8cf15f7fd0bb7b22bb2e865895998583d049ab92/src/Lucene.Net.Analysis.Common/Analysis/Standard/StandardAnalyzer.cs).
   
   ```c#
       using System.Collections.Generic;
       using System.IO;
       using Lucene.Net.Analysis;
       using Lucene.Net.Analysis.Core;
       using Lucene.Net.Analysis.Standard;
       using Lucene.Net.Analysis.Util;
       using Lucene.Net.Util;
   
       public sealed class MyStopwordAnalyzer : StopwordAnalyzerBase
       {
           /// <summary>
           /// An unmodifiable set containing some common English words that are usually not
           /// useful for searching. 
           /// </summary>
           public static readonly CharArraySet STOP_WORDS_SET = LoadEnglishStopWordsSet();
   
           private static CharArraySet LoadEnglishStopWordsSet() // LUCENENET: Avoid static constructors (see https://github.com/apache/lucenenet/pull/224#issuecomment-469284006)
           {
               IList<string> stopWords = new string[] { "a", "an", "and", "are", "as", "at", "be",
                   "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on",
                   "or", "such", "that", "the", "their", "then", "there", "these", "they", "this",
                   "to", "was", "will", "with" };
   #pragma warning disable 612, 618
               var stopSet = new CharArraySet(LuceneVersion.LUCENE_CURRENT, stopWords, false);
   #pragma warning restore 612, 618
               return CharArraySet.UnmodifiableSet(stopSet);
           }
   
           /// <summary>
           /// Builds an analyzer with the given stop words. </summary>
           /// <param name="matchVersion"> Lucene compatibility version - See <see cref="MyStopwordAnalyzer"/> </param>
           /// <param name="stopWords"> stop words  </param>
           public MyStopwordAnalyzer(LuceneVersion matchVersion, CharArraySet stopWords)
               : base(matchVersion, stopWords)
           {
           }
   
           /// <summary>
           /// Builds an analyzer with the default stop words (<see cref="STOP_WORDS_SET"/>). </summary>
           /// <param name="matchVersion"> Lucene compatibility version - See <see cref="MyStopwordAnalyzer"/> </param>
           public MyStopwordAnalyzer(LuceneVersion matchVersion)
               : this(matchVersion, STOP_WORDS_SET)
           {
           }
   
           /// <summary>
           /// Builds an analyzer with the stop words from the given reader. </summary>
           /// <seealso cref="WordlistLoader.GetWordSet(TextReader, LuceneVersion)"/>
           /// <param name="matchVersion"> Lucene compatibility version - See <see cref="MyStopwordAnalyzer"/> </param>
           /// <param name="stopwords"> <see cref="TextReader"/> to read stop words from  </param>
           public MyStopwordAnalyzer(LuceneVersion matchVersion, TextReader stopwords)
               : this(matchVersion, LoadStopwordSet(stopwords, matchVersion))
           {
           }
   
           protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
           {
               var src = new StandardTokenizer(m_matchVersion, reader);
               TokenStream tok = new StandardFilter(m_matchVersion, src);
               // tok = new LowerCaseFilter(m_matchVersion, tok); // optional
               tok = new StopFilter(m_matchVersion, tok, m_stopwords);
               return new TokenStreamComponents(src, tok);
           }
       }
   ```
   
   Note that the existing `StandardAnalyzer` class also accepts a `CharArraySet` of stop words. Because `StandardAnalyzer` always applies `LowerCaseFilter`, it may meet your needs as-is if you want your text normalized to lowercase.
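   
   As a rough sketch of that alternative, the built-in `StandardAnalyzer` can be constructed with a custom stop-word set. The `StandardAnalyzerStopwords` class, the `Create` method, and the words `"foo"` and `"bar"` are placeholders for illustration, and the `Lucene.Net.Analysis.Util` namespace for `CharArraySet` is assumed from 4.8.0-beta00007 and may differ in other releases.
   
   ```c#
   using Lucene.Net.Analysis;
   using Lucene.Net.Analysis.Standard;
   using Lucene.Net.Analysis.Util; // CharArraySet (namespace assumed for 4.8.0-beta00007)
   using Lucene.Net.Util;
   
   public static class StandardAnalyzerStopwords
   {
       // Hypothetical factory: builds a StandardAnalyzer with a custom stop-word set.
       public static Analyzer Create()
       {
           // ignoreCase: true, so the set matches terms regardless of casing.
           var stopWords = new CharArraySet(
               LuceneVersion.LUCENE_48, new[] { "foo", "bar" }, true);
           return new StandardAnalyzer(LuceneVersion.LUCENE_48, stopWords);
       }
   }
   ```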
   
   



[GitHub] [lucenenet] NightOwl888 commented on issue #246: Custom StopWord Analyzer - Exception Cannot read from a closed TextReader.

Posted by GitBox <gi...@apache.org>.

NightOwl888 commented on issue #246:
URL: https://github.com/apache/lucenenet/issues/246#issuecomment-657122285


   I am closing this issue, as there hasn't been activity for some time. Feel free to reopen it if there are any additional updates.

