Posted to user@lucenenet.apache.org by vicente garcia <vi...@gmail.com> on 2012/06/15 14:01:48 UTC

Tokenize a string

Hi, I have a quick question.

I'd like to tokenize a string. Something like this:

StandardAnalyzer analyzer = new StandardAnalyzer("hola mi nombre es Vicente");

List<string> tokens = analyzer.GetTokens();

And tokens would be: [hola] [mi] [nombre] [es] [Vicente]

Is this possible?

Thanks :)


-- 
LinkedIn profile: http://www.linkedin.com/in/vicentegarcia

Twitter: http://twitter.com/clrstack

Blog: http://geeks.ms/blogs/vgarcia

Re: Tokenize a string

Posted by vicente garcia <vi...@gmail.com>.
Thanks a lot, that's what I supposed :)



On Fri, Jun 15, 2012 at 2:33 PM, Simon Svensson <si...@devhost.se> wrote:
> None in this example. The analyzer could be a PerFieldAnalyzerWrapper, and
> the actual TokenStream retrieved would depend on the field specified. The
> fieldName parameter is not used in StandardAnalyzer.TokenStream, I could
> have passed null if I knew that when I wrote the code.

Re: Tokenize a string

Posted by Simon Svensson <si...@devhost.se>.
It has no meaning in this example. The analyzer could be a
PerFieldAnalyzerWrapper, and then the actual TokenStream returned would
depend on the field specified. StandardAnalyzer.TokenStream does not use
the fieldName parameter; I could have passed null had I known that when
I wrote the code.
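To illustrate why the field name can matter, here is a minimal sketch assuming the Lucene.Net 2.9 API: a PerFieldAnalyzerWrapper that falls back to StandardAnalyzer but uses KeywordAnalyzer for one field. The class name PerFieldDemo and the field names "title" and "body" are hypothetical, chosen only for the example; the same text yields different tokens depending on which field it is said to belong to.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Tokenattributes;

// Hypothetical demo class; assumes the Lucene.Net 2.9 attribute-based API.
public static class PerFieldDemo
{
    // Collects the terms the analyzer produces for `text` when told
    // that the text belongs to the field `fieldName`.
    public static List<string> Terms(Analyzer analyzer, string fieldName, string text)
    {
        var terms = new List<string>();
        TokenStream tokenStream = analyzer.TokenStream(fieldName, new StringReader(text));
        var termAttribute = (TermAttribute)tokenStream.GetAttribute(typeof(TermAttribute));
        while (tokenStream.IncrementToken())
        {
            terms.Add(termAttribute.Term());
        }
        tokenStream.Close();
        return terms;
    }

    public static void Main()
    {
        // Every field uses StandardAnalyzer except the hypothetical "title"
        // field, which keeps the whole input as one untouched token.
        // Version is fully qualified to avoid a clash with System.Version.
        var analyzer = new PerFieldAnalyzerWrapper(
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
        analyzer.AddAnalyzer("title", new KeywordAnalyzer());

        const string text = "hola mi nombre es Vicente";

        // body  -> hola | mi | nombre | es | vicente
        Console.WriteLine(string.Join(" | ", Terms(analyzer, "body", text).ToArray()));
        // title -> hola mi nombre es Vicente
        Console.WriteLine(string.Join(" | ", Terms(analyzer, "title", text).ToArray()));
    }
}
```

With a plain StandardAnalyzer both calls would print the same five lowercased terms, which is why the field name does not matter in the original example.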

On 2012-06-15 14:28, vicente garcia wrote:
> Thank you very much, it works!!
>
> But what is the meaning of "field"?
>
> Thanks a lot :)

Re: Tokenize a string

Posted by vicente garcia <vi...@gmail.com>.
Thank you very much, it works!!

But what is the meaning of "field"?

Thanks a lot :)

On Fri, Jun 15, 2012 at 2:23 PM, Simon Svensson <si...@devhost.se> wrote:
>            var analyzer = new StandardAnalyzer(Version.LUCENE_29);
>            var textReader = new StringReader("hola mi nombre es Vicente");
>            var tokenStream = analyzer.TokenStream("field", textReader);
>            var terms = new List<String>();
>            var termAttribute =
> (TermAttribute)tokenStream.GetAttribute(typeof(TermAttribute));
>            while(tokenStream.IncrementToken()) {
>                terms.Add(termAttribute.Term());
>            }
>
>            // terms = { "hola", "mi", "nombre", "es", "vicente" ]

Re: Tokenize a string

Posted by Simon Svensson <si...@devhost.se>.
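             using System.Collections.Generic;
             using System.IO;
             using Lucene.Net.Analysis.Standard;
             using Lucene.Net.Analysis.Tokenattributes;

             // Fully qualified to avoid a clash with System.Version.
             var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
             var textReader = new StringReader("hola mi nombre es Vicente");
             var tokenStream = analyzer.TokenStream("field", textReader);
             var terms = new List<string>();
             // The attribute is updated in place on every IncrementToken() call.
             var termAttribute =
                 (TermAttribute)tokenStream.GetAttribute(typeof(TermAttribute));
             while (tokenStream.IncrementToken()) {
                 terms.Add(termAttribute.Term());
             }
             tokenStream.Close();

             // terms = { "hola", "mi", "nombre", "es", "vicente" }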
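The loop above can be wrapped in an extension method so that the call looks like the analyzer.GetTokens() from the original question. This is a sketch assuming the Lucene.Net 2.9 API (GetAttribute(Type), TermAttribute.Term()); AnalyzerExtensions and GetTokens are hypothetical names, not part of Lucene.Net. The overload without a field name passes a placeholder, which is only safe for analyzers that ignore it, such as StandardAnalyzer.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

// Hypothetical helper, not part of Lucene.Net; assumes the 2.9 attribute API.
public static class AnalyzerExtensions
{
    // For analyzers that ignore the field name (e.g. StandardAnalyzer);
    // "field" is only a placeholder value.
    public static List<string> GetTokens(this Analyzer analyzer, string text)
    {
        return GetTokens(analyzer, "field", text);
    }

    // Pass a real field name when the analyzer depends on it,
    // e.g. a PerFieldAnalyzerWrapper.
    public static List<string> GetTokens(this Analyzer analyzer, string fieldName, string text)
    {
        var tokens = new List<string>();
        TokenStream tokenStream = analyzer.TokenStream(fieldName, new StringReader(text));
        try
        {
            var termAttribute =
                (TermAttribute)tokenStream.GetAttribute(typeof(TermAttribute));
            while (tokenStream.IncrementToken())
            {
                tokens.Add(termAttribute.Term());
            }
        }
        finally
        {
            // Release the underlying reader even if tokenizing throws.
            tokenStream.Close();
        }
        return tokens;
    }
}
```

With the helper in scope, the call from the question becomes roughly:

             var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
             List<string> tokens = analyzer.GetTokens("hola mi nombre es Vicente");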

On 2012-06-15 14:01, vicente garcia wrote:
> Hi, I have a little doubt.
>
> I'd like to tokenize a string. Something like this:
>
> StandardAnalyzer analyzer = new StandardAnalyzer("hola mi nombre es Vicente");
>
> List<string>  tokens = analyzer.GetTokens();
>
> And tokens is: [hola] [mi] [nombre] [es] [Vicente]
>
> is this possible?
>
> Thanks :)
>
>