Posted to user@lucenenet.apache.org by vicente garcia <vi...@gmail.com> on 2012/06/15 14:01:48 UTC
Tokenize a string
Hi, I have a quick question.
I'd like to tokenize a string. Something like this:
StandardAnalyzer analyzer = new StandardAnalyzer("hola mi nombre es Vicente");
List<string> tokens = analyzer.GetTokens();
so that tokens contains: [hola] [mi] [nombre] [es] [Vicente]
Is this possible?
Thanks :)
--
LinkedIn profile: http://www.linkedin.com/in/vicentegarcia
Twitter: http://twitter.com/clrstack
Blog: http://geeks.ms/blogs/vgarcia
Re: Tokenize a string
Posted by vicente garcia <vi...@gmail.com>.
Thanks a lot, that's what I had assumed :)
On Fri, Jun 15, 2012 at 2:33 PM, Simon Svensson <si...@devhost.se> wrote:
> It has no meaning in this example. The analyzer could be a
> PerFieldAnalyzerWrapper, in which case the actual TokenStream retrieved
> would depend on the field specified. StandardAnalyzer.TokenStream does not
> use the fieldName parameter, so I could have passed null had I known that
> when I wrote the code.
Re: Tokenize a string
Posted by Simon Svensson <si...@devhost.se>.
It has no meaning in this example. The analyzer could be a
PerFieldAnalyzerWrapper, in which case the actual TokenStream retrieved
would depend on the field specified. StandardAnalyzer.TokenStream does not
use the fieldName parameter, so I could have passed null had I known that
when I wrote the code.
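To illustrate when the field name would matter, here is a minimal sketch, assuming the same Lucene.Net 2.9-era API used elsewhere in this thread. The class name FieldDemo, the helper Tokenize, and the field names "body" and "id" are hypothetical and exist only for this example. It wraps a StandardAnalyzer in a PerFieldAnalyzerWrapper and registers a KeywordAnalyzer for one field, so the same text is tokenized differently depending on the field name passed to TokenStream.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Tokenattributes;
// Alias avoids a clash with System.Version if "using System;" is added later.
using LuceneVersion = Lucene.Net.Util.Version;

public static class FieldDemo
{
    // Hypothetical helper: collects the terms an analyzer produces for one field.
    public static List<string> Tokenize(Analyzer analyzer, string fieldName, string text)
    {
        var terms = new List<string>();
        var tokenStream = analyzer.TokenStream(fieldName, new StringReader(text));
        var termAttribute = (TermAttribute)tokenStream.GetAttribute(typeof(TermAttribute));
        while (tokenStream.IncrementToken())
        {
            terms.Add(termAttribute.Term());
        }
        return terms;
    }

    public static void Main()
    {
        // StandardAnalyzer is the default; the (assumed) "id" field is kept as one term.
        var wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer(LuceneVersion.LUCENE_29));
        wrapper.AddAnalyzer("id", new KeywordAnalyzer());

        const string text = "hola mi nombre es Vicente";

        // "body" has no specific analyzer, so it falls back to StandardAnalyzer.
        var bodyTerms = Tokenize(wrapper, "body", text);
        // "id" uses KeywordAnalyzer, which emits the entire input as a single term.
        var idTerms = Tokenize(wrapper, "id", text);

        System.Console.WriteLine(string.Join(" | ", bodyTerms.ToArray()));
        System.Console.WriteLine(string.Join(" | ", idTerms.ToArray()));
    }
}
```

With a plain StandardAnalyzer, as in the original snippet, both calls would go through the same tokenizer, which is why the field name made no difference there.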
On 2012-06-15 14:28, vicente garcia wrote:
> Thank you very much, it works!!
>
> But what is the meaning of "field"?
>
> Thanks a lot :)
Re: Tokenize a string
Posted by vicente garcia <vi...@gmail.com>.
Thank you very much, it works!!
But what is the meaning of "field"?
Thanks a lot :)
On Fri, Jun 15, 2012 at 2:23 PM, Simon Svensson <si...@devhost.se> wrote:
> var analyzer = new StandardAnalyzer(Version.LUCENE_29);
> var textReader = new StringReader("hola mi nombre es Vicente");
> var tokenStream = analyzer.TokenStream("field", textReader);
> var terms = new List<String>();
> var termAttribute =
> (TermAttribute)tokenStream.GetAttribute(typeof(TermAttribute));
> while(tokenStream.IncrementToken()) {
> terms.Add(termAttribute.Term());
> }
>
> // terms = { "hola", "mi", "nombre", "es", "vicente" }
Re: Tokenize a string
Posted by Simon Svensson <si...@devhost.se>.
var analyzer = new StandardAnalyzer(Version.LUCENE_29);
var textReader = new StringReader("hola mi nombre es Vicente");
// The field name is required by the API but ignored by StandardAnalyzer.
var tokenStream = analyzer.TokenStream("field", textReader);
var terms = new List<String>();
var termAttribute = (TermAttribute)tokenStream.GetAttribute(typeof(TermAttribute));
// Advance through the stream, collecting the text of each token.
while (tokenStream.IncrementToken()) {
    terms.Add(termAttribute.Term());
}
// terms = { "hola", "mi", "nombre", "es", "vicente" }
On 2012-06-15 14:01, vicente garcia wrote:
> Hi, I have a little doubt.
>
> I'd like to tokenize a string. Something like this:
>
> StandardAnalyzer analyzer = new StandardAnalyzer("hola mi nombre es Vicente");
>
> List<string> tokens = analyzer.GetTokens();
>
> And tokens is: [hola] [mi] [nombre] [es] [Vicente]
>
> is this possible?
>
> Thanks :)
>
>