
Posted to java-user@lucene.apache.org by "kai.hu" <li...@hotmail.com> on 2008/05/04 10:20:45 UTC

Re: Need addtional info for Field (hoping friends who can read Chinese can help me with some ideas)

You only need to index and tokenize "下午去开会" ("going to a meeting this afternoon") and store the corresponding time alongside it, for example:

document.add(new Field("sub", "下午去开会", Field.Store.YES, Field.Index.TOKENIZED));
document.add(new Field("time", "01:02:02", Field.Store.YES, Field.Index.UN_TOKENIZED));

Each document returned by a search will then contain both of these Fields.
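
For readers following along, the layout suggested here and in Cedric's quoted reply (one document per subtitle line, with the time stored next to the text) can be sketched in plain Java. This is only an illustration of the data model, not Lucene code: the class SubtitleIndexSketch, its methods, the title "video-1" and the English stand-in subtitles are all made up for the example, and String.contains stands in for a real tokenized query on the "sub" field.

```java
import java.util.ArrayList;
import java.util.List;

public class SubtitleIndexSketch {

    /** One record per subtitle line, mirroring "one Document per subtitle". */
    static final class SubtitleDoc {
        final String title; // the video this subtitle belongs to (hypothetical value)
        final String time;  // stored alongside the text, never searched
        final String sub;   // the searchable subtitle text

        SubtitleDoc(String title, String time, String sub) {
            this.title = title;
            this.time = time;
            this.sub = sub;
        }
    }

    private final List<SubtitleDoc> docs = new ArrayList<>();

    /** Adds one subtitle line as its own record. */
    void add(String title, String time, String sub) {
        docs.add(new SubtitleDoc(title, time, sub));
    }

    /**
     * Stand-in for a search on "sub": a plain substring match instead of a
     * real tokenized query. Returns the time of every matching subtitle.
     */
    List<String> timesMatching(String query) {
        List<String> times = new ArrayList<>();
        for (SubtitleDoc d : docs) {
            if (d.sub.contains(query)) {
                times.add(d.time);
            }
        }
        return times;
    }

    public static void main(String[] args) {
        SubtitleIndexSketch index = new SubtitleIndexSketch();
        // English stand-ins for the two subtitles from the original question.
        index.add("video-1", "01:02:02", "going to a meeting this afternoon");
        index.add("video-1", "01:03:05", "going to a meeting the day after tomorrow");

        System.out.println(index.timesMatching("afternoon")); // [01:02:02]
        System.out.println(index.timesMatching("meeting"));   // [01:02:02, 01:03:05]
    }
}
```

Because every subtitle line is its own record, a hit already identifies which line matched, so its time can be read directly without asking which Field inside a larger document was responsible.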

--------------------------------------------------
From: "Cedric Ho" <ce...@gmail.com>
Sent: Tuesday, April 22, 2008 3:36 PM
To: <ja...@lucene.apache.org>
Subject: Re: Need addtional info for Field (hoping friends who can read Chinese can help me with some ideas)


> In that case you may want to index each subtitle line, such as
>
> Field("Sub","下午去开会","01:02:02");
>
> as a separate document, so that each document contains three fields:
> 1. title
> 2. time
> 3. sub
>
> Then you can get both the title and the time by searching the "sub" field.
>
> Cedric
>
>
> 2008/4/22 王建新 <li...@gmail.com>:
>>
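>>  Thanks. I only search on sub, not on the time; when searching sub, I just want to get the time belonging to the matching Field.
>>
>>  It seems a payload cannot do this?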
>>
>>
>>
>>  ----- Original Message -----
>>  From: <Fa...@emc.com>
>>  To: <ja...@lucene.apache.org>
>>  Sent: Tuesday, April 22, 2008 1:55 PM
>>  Subject: RE: Need addtional info for Field (hoping friends who can read Chinese can help me with some ideas)
>>
>>
>>  Try using a payload, which is stored as additional information. Currently
>> Lucene only supports per-token payloads, but you can add an arbitrary token
>> to carry the time information.
>>
>>  I am not sure what the query contains. Only the subtitle, or both the
>> subtitle and the time?
>>
>>  Regards,
>>
>>  -----Original Message-----
>>  From: 王建新 [mailto:lieutroy@gmail.com]
>>  Sent: Tuesday, April 22, 2008 1:06 PM
>>  To: java-user
>>  Subject: Need addtional info for Field (hoping friends who can read Chinese can help me with some ideas)
>> 
>>
>>  Sorry, I may not have described this very clearly in English :)
>>
>>
>>  ----- Original Message -----
>>  From: 王建新
>>  To: Chris
>>  Sent: Tuesday, April 22, 2008 9:52 AM
>>  Subject: Re: Need addtional info for Field
>>
>>
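>>  Thanks.
>>  My problem is this: I need to build an index for a batch of video files. Before indexing, I have already worked out which subtitle text appears at which time in each video.
>>  In this setup one video program corresponds to one Document, so I need (would like) to index the subtitles like this:
>>     Field("Sub","下午去开会","01:02:02");
>>     Field("Sub","后天去开会","01:03:05");
>>     [Note: "01:02:02" is the attached time; Lucene does not provide this usage.]
>>
>>  These two Fields mean that in the current video the subtitle "下午去开会" ("going to a meeting this afternoon") appears at 01:02:02 and "后天去开会" ("going to a meeting the day after tomorrow") appears at 01:03:05. If a user searches for "下午" ("afternoon"), the current video matches, but only the first Field matches, so only the time "01:02:02" is needed. If the user searches for "开会" ("meeting"), both Fields match, so both "01:02:02" and "01:03:05" are needed.
>>  I am not sure whether I have explained this clearly.
>>
>>  I would like to know whether Lucene can solve this in some way and, if not, how Lucene would need to be modified.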
>>
>>  王建新
>>   ----- Original Message -----
>>   From: Chris
>>   To: 王建新
>>   Sent: Monday, April 21, 2008 7:34 PM
>>   Subject: Re: Need addtional info for Field
>>
>>
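>>   Could you describe the feature you need more clearly? Handled this way, it looks like word segmentation would be required....
>>
>>   But I see you have not segmented the text, and if the same field name holds multi-pair values, they are not stored as a String.
>>
>>   That's all,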
>>                      Chris.
>>
>>
>>   2008/4/21, 王建新 <li...@gmail.com>:
>>     Can you read Chinese?
>>
>>     I don't quite understand what you mean.
>>     Are you saying this problem can be solved with Lucene's existing features?
>>
>>       ----- Original Message -----
>>       From: Chris
>>       To: 王建新
>>       Sent: Monday, April 21, 2008 5:14 PM
>>       Subject: Re: Need addtional info for Field
>>
>>
>>       This problem is not solved by Lucene itself, but another method will solve it.
>>
>>       The structure is not defined like this either ......
>>
>>       You may want to check it more carefully....
>>
>>       That's all,
>>                      Chris.
>>
>>
>>       2008/4/21, 王建新 <li...@gmail.com>:
>>         Hi Chris, it is me, 王建新.
>>
>>         I have a new problem. Could you give me any advice? Thank you.
>>
>>
>>         I want to use Lucene with some additional info attached to each field, like this:
>>
>>         1. index
>>             Document additionalDoc = new Document();
>>
>>             additionalDoc.add(new Field("field", "AA BB", "Additional info ..............."));
>>             additionalDoc.add(new Field("field", "BB CC", "Additional info 222222222222222222222222..............."));
>>
>>             writer.addDocument(additionalDoc);
>>
>>             ........
>>
>>
>>         2. search
>>
>>             Searcher searcher;
>>             ....
>>
>>             searcher.search(new TermQuery(new Term("field", "BB")));
>>
>>
>>
>>
>>             In this case, I want Lucene to return additionalDoc and also tell me
>>             which fields were matched, so that I can read the additional info
>>             from the matched fields.
>>
>>         Can Lucene do this in version 2.3.1?
>>
>>
>>
>>       --
>>       Chris Lin
>>       chrislin0426@gmail.com
>>       Taipei , Taiwan.
>>       -----------------------------------------------------------
>>
>>
>>
>>   --
>>   Chris Lin
>>   chrislin0426@gmail.com
>>   Taipei , Taiwan.
>>   -----------------------------------------------------------
>>
>>  ---------------------------------------------------------------------
>>  To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>  For additional commands, e-mail: java-user-help@lucene.apache.org
>>
> 


Re: Need addtional info for Field (hoping friends who can read Chinese can help me with some ideas)

Posted by 王建新 <li...@gmail.com>.

OK, thank you!


Re: Need addtional info for Field (hoping friends who can read Chinese can help me with some ideas)

Posted by "kai.hu" <li...@hotmail.com>.

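Search Google for Chinese word segmentation (中文分词); besides 车东 (Che Dong)'s package there should be plenty of others. If you find one that segments better or runs more efficiently, please recommend it to me too.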
