Posted to java-user@lucene.apache.org by "kai.hu" <li...@hotmail.com> on 2008/05/04 10:20:45 UTC
Re: Need addtional info for Field (hoping friends who can read Chinese can offer some advice)
You only need to index and tokenize the subtitle "下午去开会" ("going to a meeting this afternoon") and store the corresponding time with it, for example:
document.add(new Field("sub", "下午去开会", Field.Store.YES, Field.Index.TOKENIZED));
document.add(new Field("time", "01:02:02", Field.Store.YES, Field.Index.UN_TOKENIZED));
Each document returned by a search will then contain both of these fields.
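As a rough illustration of the idea above (keep the time next to each subtitle line and read it back from whatever matched), here is a minimal plain-Java sketch. It does not call Lucene at all: the class `SubtitleIndexSketch`, its methods, and the substring matching are assumptions made only for this example, standing in for a tokenized search on the "sub" field. The second subtitle and its time are taken from the original question quoted below.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the scheme described above; it does not use Lucene.
class SubtitleIndexSketch {

    // One entry per subtitle line, mirroring a document that has a
    // searchable "sub" field and a stored "time" field.
    static final class SubtitleDoc {
        final String sub;
        final String time;

        SubtitleDoc(String sub, String time) {
            this.sub = sub;
            this.time = time;
        }
    }

    private final List<SubtitleDoc> docs = new ArrayList<>();

    void add(String sub, String time) {
        docs.add(new SubtitleDoc(sub, time));
    }

    // Substring matching is an assumption that stands in for a tokenized
    // search on "sub"; the times of all matching entries are returned.
    List<String> timesMatching(String query) {
        List<String> times = new ArrayList<>();
        for (SubtitleDoc doc : docs) {
            if (doc.sub.contains(query)) {
                times.add(doc.time);
            }
        }
        return times;
    }

    public static void main(String[] args) {
        SubtitleIndexSketch index = new SubtitleIndexSketch();
        index.add("下午去开会", "01:02:02");
        index.add("后天去开会", "01:03:05");
        System.out.println(index.timesMatching("下午")); // [01:02:02]
        System.out.println(index.timesMatching("开会")); // [01:02:02, 01:03:05]
    }
}
```

Because every subtitle line is its own entry, a match can only ever bring back its own time, which is the behaviour asked for in the original question.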
--------------------------------------------------
From: "Cedric Ho" <ce...@gmail.com>
Sent: Tuesday, April 22, 2008 3:36 PM
To: <ja...@lucene.apache.org>
Subject: Re: Need addtional info for Field (hoping friends who can read Chinese can offer some advice)
> In that case you may want to index each:
>
> Field("Sub","下午去开会","01:02:02");
>
> as a separate document. So your document contains 3 fields
> 1. title
> 2. time
> 3. sub
>
> then you can get both title and time by searching the "sub" field.
>
> Cedric
>
>
> 2008/4/22 王建新 <li...@gmail.com>:
>>
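>> Thanks. I only search on sub, not on the time; when searching sub, I just
>> want to get the time that belongs to the matched Field.
>>
>> It seems a payload cannot do that?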
>>
>>
>>
>> ----- Original Message -----
>> From: <Fa...@emc.com>
>> To: <ja...@lucene.apache.org>
>> Sent: Tuesday, April 22, 2008 1:55 PM
>> Subject: RE: Need addtional info for Field (hoping friends who can read Chinese can offer some advice)
>>
>>
>> Try using a payload, which is stored as additional information. Currently
>> Lucene only supports per-token payloads, but you can add an arbitrary token
>> for the time information.
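The suggestion above does not say how the time would be encoded as payload bytes. One possible encoding is sketched below in plain Java, with no Lucene calls: the "HH:mm:ss" string is packed into a 4-byte count of seconds and unpacked again. `TimePayloadSketch`, `encode`, and `decode` are hypothetical names introduced only for this illustration, and attaching the bytes to a token is not shown.

```java
import java.nio.ByteBuffer;
import java.util.Locale;

// Hypothetical helper: packs an "HH:mm:ss" subtitle time into 4 bytes,
// the kind of small byte array a per-token payload could carry.
class TimePayloadSketch {

    // Convert "HH:mm:ss" to total seconds and store it as a big-endian int.
    static byte[] encode(String time) {
        String[] parts = time.split(":");
        int seconds = Integer.parseInt(parts[0]) * 3600
                + Integer.parseInt(parts[1]) * 60
                + Integer.parseInt(parts[2]);
        return ByteBuffer.allocate(4).putInt(seconds).array();
    }

    // Read the int back and format it as "HH:mm:ss" again.
    static String decode(byte[] payload) {
        int seconds = ByteBuffer.wrap(payload).getInt();
        return String.format(Locale.ROOT, "%02d:%02d:%02d",
                seconds / 3600, (seconds % 3600) / 60, seconds % 60);
    }

    public static void main(String[] args) {
        byte[] payload = encode("01:02:02");
        System.out.println(payload.length);  // 4
        System.out.println(decode(payload)); // 01:02:02
    }
}
```

A fixed-width binary form like this keeps the per-token overhead small; storing the raw string bytes would work as well, at the cost of a few more bytes per token.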
>>
>> I am not sure what you query on: only the subtitle, or both the
>> subtitle and the time?
>>
>> Regards,
>>
>> -----Original Message-----
>> From: 王建新 [mailto:lieutroy@gmail.com]
>> Sent: Tuesday, April 22, 2008 1:06 PM
>> To: java-user
>> Subject: Need addtional info for Field (hoping friends who can read Chinese can offer some advice)
>>
>>
>> I may not have described this very clearly in English, sorry :)
>>
>>
>> ----- Original Message -----
>> From: 王建新
>> To: Chris
>> Sent: Tuesday, April 22, 2008 9:52 AM
>> Subject: Re: Need addtional info for Field
>>
>>
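>> Thanks.
>> My problem is this: I need to build an index for a batch of video files.
>> Before indexing, I have already worked out which subtitle text appears at
>> which time in each video.
>> In this case one video program corresponds to one Document, and I need
>> (would like) to index the subtitles as follows:
>> Field("Sub","下午去开会","01:02:02");
>> Field("Sub","后天去开会","01:03:05");
>> [Note: "01:02:02" is the attached time; Lucene does not provide this usage.]
>>
>> These two Fields mean that, in the current video program, the subtitle
>> "下午去开会" ("going to a meeting this afternoon") appears at 01:02:02 and
>> "后天去开会" ("going to a meeting the day after tomorrow") appears at
>> 01:03:05. If a user searches for "下午" ("afternoon"), the current video
>> program matches, but only the first Field matches, so only the time
>> "01:02:02" is needed. If the user searches for "开会" ("meeting"), both
>> Fields match, so both times "01:02:02" and "01:03:05" are needed.
>> I am not sure whether I have explained this clearly.
>>
>> I would like to know whether Lucene can solve this problem in some way and,
>> if not, how Lucene would need to be modified.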
>>
>> 王建新
>> ----- Original Message -----
>> From: Chris
>> To: 王建新
>> Sent: Monday, April 21, 2008 7:34 PM
>> Subject: Re: Need addtional info for Field
>>
>>
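>> Could you describe the feature you need more clearly? Handled this way,
>> it seems word segmentation would be needed....
>>
>> But I see you have not segmented the text, and if fields with the same name
>> hold multi-pair values, they are not stored as a String.
>>
>> That is all.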
>> Chris.
>>
>>
>> 2008/4/21, 王建新 <li...@gmail.com>:
>> Can you read Chinese?
>>
>> I do not quite understand what you mean.
>> Are you saying this problem can be solved with Lucene's existing features?
>>
>> ----- Original Message -----
>> From: Chris
>> To: 王建新
>> Sent: Monday, April 21, 2008 5:14 PM
>> Subject: Re: Need addtional info for Field
>>
>>
>> This problem is not solved by Lucene itself, but another method will solve it.
>>
>> The structure is not defined like this either ......
>>
>> You may want to check it more carefully....
>>
>> That is all.
>> Chris.
>>
>>
>> 2008/4/21, 王建新 <li...@gmail.com>:
>> hi Chris, it is me "王建新"
>>
>> I have a new problem. Could you give me any advice? Thank you.
>>
>>
>> I want to use Lucene with some additional info, like:
>>
>> 1. index
>> Document additionalDoc = new Document();
>>
>> additionalDoc.add(new Field("field", "AA BB", "Additional info ..............."));
>> additionalDoc.add(new Field("field", "BB CC", "Additional info 222222222222222222222222..............."));
>>
>> writer.addDocument(additionalDoc);
>>
>> ........
>>
>>
>> 2. search
>>
>> Searcher searcher;
>> ....
>>
>> searcher.search(new TermQuery(new Term("field", "BB")));
>>
>>
>>
>>
>> In this case, I want Lucene to return additionalDoc and also tell me
>> which fields were matched, so that I can get the additional info from the
>> matched fields.
>>
>> Can Lucene do this in version 2.3.1?
>>
>>
>>
>> --
>> Chris Lin
>> chrislin0426@gmail.com
>> Taipei , Taiwan.
>> -----------------------------------------------------------
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
Re: Need addtional info for Field (hoping friends who can read Chinese can offer some advice)
Posted by 王建新 <li...@gmail.com>.
OK, thanks!
----- Original Message -----
From: "kai.hu" <li...@hotmail.com>
To: <ja...@lucene.apache.org>
Sent: Sunday, May 04, 2008 4:27 PM
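Subject: Re: Need addtional info for Field (hoping friends who can read Chinese can offer some advice)
> Search Google for Chinese word segmentation. Besides 车东 (Che Dong)'s package, there should be plenty of others. If you find one that segments better and runs more efficiently, please recommend it to me as well.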
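Re: Need addtional info for Field (hoping friends who can read Chinese can offer some advice)
Posted by "kai.hu" <li...@hotmail.com>.
Search Google for Chinese word segmentation. Besides 车东 (Che Dong)'s package, there should be plenty of others. If you find one that segments better and runs more efficiently, please recommend it to me as well.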