
Posted to java-user@lucene.apache.org by Mead Lai <la...@gmail.com> on 2011/10/20 09:26:34 UTC

About "join.search" in 3.4 version.

Hello all,

I see there is an "org.apache.lucene.search.join" package in Lucene 3.4,
but I found no demo for the join functionality in the source package
"lucene-3.4.0-src.tar".

I have some articles, which can be modified by editors. The relationship
is:
 an article : modify records = 1:n.

Article document: contains the text of the article.
Record document: article_id, name of editor, date_time (when it was modified).
The search conditions would be: keywords (to search the articles), name of
editor, and a time range (start_time, end_time).
This should find the articles that were modified by a given editor within a
given time range.

E.g.: condition = from '2011-09-23' to '2011-10-19', editor: 'Alan',
keyword: 'duck'.
The results should be all articles that contain 'duck' and were edited by
'Alan' between '2011-09-23' and '2011-10-19'.
My question is: could "org.apache.lucene.search.join" solve this case?
If so, I would appreciate an example or a pointer.

Thanks for your time.

Regards,
Mead Lai

Re: Lucene java doc help

Posted by Ian Lea <ia...@gmail.com>.

The "Expert" note means that ordinary non-expert users like me should
not be using it directly.  It will likely be called behind the scenes
by some other method not flagged as expert, and it is those that we
should be using.  Only developers and clever people doing clever
extensions and the like would be expected to use the expert methods
directly.

--
Ian.

On Fri, Oct 21, 2011 at 8:30 AM, janwen <to...@163.com> wrote:
> The following is an enum constant from the Lucene Javadoc. I do not understand the "Expert" note. What does "Expert" mean? Thanks.
> ANALYZED_NO_NORMS
> public static final Field.Index ANALYZED_NO_NORMS
> Expert: Index the tokens produced by running the field's value through an Analyzer, and also separately disable the storing of norms. See NOT_ANALYZED_NO_NORMS for what norms are and why you may want to disable them.
>
> 2011-10-21
>
>
>
> janwen | China
> website : http://www.qianpin.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Lucene java doc help

Posted by janwen <to...@163.com>.

The following is an enum constant from the Lucene Javadoc. I do not understand the "Expert" note. What does "Expert" mean? Thanks.
ANALYZED_NO_NORMS
public static final Field.Index ANALYZED_NO_NORMS
Expert: Index the tokens produced by running the field's value through an Analyzer, and also separately disable the storing of norms. See NOT_ANALYZED_NO_NORMS for what norms are and why you may want to disable them. 

2011-10-21



janwen | China 
website : http://www.qianpin.com/

Re: About "join.search" in 3.4 version.

Posted by Mead Lai <la...@gmail.com>.

I have now created a filter by overriding "DocIdSet getDocIdSet(IndexReader
reader) throws IOException".
It works nicely, but I am worried about its efficiency.
The limit[] array would contain one hundred thousand article_ids (10,000),
and I fetch one thousand articles by querying keywords on the content:

       TopDocs topDocs = searcher.search(resultQuery, filter, 1000, sort);

Would this be slow and inefficient? There are one million articles in total
in our system.
Thank you.

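 @Override
 public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
  // One bit per document in this reader; set bits mark the allowed articles.
  final OpenBitSet bits = new OpenBitSet(reader.maxDoc());
  String[] limit = new String[]{"id_11645","id_11646"};
  // Arrays of length 1: each id is expected to match at most one document.
  int[] docs = new int[1];
  int[] freqs = new int[1];
  for (String id : limit) {
   if (id != null) {
    TermDocs termDocs = reader.termDocs(new Term("id", id));
    try {
     int count = termDocs.read(docs, freqs);
     if (count == 1) {
      bits.set(docs[0]);
     }
    } finally {
     termDocs.close(); // release the enumerator opened for this id
    }
   }
  }
  return bits;
 }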

Regards,
Mead
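
For illustration, the idea behind the filter in the message above can be modeled without Lucene. This is only a sketch under stated assumptions: `idToDoc` is a hypothetical stand-in for the index's "id" term lookup, `java.util.BitSet` stands in for OpenBitSet, and the class and method names are made up for this example. It shows the two steps the filter relies on: one lookup per allowed id to set a bit, then an intersection of the keyword hits with those bits.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class IdFilterSketch {

    // Builds the "allowed documents" bit set: one lookup per id in limit.
    // idToDoc is a hypothetical stand-in for the index's id -> doc number lookup.
    static BitSet buildAllowedDocs(Map<String, Integer> idToDoc, String[] limit, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String id : limit) {
            if (id == null) {
                continue; // skip empty slots, as the original filter does
            }
            Integer doc = idToDoc.get(id);
            if (doc != null) {
                bits.set(doc); // ids that are not in the index are ignored
            }
        }
        return bits;
    }

    // Keeps only the keyword hits whose document is also allowed by the filter.
    static BitSet applyFilter(BitSet keywordHits, BitSet allowed) {
        BitSet result = (BitSet) keywordHits.clone();
        result.and(allowed);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> idToDoc = new HashMap<>();
        idToDoc.put("id_11645", 3);
        idToDoc.put("id_11646", 7);
        idToDoc.put("id_11647", 8);

        String[] limit = {"id_11645", "id_11646", null, "id_missing"};
        BitSet allowed = buildAllowedDocs(idToDoc, limit, 10);

        // Pretend the keyword query matched documents 3, 5 and 8.
        BitSet keywordHits = new BitSet(10);
        keywordHits.set(3);
        keywordHits.set(5);
        keywordHits.set(8);

        System.out.println(applyFilter(keywordHits, allowed)); // prints {3}
    }
}
```

In this model the cost of building the bit set grows with the number of ids in limit, while applying it is a single pass over the bits.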



Re: About "join.search" in 3.4 version.

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, Oct 20, 2011 at 9:06 PM, Mead Lai <la...@gmail.com> wrote:
> Thank you, Mike.
> Are you sure Solr has implemented a join function?
> I just skimmed through some guides about Solr, and I am not sure about that.
> I appreciate your help very much.

Woops, I'm sorry: I believe Solr's join functionality was only
implemented in trunk (to eventually be 4.0).

It was never backported to 3.x (nor released).

But ElasticSearch has join functionality in their released versions....

Mike McCandless

http://blog.mikemccandless.com


Re: About "join.search" in 3.4 version.

Posted by Mead Lai <la...@gmail.com>.

Thank you, Mike.
Are you sure Solr has implemented a join function?
I just skimmed through some guides about Solr, and I am not sure about that.
I appreciate your help very much.

I figured out another way to handle this problem.
Our system also keeps a copy of these articles and their records (who edited
each article, and when) in the database,
so I will first query the database with the time-range condition and then
use a Lucene Filter to get the right results.

SELECT DISTINCT article_ids FROM records r
WHERE r.edit_date > '2011-09-23' AND r.edit_date < '2011-10-19' AND
r.user_id='000000_editor_id'

The resulting article_ids are then passed into Lucene to filter the search
results.

Although it's a little clumsy, it works for this case.

Regards,
Mead
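
For illustration, the database step described in the message above can be modeled in plain Java. This is only a sketch: `EditRecord`, `articleIdsEditedBy`, and the sample data are hypothetical, and the date bounds are treated as inclusive here, which is an assumption (the SQL above uses strict comparisons). It collects the distinct article ids edited by one editor within a time range, which is the set that would then be handed to the Lucene filter.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class EditRecordLookupSketch {

    // Hypothetical in-memory stand-in for one row of the "records" table.
    static final class EditRecord {
        final String articleId;
        final String editor;
        final LocalDate editDate;

        EditRecord(String articleId, String editor, LocalDate editDate) {
            this.articleId = articleId;
            this.editor = editor;
            this.editDate = editDate;
        }
    }

    // Distinct article ids edited by the given editor between start and end
    // (both inclusive, an assumption of this sketch), in first-seen order.
    static Set<String> articleIdsEditedBy(List<EditRecord> records, String editor,
                                          LocalDate start, LocalDate end) {
        Set<String> ids = new LinkedHashSet<>();
        for (EditRecord r : records) {
            boolean inRange = !r.editDate.isBefore(start) && !r.editDate.isAfter(end);
            if (inRange && r.editor.equals(editor)) {
                ids.add(r.articleId); // the set drops repeated edits of one article
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<EditRecord> records = Arrays.asList(
            new EditRecord("id_11645", "Alan", LocalDate.of(2011, 9, 25)),
            new EditRecord("id_11645", "Alan", LocalDate.of(2011, 10, 1)),
            new EditRecord("id_11646", "Bob", LocalDate.of(2011, 10, 2)),
            new EditRecord("id_11647", "Alan", LocalDate.of(2011, 8, 1)),
            new EditRecord("id_11648", "Alan", LocalDate.of(2011, 10, 19)));

        Set<String> ids = articleIdsEditedBy(records, "Alan",
            LocalDate.of(2011, 9, 23), LocalDate.of(2011, 10, 19));

        System.out.println(ids); // prints [id_11645, id_11648]
    }
}
```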



Re: About "join.search" in 3.4 version.

Posted by Michael McCandless <lu...@mikemccandless.com>.

I don't think the new join package in Lucene 3.4 will work for this
case; you need a more general join implementation, which e.g. Solr and
ElasticSearch have implemented.

Generic join hasn't yet been factored out into Lucene (but I think it
really needs to be... any volunteers!?).

Lucene's join package can handle use cases like nested documents or
parent/child, because it requires that you index a single primary row
AND all joined documents together as a single block of documents.

Mike McCandless

http://blog.mikemccandless.com
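
For illustration, the block-based join described in the message above can be modeled with plain bit sets. This is only a sketch, not Lucene's implementation: the class and method names are hypothetical, and it assumes that each block stores the joined (child) documents first with the primary (parent) document last. Under that assumption, the parent of any child hit is the first parent bit at or after the child's document number.

```java
import java.util.BitSet;

class BlockJoinSketch {

    // Maps child hits to their parents, assuming each block is laid out as
    // child, child, ..., parent with consecutive document numbers.
    static BitSet joinToParents(BitSet childHits, BitSet parentBits) {
        BitSet parents = new BitSet();
        for (int child = childHits.nextSetBit(0); child >= 0;
                 child = childHits.nextSetBit(child + 1)) {
            // The block's parent is the next parent document after the child.
            int parent = parentBits.nextSetBit(child);
            if (parent >= 0) {
                parents.set(parent);
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        // Hypothetical layout: docs 0-1 are modify records of article A (doc 2),
        // docs 3-5 are records of article B (doc 6), doc 7 is a record of
        // article C (doc 8).
        BitSet parentBits = new BitSet();
        parentBits.set(2);
        parentBits.set(6);
        parentBits.set(8);

        // Pretend the record-level query (editor and time range) matched these.
        BitSet childHits = new BitSet();
        childHits.set(1);
        childHits.set(4);
        childHits.set(5);

        System.out.println(joinToParents(childHits, parentBits)); // prints {2, 6}
    }
}
```

Because the lookup depends on the documents of a block being adjacent, adding a new modify record for an existing article would mean re-indexing that article's whole block.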

