Posted to java-user@lucene.apache.org by cerberus yao <ch...@gmail.com> on 2005/04/11 11:50:41 UTC

Lucene Search Result with Line Numbers?

Hi, Lucene users:

  Does anyone know how to add line numbers from the original source
content to Lucene search results?
  For example:
     I have a file "Test.java" which is indexed by Lucene.
     When I search the index, how can I enhance the search results
with the line numbers in Test.java?

  P.S. The Lucene sandbox has a highlighter for search results. Where
can I find a line-number decorator for search results?

 Thx.
 Best regards,

--
Charley, Chang-Li Yao

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene Search Result with Line Numbers?

Posted by Karl Øie <ka...@gan.no>.
Yes, the biggest drawback is text spanning lines:

L1 - it was the best of times,
L2 - it was the worst of times

A phrase search for "it was the best of times, it was the worst of
times" (with the quotes) will return no hits, because no single Lucene
document contains the whole phrase.

I would be interested in an alternative approach here, because I have
encountered this problem myself. A possible solution would be to keep
two indexes, a full-text index and a per-line index: run the query
against the full-text index, then compare each hit against the per-line
index to find the exact line number of each full-text hit.
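A rough sketch of the second step of that idea, in plain Java rather than Lucene. It assumes a file has already matched in the full-text index and that its lines have been fetched from the per-line index. Tagging every token with the number of the line it came from lets a phrase that crosses a line break still be located. The class and method names are hypothetical, and the tokenizer is only a crude stand-in for a real Lucene analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PhraseLines {
    // Crude stand-in for a Lucene analyzer (an assumption, not Lucene's):
    // lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Returns {firstLine, lastLine} (1-based) of the first occurrence of the
    // phrase in the file's lines, or null if the phrase does not occur.
    static int[] findPhraseLines(List<String> lines, String phrase) {
        List<String> tokens = new ArrayList<>();
        List<Integer> tokenLine = new ArrayList<>(); // line number of each token
        for (int i = 0; i < lines.size(); i++) {
            for (String t : tokenize(lines.get(i))) {
                tokens.add(t);
                tokenLine.add(i + 1);
            }
        }
        List<String> want = tokenize(phrase);
        if (want.isEmpty()) return null;
        // Slide over the token stream, ignoring line boundaries entirely.
        for (int s = 0; s + want.size() <= tokens.size(); s++) {
            if (tokens.subList(s, s + want.size()).equals(want)) {
                return new int[] { tokenLine.get(s), tokenLine.get(s + want.size() - 1) };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("it was the best of times,", "it was the worst of times");
        int[] r = findPhraseLines(lines, "it was the best of times, it was the worst of times");
        System.out.println("lines " + r[0] + "-" + r[1]); // lines 1-2
    }
}
```

Mapping tokens rather than whole lines is what lets the phrase match across the break; the per-line index then only has to supply each line's text and number.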

Mvh Karl Øie

On 11. apr. 2005, at 15.46, cerberus yao wrote:

> But "crash.java" is physically just a single document.
> Is there any drawback if we treat each line in "crash.java" as a
> document?




Re: Lucene Search Result with Line Numbers?

Posted by Karl Øie <ka...@gan.no>.
Oh, I forgot your last question. That's why the field "line" has to be
stored: at query time you get the "line" number from the document that
represents the hit line, and for "forward"/"back" actions you have to
sort the result set by line value and print only the chunk of it that
you need.
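A minimal sketch of that forward/back step in plain Java. It assumes the per-line documents of one file have already been retrieved (for example by querying on its "id") and that their stored "line" and "text" values were copied into a map. This only works if "text" is stored as well as "line". Because "line" is stored as a string, it has to be parsed back to a number before sorting: compared as strings, "10" sorts before "2". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ContextWindow {
    // storedLines maps each per-line document's stored "line" value to its
    // stored "text" value, in whatever order the hits came back.
    // Returns the existing lines hitLine-n .. hitLine+n, numbered and sorted (n >= 0).
    static List<String> window(Map<String, String> storedLines, int hitLine, int n) {
        // Parse "line" back to an int before sorting: as strings, "10" < "2".
        TreeMap<Integer, String> byLine = new TreeMap<>();
        for (Map.Entry<String, String> e : storedLines.entrySet()) {
            byLine.put(Integer.parseInt(e.getKey()), e.getValue());
        }
        List<String> out = new ArrayList<>();
        // Inclusive range; line numbers outside the file are simply absent.
        for (Map.Entry<Integer, String> e
                : byLine.subMap(hitLine - n, true, hitLine + n, true).entrySet()) {
            out.add(e.getKey() + " " + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new HashMap<>();
        stored.put("3", "    }");
        stored.put("1", "public class crash {");
        stored.put("4", "}");
        stored.put("2", "    public static void main(String[] args) {");
        for (String line : window(stored, 2, 1)) System.out.println(line);
    }
}
```

Run on the crash.java lines with the hit on line 2 and n = 1, it prints lines 1 to 3, the same window as in the example question.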

Mvh Karl Øie

> Another question:
>   If we need to present the search results with the hit lines plus n
> lines forward and backward, how can I do this if each line is stored
> in a separate document?
>   for example:
>
>  1. contents in crash.java are:
>       public class crash {
>           public static void main(String[] args) {
>           }
>       }
>  2. query "main"
>  3. search result = the hit line plus 1 line before and 1 line after:
>      1 public class crash {
>      2    public static void main(String[] args) {
>      3   }
>




Re: Lucene Search Result with Line Numbers?

Posted by cerberus yao <ch...@gmail.com>.
But "crash.java" is physically just a single document.
Is there any drawback if we treat each line in "crash.java" as a document?

Another question:
  If we need to present the search results with the hit lines plus n
lines forward and backward, how can I do this if each line is stored
in a separate document?
  for example:

 1. contents in crash.java are:
      public class crash {
          public static void main(String[] args) {
          }
      }
 2. query "main"
 3. search result = the hit line plus 1 line before and 1 line after:
     1 public class crash {
     2    public static void main(String[] args) {
     3   }

On Apr 11, 2005 8:28 PM, Karl Øie <ka...@gan.no> wrote:
> Most indexing creates one Lucene document per source document. What
> you would need instead is to create one Lucene document per line.



Re: Lucene Search Result with Line Numbers?

Posted by Karl Øie <ka...@gan.no>.
Most indexing creates one Lucene document per source document. What
you would need instead is to create one Lucene document per line.

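import java.io.*;
import org.apache.lucene.document.*;

// index_writer: an already-open IndexWriter (Lucene 1.x API assumed)
String src_doc = "crash.java";
BufferedReader reader = new BufferedReader(new FileReader(src_doc));
int line_number = 1;  // count from 1, as editors do
String line;
while ((line = reader.readLine()) != null) {  // readLine() returns null at EOF
	Document ld = new Document();
	// Field(name, value, store, index, tokenize)
	ld.add(new Field("id", src_doc, true, true, false));  // source file name
	ld.add(new Field("line", ""+line_number, true, true, false));  // stored, to show with the hit
	ld.add(new Field("text", line, true, true, true));  // stored too, to print context lines
	index_writer.addDocument(ld);
	line_number++;
}
reader.close();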

This creates a small Lucene document for each line. When you search, you
will find documents based on the content of a line, with its line number
available as a field. Highlighting works without a Lucene document per
line because the highlighter bases its results on groups of occurrences
of the search terms in the text, not on line numbers.

Mvh Karl Øie





Re: Lucene Search Result with Line Numbers?

Posted by Doug Cutting <cu...@apache.org>.
cerberus yao wrote:
>   Does anyone know how to add line numbers from the original source
> content to Lucene search results?

When you display each hit, first scan the text and build an array
containing the start position of each line (0, plus the position just
after each newline). Then use the highlighter (in contrib/highlighter)
to find fragments:

   TextFragment[] frags = highlighter.getBestTextFragments(...);

Finally, display these fragments as complete lines: search the sorted
array of line starts for the first fragment's start position to find the
first line, and for the last fragment's end position to find the last line.
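A plain-Java sketch of the line-mapping part of this approach. It deliberately does not depend on any particular highlighter API: it assumes the start and end character offsets of the fragments are already known and takes them as ints. It also widens the range by a number of context lines, as asked earlier in the thread. Class and method names are hypothetical; lines are assumed to end in '\n', and a trailing '\r' is dropped for display.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineLocator {
    // Start offset of every line: 0, plus the position just after each '\n'.
    static int[] lineStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') starts.add(i + 1);
        }
        int[] out = new int[starts.size()];
        for (int i = 0; i < out.length; i++) out[i] = starts.get(i);
        return out;
    }

    // 1-based number of the line containing the character at offset.
    static int lineOf(int[] starts, int offset) {
        int r = Arrays.binarySearch(starts, offset);
        // If offset is not itself a line start, binarySearch returns
        // -(insertionPoint) - 1; the containing line is the one before insertionPoint.
        return (r >= 0 ? r : -r - 2) + 1;
    }

    // Numbered lines covering the fragment range [start, end) (end exclusive),
    // widened by `context` lines on each side and clamped to the file.
    static List<String> linesAround(String text, int start, int end, int context) {
        int[] starts = lineStarts(text);
        // A trailing '\n' does not begin another real line.
        int lineCount = text.endsWith("\n") ? starts.length - 1 : starts.length;
        int first = Math.max(1, lineOf(starts, start) - context);
        int last = Math.min(lineCount, lineOf(starts, Math.max(start, end - 1)) + context);
        List<String> out = new ArrayList<>();
        for (int line = first; line <= last; line++) {
            int from = starts[line - 1];
            int to = line < starts.length ? starts[line] - 1 : text.length();
            String s = text.substring(from, to);
            if (s.endsWith("\r")) s = s.substring(0, s.length() - 1);
            out.add(line + " " + s);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "public class crash {\n    public static void main(String[] args) {\n    }\n}\n";
        int hit = text.indexOf("main");
        for (String l : linesAround(text, hit, hit + "main".length(), 1)) System.out.println(l);
    }
}
```

Because the line-start array is sorted, each offset lookup is a binary search, so the text is scanned only once per hit regardless of how many fragments are mapped.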

Doug
