Posted to user@lucenenet.apache.org by Ken Cox <ki...@hotmail.com> on 2007/02/17 03:14:37 UTC

Storing file Contents & using Highlighter in v2

Hi all,

I'm new to Lucene & have a question about indexing the "contents" of files
& the use of the Highlighter.

I'm using Lucene 2.0.0.3 & Highlighter 2.0.0.1.

Based on the Test & Demo projects I've managed to get everything going just
fine, except the ability to highlight the query string within a chunk of text
from the indexed contents.

The reason is the "contents" field doesn't actually store the text in the 
index...

// Add the contents of the file to a field named "contents".  Specify a Reader,
// so that the text of the file is tokenized and indexed, but not stored.
// Note that the StreamReader here assumes the file is in the system's default
// encoding. If that's not the case, searching for special characters will fail.
doc.Add(new Field("contents", new System.IO.StreamReader(f.FullName,
    System.Text.Encoding.Default)));

So my question is: what's the best way to get the text of a file into the
index so I can use the Highlighter on it?

Currently I've added another field & am using a parser class I found in an
earlier version of Lucene to get the text of the files into the index.
Is this still the best way to do it using Lucene 2.0.0.3, or is there a new
way?

Thanks,
Ken


RE: Storing file Contents & using Highlighter in v2

Posted by George Aroush <ge...@aroush.net>.

Thanks Pasha.

This means we have to support the current method through the
SharpZipLibAdapter class (in SharpZipLibAdapter.cs) for cross-index
compatibility.

Regards,

-- George Aroush

-----Original Message-----
From: Pasha Bizhan [mailto:lucene-list@lucenedotnet.com] 
Sent: Friday, February 23, 2007 11:34 AM
To: lucene-net-user@incubator.apache.org
Subject: RE: Storing file Contents & using Highlighter in v2

Hi, 

> From: George Aroush [mailto:george@aroush.net]

> Lucene.Net supports compressed fields, but not out of the box (because
> .NET 1.1 doesn't support ZIP compression).  Look at the file
> SharpZipLibAdapter.cs to see how you can use 3rd party compression, or
> use .NET 2.0 which has compression support.

Just for information: .NET 2.0 compression is incompatible with java.util.zip.
You can't use the .NET compression classes to read or write a
Lucene-compatible index.

Pasha Bizhan
http://searchblackbox.com

RE: Storing file Contents & using Highlighter in v2

Posted by Pasha Bizhan <lu...@lucenedotnet.com>.

Hi, 

> From: George Aroush [mailto:george@aroush.net] 

> Lucene.Net supports compressed fields, but not out of the box
> (because .NET 1.1 doesn't support ZIP compression).  Look at
> the file SharpZipLibAdapter.cs to see how you can use 3rd
> party compression, or use .NET 2.0 which has compression support.

Just for information: .NET 2.0 compression is incompatible
with java.util.zip. You can't use the .NET compression classes to
read or write a Lucene-compatible index.
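
One plausible reason for the mismatch, offered as an assumption and not something confirmed in this thread: java.util.zip.Deflater writes zlib-wrapped data (RFC 1950) by default, while .NET's System.IO.Compression.DeflateStream writes raw DEFLATE (RFC 1951) and GZipStream writes gzip (RFC 1952). The sketch below only illustrates that framing difference. The class and method names are hypothetical helpers, and whether Java Lucene relies on the zlib wrapper for compressed fields is assumed here.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper class, not part of Lucene.Net.
public static class DeflateFormatDemo
{
    // Compress with DeflateStream, which emits raw DEFLATE data (RFC 1951)
    // without the zlib header and Adler-32 trailer.
    public static byte[] RawDeflate(byte[] input)
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (DeflateStream deflate =
                new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            }
            return output.ToArray();
        }
    }

    // Decompress raw DEFLATE data produced by RawDeflate.
    public static byte[] RawInflate(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (DeflateStream inflate =
            new DeflateStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = inflate.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return output.ToArray();
        }
    }

    // A zlib stream (RFC 1950) starts with a two-byte header: the low nibble
    // of the first byte is 8 (deflate), and the two bytes read as a
    // big-endian number are a multiple of 31.
    public static bool HasZlibHeader(byte[] data)
    {
        return data.Length >= 2
            && (data[0] & 0x0F) == 8
            && ((data[0] << 8) | data[1]) % 31 == 0;
    }
}
```

Data written by RawDeflate round-trips through RawInflate, but a reader that expects the zlib header will generally reject it, and vice versa. HasZlibHeader is a quick way to check which framing a given byte array appears to use; a raw DEFLATE stream can pass the check by coincidence, so treat it as a hint only.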

Pasha Bizhan
http://searchblackbox.com

RE: Storing file Contents & using Highlighter in v2

Posted by George Aroush <ge...@aroush.net>.

Hi Ken,

The best way depends on your needs.  If you still have the original text
around, it doesn't make sense to store the raw text in a Lucene index just
because you need highlighting.  In this case you can just store a reference
to the original text file in a Lucene field and use this reference to get
the text for highlighting.
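
A minimal sketch of this approach, assuming the Lucene.Net 2.0 and Highlighter.Net 2.0 APIs. The "path" field name follows the Demo project; the class and method names are hypothetical, and the file is assumed to still exist, unchanged, at search time.

```csharp
using System.IO;
using System.Text;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Highlight;
using Lucene.Net.Search;

// Hypothetical helper class, not part of Lucene.Net.
public static class ReferenceHighlighting
{
    // Index time: store only the file path; the contents are tokenized
    // and indexed from a reader, but not stored in the index.
    public static Document BuildDocument(FileInfo f)
    {
        Document doc = new Document();
        doc.Add(new Field("path", f.FullName,
            Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.Add(new Field("contents",
            new StreamReader(f.FullName, Encoding.Default)));
        return doc;
    }

    // Search time: use the stored path to reload the original text,
    // then let the Highlighter pick the best-matching fragment.
    // Returns null when no query term occurs in the text.
    public static string HighlightHit(Document hit, Query query, Analyzer analyzer)
    {
        string text = File.ReadAllText(hit.Get("path"), Encoding.Default);
        Highlighter highlighter = new Highlighter(new QueryScorer(query));
        TokenStream tokens = analyzer.TokenStream("contents", new StringReader(text));
        return highlighter.GetBestFragment(tokens, text);
    }
}
```

Use the same analyzer for highlighting as for indexing, so the tokens line up. Queries that expand to other terms (wildcard, prefix, fuzzy) typically need to be rewritten against the index before being handed to QueryScorer.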

If you don't have access to the original text, then what you have done is
fine -- but you should consider using a compressed field to store the text.
Lucene.Net supports compressed fields, but not out of the box (because .NET
1.1 doesn't support ZIP compression).  Look at the file SharpZipLibAdapter.cs
to see how you can use 3rd party compression, or use .NET 2.0 which has
compression support.
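
As a rough sketch, assuming the Lucene.Net 2.0 Field API: the only difference between a plain stored field and a compressed one is the Field.Store value. The helper name below is hypothetical, and Field.Store.COMPRESS is only usable once compression support has been set up in your build as described above.

```csharp
using Lucene.Net.Documents;

// Hypothetical helper class, not part of Lucene.Net.
public static class StoredContents
{
    // Build a "contents" field that is tokenized, indexed and stored, so
    // the text can later be read back from the index for highlighting.
    // With compress = true the stored value is kept in compressed form.
    public static Field MakeContentsField(string text, bool compress)
    {
        Field.Store store = compress ? Field.Store.COMPRESS : Field.Store.YES;
        return new Field("contents", text, store, Field.Index.TOKENIZED);
    }
}
```

At search time the stored text is available through the hit document's Get("contents") and can be passed to the Highlighter directly.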

Regards,

-- George Aroush
