Posted to user@lucenenet.apache.org by Karl Geppert <ka...@chemwatch.net> on 2007/10/09 07:29:44 UTC
Newbie question
Okay, this is probably a dumb question, but I'll also ask a style question
here:
I want to build an index of chemical names and their associated numbers
in Lucene. I'm using the code extract below, but when I get to the code
that adds the Field.Keyword and Field.Text fields to the document, the
compiler tells me:
[C# Error] WinForm.cs(110): 'Lucene.Net.Documents.Field' does not
contain a definition for 'Keyword'
[C# Error] WinForm.cs(111): 'Lucene.Net.Documents.Field' does not
contain a definition for 'Text'
Can someone tell me what I'm doing wrong?
The second part of my question concerns searching for details about
chemicals in Lucene. At the moment I plan to add the name and the index
number so we can identify the rest of the material. Should I also add
other fixed-text fields for each material, such as CAS numbers (there can
be 1-50 of them), physical properties and so forth, or is it better (more
efficient) to stick to traditional search methods for these?
Karl
(PS: this is being written with the Borland C# 2006 implementation.)
using System;
using System.ComponentModel;
using System.Windows.Forms;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Analysis.Standard;
namespace CSharpTest1
{
    public class FileReadStrings {
        public static void ReadStrings( ) {
            String line;
            StreamReader f = new StreamReader("c:\\temp\\dictextract.txt");
            while((line=f.ReadLine()) != null) {
                String[] strings = line.Split(new char[]{ Convert.ToChar(9)});
                if (strings.Length == 2) {
                    Document Doc = new Document();
                    Doc.add( Field.Keyword( "CW", strings[ 0]));
                    Doc.add( Field.Text( "NAME", strings[ 1]));
                } //if
                Console.WriteLine(line);
            } //while
            f.Close();
        }
        private static void AddIndex() {
            IndexWriter writer = new IndexWriter("c:\\temp\\", new StandardAnalyzer(), true);
            Console.WriteLine( writer.DocCount());
            writer.Optimize();
            writer.Close();
        }
    }
}
________________________________________________________________________
This email has been scanned for all viruses by the MessageLabs SkyScan
service. For more information on a proactive anti-virus service working
around the clock, around the globe, visit http://www.hi-speed.net.au
________________________________________________________________________
Re: Newbie question
Posted by Jokin Cuadrado <jo...@gmail.com>.
Also, for quick reference, you can take a look at the test code:
private Lucene.Net.Documents.Document MakeDocumentWithFields()
{
    Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
    doc.Add(new Field("keyword", "test1", Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.Add(new Field("keyword", "test2", Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.Add(new Field("text", "test1", Field.Store.YES, Field.Index.TOKENIZED));
    doc.Add(new Field("text", "test2", Field.Store.YES, Field.Index.TOKENIZED));
    doc.Add(new Field("unindexed", "test1", Field.Store.YES, Field.Index.NO));
    doc.Add(new Field("unindexed", "test2", Field.Store.YES, Field.Index.NO));
    doc.Add(new Field("unstored", "test1", Field.Store.NO, Field.Index.TOKENIZED));
    doc.Add(new Field("unstored", "test2", Field.Store.NO, Field.Index.TOKENIZED));
    return doc;
}
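Those four field names line up with the Lucene 1.4 factory methods that the 2.x API replaced with the Field constructor: Keyword is Store.YES with Index.UN_TOKENIZED, Text is Store.YES with Index.TOKENIZED, UnIndexed is Store.YES with Index.NO, and UnStored is Store.NO with Index.TOKENIZED. For porting older sample code, a small shim along these lines may help. It is a sketch assuming the Lucene.Net 2.0/2.1 API (later releases renamed TOKENIZED/UN_TOKENIZED to ANALYZED/NOT_ANALYZED), and LegacyFields is a hypothetical name, not part of Lucene.Net.

```csharp
using Lucene.Net.Documents;

// Hypothetical helper (NOT part of Lucene.Net): recreates the Lucene 1.4-style
// Field factory methods on top of the 2.0/2.1 Field constructor.
public static class LegacyFields
{
    // Old Field.Keyword: stored, indexed as one untokenized term (IDs, codes).
    public static Field Keyword(string name, string value)
    {
        return new Field(name, value, Field.Store.YES, Field.Index.UN_TOKENIZED);
    }

    // Old Field.Text: stored, and split into terms by the analyzer.
    public static Field Text(string name, string value)
    {
        return new Field(name, value, Field.Store.YES, Field.Index.TOKENIZED);
    }

    // Old Field.UnIndexed: stored for display only, not searchable.
    public static Field UnIndexed(string name, string value)
    {
        return new Field(name, value, Field.Store.YES, Field.Index.NO);
    }

    // Old Field.UnStored: searchable, but the original text is not kept.
    public static Field UnStored(string name, string value)
    {
        return new Field(name, value, Field.Store.NO, Field.Index.TOKENIZED);
    }
}
```

With it, the failing lines in the question would read Doc.Add(LegacyFields.Keyword("CW", strings[0])); note also the capital A: Lucene.Net names the method Document.Add, not add.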
Re: Newbie question
Posted by Jokin Cuadrado <jo...@gmail.com>.
Where did you get that code?
The way to add fields to a document is:
Field myField = new Field("CW", strings[0], Field.Store.YES,
Field.Index.UN_TOKENIZED);
doc.Add(myField);
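Beyond the Field calls, the loop in the question builds each Document but never hands it to an IndexWriter, and AddIndex opens a separate writer that never receives any documents. A minimal sketch that reads and indexes in one pass, assuming the Lucene.Net 2.0/2.1 API and a tab-separated "CW<TAB>NAME" input file; ChemicalIndexer and BuildIndex are hypothetical names:

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

namespace CSharpTest1
{
    public class ChemicalIndexer
    {
        // Reads "CW<TAB>NAME" lines and writes one document per well-formed line.
        // Returns the number of documents in the index afterwards.
        public static int BuildIndex(string tsvPath, string indexDir)
        {
            // create = true: start a fresh index in indexDir, replacing any old one.
            IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
            try
            {
                using (StreamReader reader = new StreamReader(tsvPath))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        string[] parts = line.Split('\t');
                        if (parts.Length != 2)
                            continue; // skip malformed lines

                        Document doc = new Document();
                        // CW number: one exact term, stored so hits can return it.
                        doc.Add(new Field("CW", parts[0], Field.Store.YES, Field.Index.UN_TOKENIZED));
                        // Chemical name: tokenized by StandardAnalyzer so single words match.
                        doc.Add(new Field("NAME", parts[1], Field.Store.YES, Field.Index.TOKENIZED));
                        writer.AddDocument(doc); // the step missing from the original loop
                    }
                }
                writer.Optimize();
                return writer.DocCount();
            }
            finally
            {
                writer.Close();
            }
        }
    }
}
```

For example, ChemicalIndexer.BuildIndex("c:\\temp\\dictextract.txt", "c:\\temp\\chemindex") would index the file from the question into its own folder; a dedicated folder is safer than c:\temp itself, because create = true replaces whatever index is already there.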
RE: Newbie question
Posted by Ali Khawaja <ak...@microsoft.com>.
You can use phonetic search in databases; it will help you tackle this very specific issue.
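Phonetic search usually means comparing sound-alike codes rather than spellings; several databases (SQL Server, Oracle, MySQL) expose this as a SOUNDEX function. To show what such a code looks like, here is a minimal American Soundex encoder in C#. It is an illustrative sketch of one common variant (H and W do not separate equal codes, vowels do), not a Lucene or database API, and Soundex was designed for English surnames: some distinct chemicals collide, for example methanol and methanal both encode to M354.

```csharp
using System;
using System.Text;

// Illustrative American Soundex encoder (one common variant).
public static class Soundex
{
    // Digit for each letter A..Z; '0' marks letters that are not coded
    // (vowels, H, W, Y).
    private const string Codes = "01230120022455012623010202";

    public static string Encode(string word)
    {
        if (word == null) return "";

        // Keep only the ASCII letters, upper-cased.
        StringBuilder letters = new StringBuilder();
        foreach (char c in word.ToUpperInvariant())
            if (c >= 'A' && c <= 'Z') letters.Append(c);
        if (letters.Length == 0) return "";

        StringBuilder result = new StringBuilder();
        result.Append(letters[0]);              // first letter is kept as-is
        char last = Codes[letters[0] - 'A'];    // its code still suppresses repeats

        for (int i = 1; i < letters.Length && result.Length < 4; i++)
        {
            char ch = letters[i];
            char code = Codes[ch - 'A'];
            if (code != '0' && code != last)
                result.Append(code);
            // H and W are transparent: equal codes on either side collapse.
            // Vowels reset 'last', so equal codes separated by a vowel repeat.
            if (ch != 'H' && ch != 'W')
                last = code;
        }
        return result.ToString().PadRight(4, '0');
    }
}
```

Soundex.Encode("Toluene") and Soundex.Encode("Toluine") both give T450, so a misspelt name would still find its match, which is the kind of tolerance Ali has in mind.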
-----Original Message-----
From: Karl Geppert [mailto:karlg@chemwatch.net]
Sent: Thursday, October 11, 2007 8:14 PM
To: lucene-net-user@incubator.apache.org
Subject: Re: Newbie question
Hi Jokin,
Thanks for the pointer - I was working from the Manning Lucene book,
expecting the differences not to be too large, and your pointer is of
inestimable value.
The main reason for approaching Lucene is to provide the matching for
the Chemical names. Chemical names end up being of immense complexity
and ambiguity. Persuading users to type exact strings is always
problematic. Lucene appears to have better flexibility in its matching
of terms than a relational database.
Karl
Re: Newbie question
Posted by Karl Geppert <ka...@chemwatch.net>.
Hi Jokin,
Thanks for the pointer - I was working from the Manning Lucene book,
expecting the differences not to be too large, and your pointer is of
inestimable value.
The main reason for approaching Lucene is to provide the matching for
the Chemical names. Chemical names end up being of immense complexity
and ambiguity. Persuading users to type exact strings is always
problematic. Lucene appears to have better flexibility in its matching
of terms than a relational database.
Karl
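Much of that flexibility comes from query types such as FuzzyQuery, which matches index terms within a small edit distance of the input, so a misspelt word can still find the right name. A sketch assuming the Lucene.Net 2.0/2.1 API (IndexSearcher, Hits) and an index whose NAME field was written with StandardAnalyzer and whose CW field is stored; ChemicalSearch and FindCwByApproximateName are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

public class ChemicalSearch
{
    // Returns the stored CW numbers of documents whose NAME field contains a
    // term close (by edit distance) to the possibly misspelt input word.
    public static List<string> FindCwByApproximateName(string indexDir, string word)
    {
        List<string> result = new List<string>();
        IndexSearcher searcher = new IndexSearcher(indexDir);
        try
        {
            // FuzzyQuery is not analyzed, so lowercase the word to match the
            // lowercase terms StandardAnalyzer wrote into the NAME field.
            Query query = new FuzzyQuery(new Term("NAME", word.ToLower()));
            Hits hits = searcher.Search(query);
            for (int i = 0; i < hits.Length(); i++)
                result.Add(hits.Doc(i).Get("CW"));
        }
        finally
        {
            searcher.Close();
        }
        return result;
    }
}
```

FuzzyQuery works on a single term with a default minimum similarity of 0.5, so a multi-word name would need one clause per word (combined, for instance, in a BooleanQuery).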
Re: Newbie question
Posted by Jokin Cuadrado <jo...@gmail.com>.
Maybe I'm missing something, but I think Lucene is not the best way to
search structured information. A relational database with an optimized
structure might do the job better in this case.
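If the CAS numbers do go into Lucene, the repeated "keyword" fields in Jokin's test code show the pattern for a multi-valued field: add one Field per value under the same name. A sketch under that assumption, using the Lucene.Net 2.0/2.1 API, keeping physical properties in the relational database and linking back through the stored CW number; ChemicalDocuments.Make and the "CAS" field name are hypothetical:

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;

public static class ChemicalDocuments
{
    // Hypothetical layout: CW id and name as before, plus one "CAS" field
    // instance per CAS number, so every value is searchable on its own.
    public static Document Make(string cw, string name, IList<string> casNumbers)
    {
        Document doc = new Document();
        doc.Add(new Field("CW", cw, Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.Add(new Field("NAME", name, Field.Store.YES, Field.Index.TOKENIZED));
        foreach (string cas in casNumbers)
        {
            // UN_TOKENIZED indexes each CAS number as one exact term, rather
            // than depending on how StandardAnalyzer splits hyphenated digits.
            doc.Add(new Field("CAS", cas, Field.Store.YES, Field.Index.UN_TOKENIZED));
        }
        // Physical properties and other structured data are left to the
        // database, looked up by the stored CW number.
        return doc;
    }
}
```

A search such as searcher.Search(new TermQuery(new Term("CAS", "71-43-2"))) would then match the document whichever of its CAS numbers is queried.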