Posted to user@lucenenet.apache.org by Karl Geppert <ka...@chemwatch.net> on 2007/10/09 07:29:44 UTC

Newbie question

Okay - this is probably a dumb question, but I'll also ask a style 
question here while I'm at it.

I want to build a Lucene index of chemical names and the numbers 
associated with them.  I'm using the code extract below, but when I reach 
the code that adds the Field.Keyword and Field.Text to the document, the 
compiler tells me that

[C# Error] WinForm.cs(110): 'Lucene.Net.Documents.Field' does not 
contain a definition for 'Keyword'
[C# Error] WinForm.cs(111): 'Lucene.Net.Documents.Field' does not 
contain a definition for 'Text'

Can someone tell me what I'm doing wrong?

The second part of the question concerns searching for details about 
chemicals in Lucene.  At the moment I plan to add the name and the 
index number so we can identify the rest of the material.  Should I also 
add other fixed-text fields for each material, such as CAS numbers 
(there can be 1-50 of them), physical properties and so forth, or is it 
better or more efficient to stick to traditional search methods for these?

Karl

(PS - this is being written with the Borland C# 2006 implementation.)

using System;
using System.ComponentModel;
using System.Windows.Forms;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Analysis.Standard;

namespace CSharpTest1
{

public class FileReadStrings {
  public static void ReadStrings( ) {
    String line;


    StreamReader f = new StreamReader("c:\\temp\\dictextract.txt");
    while((line=f.ReadLine()) != null) {
      String[] strings = line.Split(new char[]{ Convert.ToChar(9)});
      if (strings.Length == 2) {
        Document Doc = new Document();

        Doc.add( Field.Keyword( "CW", strings[ 0]));
        Doc.add( Field.Text( "NAME", strings[ 1]));
      } //if
      Console.WriteLine(line);
    } //while
    f.Close();
  }

  private static void AddIndex() {
    IndexWriter writer = new IndexWriter("c:\\temp\\", new StandardAnalyzer(), true);



    Console.WriteLine( writer.DocCount());
    writer.Optimize();
    writer.Close();
  }
}
}

________________________________________________________________________

This email has been scanned for all viruses by the MessageLabs SkyScan
service. For more information on a proactive anti-virus service working
around the clock, around the globe, visit http://www.hi-speed.net.au
________________________________________________________________________

Re: Newbie question

Posted by Jokin Cuadrado <jo...@gmail.com>.
Also, for quick reference, you can take a look at the test code:

		private Lucene.Net.Documents.Document MakeDocumentWithFields()
		{
			Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
			doc.Add(new Field("keyword", "test1", Field.Store.YES, Field.Index.UN_TOKENIZED));
			doc.Add(new Field("keyword", "test2", Field.Store.YES, Field.Index.UN_TOKENIZED));
			doc.Add(new Field("text", "test1", Field.Store.YES, Field.Index.TOKENIZED));
			doc.Add(new Field("text", "test2", Field.Store.YES, Field.Index.TOKENIZED));
			doc.Add(new Field("unindexed", "test1", Field.Store.YES, Field.Index.NO));
			doc.Add(new Field("unindexed", "test2", Field.Store.YES, Field.Index.NO));
			doc.Add(new Field("unstored", "test1", Field.Store.NO, Field.Index.TOKENIZED));
			doc.Add(new Field("unstored", "test2", Field.Store.NO, Field.Index.TOKENIZED));
			return doc;
		}
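
The test code above adds the same field name to a document more than once. 
Assuming that pattern also suits the CAS numbers from the original question 
(1-50 per material), a minimal sketch could look like the following. The 
class name, the method name and the "CAS" field name are made up for 
illustration; only the Field constructor and Document.Add come from the 
thread, and this assumes the same Lucene.Net 2.x-era API.

```csharp
using Lucene.Net.Documents;

public static class CasFieldSketch
{
    // Builds a document with one "CW" key and one "CAS" field per CAS number.
    // The "CAS" field name is an assumption, not something from the thread.
    public static Document WithCasNumbers(string cw, string[] casNumbers)
    {
        Document doc = new Document();
        doc.Add(new Field("CW", cw, Field.Store.YES, Field.Index.UN_TOKENIZED));
        foreach (string cas in casNumbers)
        {
            // Repeating the field name gives the document several values
            // for that field; each one is indexed as a single exact term.
            doc.Add(new Field("CAS", cas, Field.Store.YES, Field.Index.UN_TOKENIZED));
        }
        return doc;
    }
}
```

Calling doc.GetValues("CAS") on the result should return every stored CAS 
value for that material. Whether this beats a relational lookup for such 
fixed-text fields is a separate question that this sketch does not settle.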


On 10/9/07, Jokin Cuadrado <jo...@gmail.com> wrote:
> Where did you get that code?
>
> The way to add fields to a document is:
>
> Field myField = new Field("CW", strings[0], Field.Store.YES,
> Field.Index.UN_TOKENIZED);
> doc.Add(myField);

Re: Newbie question

Posted by Jokin Cuadrado <jo...@gmail.com>.
Where did you get that code?

The way to add fields to a document is:

Field myField = new Field("CW", strings[0], Field.Store.YES,
Field.Index.UN_TOKENIZED);
doc.Add(myField);
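
For reference, here is a rough sketch of how the original extract might look 
with the Field constructor shown above. It is not a definitive rewrite: the 
class and method names are invented, and it takes a Lucene.Net Directory 
(for example a RAMDirectory) instead of the "c:\\temp\\" path so it can be 
tried without touching disk. It assumes the Lucene.Net 2.x-era API used in 
this thread. Note that the original extract never handed its Document to 
the IndexWriter; the sketch does so with AddDocument.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class ChemicalIndexSketch
{
    // Turns one "number<TAB>name" line into a Document, or null if malformed.
    public static Document ToDocument(string line)
    {
        string[] parts = line.Split('\t');
        if (parts.Length != 2) return null;

        Document doc = new Document();
        // Replaces Field.Keyword: stored, indexed as one exact term.
        doc.Add(new Field("CW", parts[0], Field.Store.YES, Field.Index.UN_TOKENIZED));
        // Replaces Field.Text: stored, indexed through the analyzer.
        doc.Add(new Field("NAME", parts[1], Field.Store.YES, Field.Index.TOKENIZED));
        return doc;
    }

    // Indexes the given lines and returns the number of documents written.
    public static int BuildIndex(Lucene.Net.Store.Directory dir, string[] lines)
    {
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        try
        {
            foreach (string line in lines)
            {
                Document doc = ToDocument(line);
                if (doc != null) writer.AddDocument(doc);
            }
            int count = writer.DocCount();
            writer.Optimize();
            return count;
        }
        finally
        {
            // Always release the index lock, even if indexing fails.
            writer.Close();
        }
    }
}
```

A call such as ChemicalIndexSketch.BuildIndex(new Lucene.Net.Store.RAMDirectory(), lines) 
would index whatever well-formed lines were read from the extract file and 
skip the rest.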

On 10/9/07, Karl Geppert <ka...@chemwatch.net> wrote:
> Okay - this is probably a dumb question, but I'll ask a style question
> here also to help;
>
> I want to build an index of chemical names and associated numbers with
> them in lucene.  I'm using the below code extract but when I come to the
> code to add the Field.Keyword and Field.Text to the index, the compiler
> tells me that
>
> [C# Error] WinForm.cs(110): 'Lucene.Net.Documents.Field' does not
> contain a definition for 'Keyword'
> [C# Error] WinForm.cs(111): 'Lucene.Net.Documents.Field' does not
> contain a definition for 'Text'
>

RE: Newbie question

Posted by Ali Khawaja <ak...@microsoft.com>.
You can use phonetic search in databases, which will help you tackle this very specific issue.

-----Original Message-----
From: Karl Geppert [mailto:karlg@chemwatch.net]
Sent: Thursday, October 11, 2007 8:14 PM
To: lucene-net-user@incubator.apache.org
Subject: Re: Newbie question

Hi Jokin,

Thanks for the pointer - I was working from the Manning Lucene book,
expecting the differences not to be too large, and your pointer is of
inestimable value.

The main reason for approaching Lucene is to provide the matching for
the Chemical names.  Chemical names end up being of immense complexity
and ambiguity.  Persuading users to type exact strings is always
problematic.  Lucene appears to have better flexibility in its matching
of terms than a relational database.

Karl


Re: Newbie question

Posted by Karl Geppert <ka...@chemwatch.net>.
Hi Jokin,

Thanks for the pointer - I was working from the Manning Lucene book, 
expecting the differences not to be too large, and your pointer is of 
inestimable value.

The main reason for approaching Lucene is to provide the matching for 
the Chemical names.  Chemical names end up being of immense complexity 
and ambiguity.  Persuading users to type exact strings is always 
problematic.  Lucene appears to have better flexibility in its matching 
of terms than a relational database.

Karl
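
As an illustration of the looser matching described above, here is a small 
sketch that looks up a chemical by an approximately typed name using 
FuzzyQuery. The class and method names are hypothetical, it assumes an index 
with stored "CW" and tokenized "NAME" fields built with StandardAnalyzer, 
and it relies on the Lucene.Net 2.x-era Hits API. It only handles a 
single-word name; multi-word names would need more work.

```csharp
using System;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class FuzzyNameSearchSketch
{
    // Returns the stored CW values of documents whose NAME field contains
    // a term close (by edit distance) to the given single word.
    public static string[] FindByApproximateName(Lucene.Net.Store.Directory dir, string word)
    {
        IndexSearcher searcher = new IndexSearcher(dir);
        try
        {
            // StandardAnalyzer lowercases terms at index time, and a
            // FuzzyQuery term is not analyzed, so lowercase it by hand.
            Query query = new FuzzyQuery(new Term("NAME", word.ToLower()));
            Hits hits = searcher.Search(query);

            string[] ids = new string[hits.Length()];
            for (int i = 0; i < ids.Length; i++)
            {
                ids[i] = hits.Doc(i).Get("CW");
            }
            return ids;
        }
        finally
        {
            searcher.Close();
        }
    }
}
```

With this, a misspelling that is one or two letters away from an indexed 
name should still find the material, which is the kind of tolerance an exact 
string match in a database query would not give.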


Re: Newbie question

Posted by Jokin Cuadrado <jo...@gmail.com>.
Maybe I'm missing something, but I think that Lucene is not the best way
to search structured information. Maybe a relational database
with an optimized structure would do the job better in this case.
