Posted to user@lucenenet.apache.org by Lucas E Wall <wa...@hotmail.com> on 2011/02/02 00:57:35 UTC

Question

I am new to Lucene and have the following questions. What is the best way to understand what is required to install Lucene on a server? Also, can I make Lucene run searches on links to XML data on the web? Thanks

RE: Question

Posted by Igor Chirokov <ig...@hotmail.com>.
You can look at http://www.walnutilsoft.com/

This solution integrates Lucene with SQL Server and Oracle databases as well.
It is my product, and it's free.
You can also see simple Lucene code for .NET and Java.
 
Thanks,
Igor
 
> Subject: RE: Question
> Date: Wed, 2 Feb 2011 23:35:22 +0100
> From: rene.de.vries@howardshome.com
> To: lucene-net-user@lucene.apache.org
> 
> If you're looking for a quick, manageable solution for full-text search on SQL Server data, I recommend taking a look at Solr. Solr is like a huge management layer around Lucene that hides a lot of the details, though you can still get at them through config files.
> 
> It has a GREAT Data Import Handler which does all the heavy lifting of connecting to SQL Server. I was able to set up indexing of 15 million news articles in one afternoon. For indexing PDFs and such, Solr has dedicated handlers, which I haven't used but which seem pretty easy to set up.
> 
> Solr is a full Java solution, but don't let that scare you, as it is easy to set up and a lot of help is available. There is a SolrNet library which allows easy access to all the Solr functions from a .NET app. This tutorial is particularly handy: http://crazorsharp.blogspot.com/2010/01/full-text-search-using-solr-lucene-and.html
> 
> Other than that, I recommend reading the book Lucene in Action, or the Solr book. There are also LOTS of good tutorials available.
> 
> René

RE: Question

Posted by René de Vries <re...@howardshome.com>.
If you're looking for a quick, manageable solution for full-text search on SQL Server data, I recommend taking a look at Solr. Solr is like a huge management layer around Lucene that hides a lot of the details, though you can still get at them through config files.

It has a GREAT Data Import Handler which does all the heavy lifting of connecting to SQL Server. I was able to set up indexing of 15 million news articles in one afternoon. For indexing PDFs and such, Solr has dedicated handlers, which I haven't used but which seem pretty easy to set up.

Solr is a full Java solution, but don't let that scare you, as it is easy to set up and a lot of help is available. There is a SolrNet library which allows easy access to all the Solr functions from a .NET app. This tutorial is particularly handy: http://crazorsharp.blogspot.com/2010/01/full-text-search-using-solr-lucene-and.html

Other than that, I recommend reading the book Lucene in Action, or the Solr book. There are also LOTS of good tutorials available.

René
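
To give a feel for what "access to the Solr functions" boils down to: a Solr search is an HTTP request to its select handler, with the query passed in the q parameter. Below is a minimal Python sketch that only builds such a URL. The host, port and the field name "title" are assumptions made up for the example, and sending the request is left to whatever HTTP client you prefer.

```python
from urllib.parse import urlencode

# Hypothetical host and port; adjust to wherever Solr is deployed.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(query, rows=10):
    """Return the URL for a Solr search; the caller sends it with any HTTP client."""
    # q = the query, rows = maximum number of hits, wt = response format.
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


# "title" is an assumed field name, not one taken from a real schema.
print(build_search_url("title:lucene"))
```

A library such as SolrNet wraps this kind of request (and the parsing of the response) behind typed .NET calls.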

-----Original Message-----
From: Aaron Powell [mailto:me@aaron-powell.com] 
Sent: Wednesday, 2 February 2011 1:05
To: lucene-net-user@lucene.apache.org
Subject: Re: Question

You don't actually install Lucene.Net, it's just a library which you
reference into your application. Solr is an installable Lucene service,
which essentially provides RESTful endpoints to Lucene (java), or so goes my
understanding.

With regard to what you can search with Lucene, that really comes down
to anything you can push into the index. Keep in mind that Lucene is just an
indexer and searcher; it's not a crawler or anything. You have to push the
data to the indexer, and you have to write queries to get it back out.
I've got some blogs on my site about getting started with Lucene.Net -
http://www.aaron-powell.com/lucene-net-overview
Aaron Powell
Umbraco Core Team Member <http://umbraco.codeplex.com> | FunnelWeb Team
Member <http://funnelweblog.com>

http://www.aaron-powell.com | http://twitter.com/slace | Skype:
aaron.l.powell | MSN: aazzap@hotmail.com



Re: Question

Posted by Kevin Miller <sc...@gmail.com>.
I've also had success using Tika to do text extraction (via IKVM).

This GitHub repo has example code and tests for pulling contents out of PDFs,
Word documents, etc.

https://github.com/KevM/tikaondotnet

Works great for me in a product I helped create. Once you have the text of
the document you index it as you would normal content.

Kevin Miller
<https://github.com/KevM/tikaondotnet>

On Wed, Feb 2, 2011 at 1:10 AM, Prescott Nasser <ge...@hotmail.com> wrote:

>
> Just to add, since you're likely on a Windows platform, check out IFilters
> and how to use them; they are probably the easiest way you have to extract
> data from PDF/HTML/XML.
>
> Check out this for getting started with the IFilter interface:
> http://www.codeproject.com/KB/cs/IFilter.aspx?msg=2428047
>
> Once you extract the plain text, that is where Lucene comes in to parse
> that plain text and create an index.
>
> ~P

RE: Question

Posted by Prescott Nasser <ge...@hotmail.com>.
Just to add, since you're likely on a Windows platform, check out IFilters and how to use them; they are probably the easiest way you have to extract data from PDF/HTML/XML.

Check out this for getting started with the IFilter interface: http://www.codeproject.com/KB/cs/IFilter.aspx?msg=2428047

Once you extract the plain text, that is where Lucene comes in to parse that plain text and create an index.
 
~P
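
The extract-then-index split can be shown without any Windows-specific pieces. The Python sketch below is not IFilter code; it uses the standard library's XML parser as a stand-in for the extraction step, and the sample document and the extract_text helper are invented for illustration. The point is only that the indexer receives plain text, whatever produced it.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_string):
    """Return the text content of an XML document as one whitespace-normalised string."""
    root = ET.fromstring(xml_string)
    # itertext() walks the tree and yields every text fragment in document order.
    return " ".join(" ".join(root.itertext()).split())


# A made-up document standing in for XML fetched from the web.
sample = "<article><title>Lucene basics</title><body>Index first, then search.</body></article>"
print(extract_text(sample))
```

The resulting string is what you would hand to the indexer as the content of a field.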




> From: me@aaron-powell.com
> Date: Wed, 2 Feb 2011 12:09:01 +1100
> Subject: Re: Question
> To: lucene-net-user@lucene.apache.org
> 
> Lucene.Net uses the same binary data store that Lucene uses, which is stored
> on the file system (generally; it depends on what Directory instance you
> provide to the indexer & searcher).
> 
> Some projects, such as NHibernate.Search and RavenDB, use Lucene.Net
> internally and handle synchronizing the data stores (DB & Lucene).
> If you're trying to index things such as HTML, XML, PDF, etc., you have to
> write your own way to read the data into Lucene though.
> Aaron Powell
> Umbraco Core Team Member <http://umbraco.codeplex.com> | FunnelWeb Team
> Member <http://funnelweblog.com>
> 
> http://www.aaron-powell.com | http://twitter.com/slace | Skype:
> aaron.l.powell | MSN: aazzap@hotmail.com

Re: Question

Posted by Aaron Powell <me...@aaron-powell.com>.
Lucene.Net uses the same binary data store that Lucene uses, which is stored
on the file system (generally; it depends on what Directory instance you
provide to the indexer & searcher).

Some projects, such as NHibernate.Search and RavenDB, use Lucene.Net
internally and handle synchronizing the data stores (DB & Lucene).
If you're trying to index things such as HTML, XML, PDF, etc., you have to
write your own way to read the data into Lucene though.
Aaron Powell
Umbraco Core Team Member <http://umbraco.codeplex.com> | FunnelWeb Team
Member <http://funnelweblog.com>

http://www.aaron-powell.com | http://twitter.com/slace | Skype:
aaron.l.powell | MSN: aazzap@hotmail.com
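
To make the storage point concrete: the index is just files in a folder (or wherever the chosen Directory implementation puts it), separate from any database holding the source data. The Python sketch below illustrates that idea only. It is not Lucene.Net code; the JSON file and the save_index/load_index helpers are invented for the example, and Lucene's real on-disk format is a set of binary files rather than JSON.

```python
import json
import os
import tempfile


def save_index(index, directory):
    """Write the index to a file inside `directory`; no database is involved."""
    path = os.path.join(directory, "index.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(index, handle)
    return path


def load_index(directory):
    """Read the index back from the same folder, e.g. when the searcher starts up."""
    with open(os.path.join(directory, "index.json"), encoding="utf-8") as handle:
        return json.load(handle)


# A toy term -> document ids mapping standing in for a real index.
index = {"lucene": ["doc1", "doc2"], "solr": ["doc2"]}
with tempfile.TemporaryDirectory() as folder:
    save_index(index, folder)
    restored = load_index(folder)
print(restored == index)
```

The source documents can live anywhere (database, file share, web); only the index folder has to be reachable by the code doing the indexing and searching.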


On Wed, Feb 2, 2011 at 12:03 PM, Lucas E Wall <wa...@hotmail.com> wrote:

>
> Thanks, Aaron. I went through your blog and it makes a lot of sense.
> Given that Lucene is ASP-friendly, can I call the library from MSSQL? Where
> does the index get stored? Do I need to provide a database for the files
> I need indexed, and for the index as well? Maybe my questions are a little
> bit too entry-level.

RE: Question

Posted by Lucas E Wall <wa...@hotmail.com>.
Thanks, Aaron. I went through your blog and it makes a lot of sense. Given that Lucene is ASP-friendly, can I call the library from MSSQL? Where does the index get stored? Do I need to provide a database for the files I need indexed, and for the index as well? Maybe my questions are a little bit too entry-level.

> From: me@aaron-powell.com
> Date: Wed, 2 Feb 2011 11:04:45 +1100
> Subject: Re: Question
> To: lucene-net-user@lucene.apache.org
> 
> You don't actually install Lucene.Net, it's just a library which you
> reference into your application. Solr is an installable Lucene service,
> which essentially provides RESTful endpoints to Lucene (java), or so goes my
> understanding.
> 
> With regard to what you can search with Lucene, that really comes down
> to anything you can push into the index. Keep in mind that Lucene is just an
> indexer and searcher; it's not a crawler or anything. You have to push the
> data to the indexer, and you have to write queries to get it back out.
> I've got some blogs on my site about getting started with Lucene.Net -
> http://www.aaron-powell.com/lucene-net-overview
> Aaron Powell
> Umbraco Core Team Member <http://umbraco.codeplex.com> | FunnelWeb Team
> Member <http://funnelweblog.com>
> 
> http://www.aaron-powell.com | http://twitter.com/slace | Skype:
> aaron.l.powell | MSN: aazzap@hotmail.com

Re: Question

Posted by Aaron Powell <me...@aaron-powell.com>.
You don't actually install Lucene.Net, it's just a library which you
reference into your application. Solr is an installable Lucene service,
which essentially provides RESTful endpoints to Lucene (java), or so goes my
understanding.

With regard to what you can search with Lucene, that really comes down
to anything you can push into the index. Keep in mind that Lucene is just an
indexer and searcher; it's not a crawler or anything. You have to push the
data to the indexer, and you have to write queries to get it back out.
I've got some blogs on my site about getting started with Lucene.Net -
http://www.aaron-powell.com/lucene-net-overview
Aaron Powell
Umbraco Core Team Member <http://umbraco.codeplex.com> | FunnelWeb Team
Member <http://funnelweblog.com>

http://www.aaron-powell.com | http://twitter.com/slace | Skype:
aaron.l.powell | MSN: aazzap@hotmail.com
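
The push-then-query model can be sketched in a few lines. The Python below is a toy inverted index, not Lucene's API: the TinyIndex class and its add/search methods are invented for illustration, and it leaves out everything a real engine provides (analyzers, scoring, persistence). It only shows the division of labour: your code supplies the text, the index answers queries.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # The caller pushes text in; nothing is crawled or fetched here.
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return ids of documents that contain every term in the query.
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        results = [self.postings.get(term, set()) for term in terms]
        return sorted(set.intersection(*results))


index = TinyIndex()
index.add("doc1", "Lucene is a search library")
index.add("doc2", "Solr is a server built on Lucene")
print(index.search("lucene"))
print(index.search("lucene server"))
```

Searching "links to XML data on the web" therefore means fetching and parsing those documents yourself and calling the equivalent of add for each one.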


On Wed, Feb 2, 2011 at 10:57 AM, Lucas E Wall <wa...@hotmail.com> wrote:

>
> I am new to Lucene and have the following questions. What is the best way
> to understand what is required to install Lucene on a server? Also, can I
> make Lucene run searches on links to XML data on the web? Thanks
>