Posted to solr-user@lucene.apache.org by Mark <ma...@gmail.com> on 2010/12/07 04:42:44 UTC

Solr Newbie - need a point in the right direction

Hi,

First time poster here - I'm not entirely sure where I need to look for this
information.

What I'm trying to do is extract some (presumably) structured information
from non-uniform data (eg, prices from a nutch crawl) that needs to show in
search queries, and I've come up against a wall.

I've been unable to figure out where the best place to begin is.

I had a look through the Solr wiki and did a search via Lucid's search tool,
and I'm guessing this is handled at index time through my schema? But I've
also seen dismax being thrown around as a possible solution, and this has
confused me.

Basically, if you guys could point me in the right direction for resources
(even as much as saying, you need X, it's over there) that would be a huge
help.

Cheers

Mark

Re: Solr Newbie - need a point in the right direction

Posted by Mark <ma...@gmail.com>.
Thanks to everyone who responded. No wonder I was getting confused: I was
completely focusing on the wrong half of the equation.

I had a cursory look through some of the Nutch documentation available and
it is looking promising.

Thanks everyone.

Mark

On Tue, Dec 7, 2010 at 10:19 PM, webdev1977 <we...@gmail.com> wrote:

>
> In my experience, the hardest (but most flexible) part is exactly what was
> mentioned: processing the data. Nutch has a really easy plugin interface
> that you can use, and the example plugin is a great place to start. Once
> you have the raw parsed text, you can do whatever you want with it. For
> example, I wrote a plugin to add geospatial information to my
> NutchDocument. You then map the fields you added to the NutchDocument to
> the fields you want Solr to index. In my case I created a geography field
> where I put lat, lon info. Then you define that same geography field in
> the Nutch-to-Solr mapping file as well as in your Solr schema.xml file.
> Then, when you run the crawl and tell it to use "solrindex", it will send
> the document to Solr to be indexed. Since your new field is in the schema,
> Solr knows what to do with it at index time. Now you can build a user
> interface around what you want to do with that field.

Re: Solr Newbie - need a point in the right direction

Posted by webdev1977 <we...@gmail.com>.
In my experience, the hardest (but most flexible) part is exactly what was
mentioned: processing the data. Nutch has a really easy plugin interface
that you can use, and the example plugin is a great place to start. Once you
have the raw parsed text, you can do whatever you want with it. For example,
I wrote a plugin to add geospatial information to my NutchDocument. You then
map the fields you added to the NutchDocument to the fields you want Solr to
index. In my case I created a geography field where I put lat, lon info.
Then you define that same geography field in the Nutch-to-Solr mapping file
as well as in your Solr schema.xml file. Then, when you run the crawl and
tell it to use "solrindex", it will send the document to Solr to be indexed.
Since your new field is in the schema, Solr knows what to do with it at
index time. Now you can build a user interface around what you want to do
with that field.
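
For readers who want to see what ends up at Solr, here is a minimal Python
sketch of the kind of XML <add> message Solr accepts for a document that
carries one extra field. It is only an illustration: Nutch's "solrindex" step
builds and sends the document for you, and the field names and values below
(including "geography") are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def build_solr_add_xml(doc):
    """Serialize one document (a dict of field name -> value) as a Solr
    XML <add> message. List or tuple values become repeated <field>
    elements, which is how multi-valued fields are sent."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document: "geography" stands in for the custom field
# described above; every name and value here is illustrative only.
example_doc = {
    "id": "http://example.com/page1",
    "title": "Example page",
    "geography": "38.8977,-77.0365",
}
print(build_solr_add_xml(example_doc))
```

Every field name in such a message has to be declared in schema.xml (or match
a dynamic field there), otherwise Solr rejects the document.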


-- 
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Newbie-need-a-point-in-the-right-direction-tp2031381p2033687.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Newbie - need a point in the right direction

Posted by Gora Mohanty <go...@mimirtech.com>.
On Tue, Dec 7, 2010 at 9:12 AM, Mark <ma...@gmail.com> wrote:
[...]
> What I'm trying to do is extract some (presumably) structured information
> from non-uniform data (eg, prices from a nutch crawl) that needs to show in
> search queries, and I've come up against a wall.
>
> I've been unable to figure out where is the best place to begin.
>
> I had a look through the solr wiki and did a search via Lucid's search tool
> and I'm guessing this is handled at index time through my schema? But I've
> also seen dismax being thrown around as a possible solution and this has
> confused me.
>
> Basically, if you guys could point me in the right direction for resources
> (even as much as saying, you need X, it's over there) that would be a huge
> help.
[...]

Sorry, the above is a little unclear, at least to me. The basic steps in running
Solr are:
* Installing, configuring, and getting Solr running
* Indexing data, as well as updating and deleting it: The best way to do this
  depends on where your data are coming from. Since you mention Nutch,
  that already integrates with Solr, although by default in a manner that
  dumps the entire content from a crawl into a Solr field. You will probably
  need to write a custom Nutch parser plugin in order to extract a subset
  from the content. Please see http://wiki.apache.org/nutch/RunningNutchAndSolr
* Searching through Solr
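
As a rough illustration of the searching step, the Python sketch below builds
a query URL for Solr's standard /select handler, with an optional range
filter on a price field. The base URL (a local default install) and the field
name "price" are assumptions; the field would need to be defined as a numeric
type in schema.xml for the range filter to behave as expected.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, min_price=None, max_price=None):
    """Build a Solr /select URL. If either price bound is given, add an
    fq (filter query) using Solr's range syntax, with * for an open end."""
    params = [("q", query)]
    if min_price is not None or max_price is not None:
        lo = "*" if min_price is None else min_price
        hi = "*" if max_price is None else max_price
        # "price" is a hypothetical field name used for illustration.
        params.append(("fq", f"price:[{lo} TO {hi}]"))
    return f"{base_url}/select?{urlencode(params)}"


# Assumed base URL of a local Solr instance.
print(build_select_url("http://localhost:8983/solr", "laptop", 100, 500))
```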

A good way of getting started is by going through the Solr tutorial:
http://lucene.apache.org/solr/tutorial.html . The Solr Wiki is also fairly
extensive: http://wiki.apache.org/solr/FrontPage . Finally, searching
Google for "solr getting started" turns up many likely-looking links.

Regards,
Gora

Re: Solr Newbie - need a point in the right direction

Posted by Erick Erickson <er...@gmail.com>.
Solr is downstream of what I think you want. There's nothing in Solr
that allows you to take an arbitrary page and extract specific info
from it. I suspect the Nutch folks have dealt with this kind of question;
looking over the user list there might give some insight.

Basically, once you have the page, you extract the information to
put into your structured Solr document. "Extracting the information"
is the hard part, and there's nothing built into Solr that I know of
that helps with that...
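
To give a feel for what that extraction step can look like, here is a small
Python sketch that pulls dollar amounts out of already-parsed page text with
a regular expression. It assumes prices are written like "$1,299.99" or
"$15"; real crawled pages vary far more than this, and in a Nutch setup the
same logic would live in a custom plugin rather than a standalone script.

```python
import re

# Matches a dollar sign, an optional space, then either a number with
# thousands separators (1,299) or a plain run of digits, followed by an
# optional fractional part of one or two digits.
PRICE_RE = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(\.\d{1,2})?")


def extract_prices(text):
    """Return every dollar amount found in text as a float, in order."""
    prices = []
    for match in PRICE_RE.finditer(text):
        whole = match.group(1).replace(",", "")
        fraction = match.group(2) or ""
        prices.append(float(whole + fraction))
    return prices


sample = "Now only $1,299.99 (was $1500), shipping $ 5."
print(extract_prices(sample))  # [1299.99, 1500.0, 5.0]
```

The resulting numbers are what you would put into a dedicated field of the
structured document, so that Solr can filter and sort on them.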

Best
Erick
