Posted to solr-user@lucene.apache.org by javozzo <da...@gmail.com> on 2013/10/18 11:09:17 UTC

How to retrieve page content in Solr

Hi, I'm new to Solr.
I use Nutch 1.1 to crawl web pages.
I use Solr to index these pages.
My problem is: how do I retrieve the content of a document
"stored" in Solr?

Example
If I have a page http://www.prova.com/prova.html
that contains the text "This is a web page"

Is there a way to retrieve the text "This is a web page"?
Any ideas?
My application is written in Java.
Thanks
Danilo
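
A minimal sketch of one way to ask Solr for the stored text of a single page from Java, by building a plain HTTP select request. The base URL http://localhost:8983/solr and the field names url, title and content are assumptions (they follow the names commonly used in the Solr schema shipped with Nutch); adjust them to your own schema.xml. Solr can only return fields declared stored="true", so it is worth checking how the content field is declared before expecting it in the response.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SolrSelectUrl {

    // URL-encodes one query parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Builds a select URL that looks up one page by its URL and asks
    // Solr to return only the title, url and content fields.
    // The field names are assumptions, not taken from the thread.
    static String buildSelectUrl(String solrBase, String pageUrl) {
        String query = "url:\"" + pageUrl + "\"";
        return solrBase + "/select"
                + "?q=" + enc(query)
                + "&fl=" + enc("title,url,content")
                + "&wt=json";
    }

    public static void main(String[] args) {
        System.out.println(buildSelectUrl(
                "http://localhost:8983/solr",
                "http://www.prova.com/prova.html"));
    }
}
```

Fetching the printed URL (from a browser or any HTTP client) should return the matching document with its stored fields. The SolrJ client library is the more usual choice in a Java application; the raw URL is used here only to keep the sketch free of dependencies.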



--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-retireve-content-page-in-solr-tp4096302.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to retrieve page content in Solr

Posted by Otis Gospodnetic <ot...@gmail.com>.

Hi,

Ignore Nutch for a bit and just follow the Solr tutorial to learn about the
Solr side. Should be quick.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Oct 18, 2013 11:30 AM, "javozzo" <da...@gmail.com> wrote:

> Hi Harshvardhan Ojha,
> I'm using Nutch 1.1 and Solr 3.6.0.
> I mean the whole document. I'm trying to build a search engine with
> Nutch and Solr, and I would like to obtain an interface like this:
>
> name1
> http://www.prova.com/name1.html
> first lines of the document content
>
> name2
> http://www.prova.com/name2.html
> first lines of the document content
>
> name3
> http://www.prova.com/name3.html
> first lines of the document content
>
> Any ideas?
> Thanks
> Danilo

Re: How to retrieve page content in Solr

Posted by javozzo <da...@gmail.com>.

Hi Harshvardhan Ojha,
I'm using Nutch 1.1 and Solr 3.6.0.
I mean the whole document. I'm trying to build a search engine with
Nutch and Solr, and I would like to obtain an interface like this:

name1
http://www.prova.com/name1.html
first lines of the document content

name2
http://www.prova.com/name2.html
first lines of the document content

name3
http://www.prova.com/name3.html
first lines of the document content

Any ideas?
Thanks
Danilo
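
The result layout described above (name, URL, first lines of the content) could be produced along these lines once the stored fields have been read from the Solr response. This is only a sketch: SearchHit, snippet and render are hypothetical names introduced for illustration, and the hits are built by hand instead of coming from a real query. It also assumes the content field is stored in the index, since Solr cannot return a field that is only indexed.

```java
import java.util.ArrayList;
import java.util.List;

class ResultListSketch {

    // Hypothetical holder for the three stored fields of one result.
    static class SearchHit {
        final String title;
        final String url;
        final String content;

        SearchHit(String title, String url, String content) {
            this.title = title;
            this.url = url;
            this.content = content;
        }
    }

    // Collapses whitespace and keeps at most maxChars characters,
    // appending "..." when the text was cut.
    static String snippet(String content, int maxChars) {
        String flat = content.trim().replaceAll("\\s+", " ");
        if (flat.length() <= maxChars) {
            return flat;
        }
        return flat.substring(0, maxChars) + "...";
    }

    // Renders each hit as three lines (title, url, snippet)
    // followed by an empty line.
    static String render(List<SearchHit> hits, int maxChars) {
        StringBuilder out = new StringBuilder();
        for (SearchHit hit : hits) {
            out.append(hit.title).append('\n')
               .append(hit.url).append('\n')
               .append(snippet(hit.content, maxChars)).append("\n\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<SearchHit> hits = new ArrayList<SearchHit>();
        hits.add(new SearchHit("name1", "http://www.prova.com/name1.html",
                "This is a web page"));
        System.out.print(render(hits, 100));
    }
}
```

Solr's highlighting parameters (hl=true with hl.fl set to the content field) are an alternative to cutting the text by hand, since they return fragments around the matched terms; they also require the field to be stored.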

Re: How to retrieve page content in Solr

Posted by Harshvardhan Ojha <oj...@gmail.com>.

Hi Danilo,

What do you mean by content information?
The whole document?
Metadata?
Do you keep it in separate fields?
Or is this about Solr search queries?


Regards
Harshvardhan Ojha


On Fri, Oct 18, 2013 at 1:09 PM, javozzo <da...@gmail.com> wrote:

> Hi, I'm new to Solr.
> I use Nutch 1.1 to crawl web pages.
> I use Solr to index these pages.
> My problem is: how do I retrieve the content of a document
> "stored" in Solr?
>
> Example
> If I have a page http://www.prova.com/prova.html
> that contains the text "This is a web page"
>
> Is there a way to retrieve the text "This is a web page"?
> Any ideas?
> My application is written in Java.
> Thanks
> Danilo