Posted to solr-user@lucene.apache.org by "Reid, Stephen" <sr...@novantas.com> on 2012/02/02 22:36:37 UTC

solr to index php files

Hi,

I am a beginner with Solr and would like to index dynamic PHP files (page.php?ID=233), static PHP files, and .shtml files. This is for a small website backed by a small MySQL database; however, some PHP files are static and are not part of the database.

Can you tell me the best way to achieve this?

Also, I know that XML data is returned by default, but how do I go about creating a custom page for the results?


Thanks,
Steve



IMPORTANT NOTICE: This message is intended only for the addressee and
may contain confidential, privileged information. If you are not the
intended recipient, you may not use, copy or disclose any information
contained in the message. If you have received this message in error,
please notify the sender by reply e-mail and delete the message.

RE: solr to index php files

Posted by "Reid, Stephen" <sr...@novantas.com>.
Thanks Emmanuel,

I should have been clearer about my use of the word 'dynamic'; I actually meant PHP files that do not pull content from a database. I will take a look at the links you provided. Thanks again for putting me on the right path.

Steve


Re: solr to index php files

Posted by Emmanuel Espina <es...@gmail.com>.
What do you mean by static PHP files? As far as I know, PHP is used to
make pages look dynamic. If you want to index dynamic pages as if they
were just HTML, you will have to download them and add them to Solr.
Writing a small program with SolrJ and some HTTP library
(http://hc.apache.org/httpclient-3.x/) to download the pages is the
usual thing.

Generating a list of URLs, downloading those pages to a temporary
location, and adding them to Solr is the typical approach. To add them
to Solr you can use
http://wiki.apache.org/solr/ExtractingRequestHandler or parse them
yourself using a library such as TagSoup
http://ccil.org/~cowan/XML/tagsoup/, which I haven't tested myself but
which is apparently very robust.
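
The download-and-add flow described above could be sketched roughly as
follows. This is only an illustration in Python using the standard
library, not SolrJ or TagSoup. The Solr URL, the field names (id, title,
text), and the helper names are all assumptions, and it presumes a Solr
version whose /update handler accepts a JSON array of documents.

```python
import json
import urllib.request
from html.parser import HTMLParser

# Assumed update URL for illustration; adjust to your own installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


class TextExtractor(HTMLParser):
    """Collects the <title> and visible text, skipping script/style content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0:
            self.text_parts.append(data)


def build_doc(url, html):
    """Turn one downloaded page into a dict of (hypothetical) Solr fields."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the indexed text is compact.
    title = " ".join("".join(parser.title_parts).split())
    text = " ".join(" ".join(parser.text_parts).split())
    return {"id": url, "title": title, "text": text}


def fetch(url):
    """Download a page as the web server renders it (PHP already executed)."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def index_urls(urls):
    """Fetch every URL and send all documents to Solr in one JSON update."""
    docs = [build_doc(url, fetch(url)) for url in urls]
    request = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Because the pages are fetched over HTTP, page.php?ID=233 and a static
.shtml page are handled the same way: Solr only ever sees the rendered
HTML. Calling index_urls() with your own list of URLs would do the
actual download and update.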


Re: solr to index php files

Posted by Ahmet Arslan <io...@yahoo.com>.
> I am a  beginner with Solr and would like to index
> dynamic php files ( page.php?ID=233) and static php files
> and .shtml files.  This is for a small website, which
> hits a small MySql database on the backend, however some php
> files are static and are not part of the database.
> 
> Can you tell me the best way to achieve this?

This looks like a job for Nutch and Solr:
http://www.lucidimagination.com/blog/2010/09/10/refresh-using-nutch-with-solr/

> Also, I know that XML data is returned by default, but how
> do I go about creating a custom page for the results?

Consider using the Velocity response writer (aka Solritas):
http://wiki.apache.org/solr/VelocityResponseWriter
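
As a different route from the Velocity response writer, a custom
results page can also be built by asking Solr for JSON (wt=json) and
rendering the HTML yourself. The sketch below is a rough illustration
in Python. The select URL, the field names (id, title), and the
function names are assumptions, not part of Solr.

```python
import html
import json
import urllib.parse
import urllib.request

# Assumed location of the select handler; adjust to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(query, rows=10):
    """Build a select URL that asks Solr for JSON instead of the default XML."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return SOLR_SELECT_URL + "?" + params


def render_results(solr_response):
    """Render the parsed JSON response as a small HTML fragment."""
    result = solr_response["response"]
    lines = ["<p>%d result(s)</p>" % result["numFound"], "<ul>"]
    for doc in result["docs"]:
        # "id" and "title" are the hypothetical field names used here.
        link = html.escape(str(doc.get("id", "")), quote=True)
        title = html.escape(str(doc.get("title", "(untitled)")))
        lines.append('<li><a href="%s">%s</a></li>' % (link, title))
    lines.append("</ul>")
    return "\n".join(lines)


def search(query):
    """Run the query against Solr and return the rendered HTML."""
    with urllib.request.urlopen(build_query_url(query)) as response:
        return render_results(json.load(response))
```

The same idea carries over to PHP on the website itself: fetch the
JSON from Solr, decode it, and loop over response.docs in your own
page template.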