Posted to solr-user@lucene.apache.org by Jonathan Candelaria <ca...@appstate.edu> on 2018/05/26 18:33:38 UTC

Re: How to specify what pages to add to index?

Hello.
I have a site under a single domain name with several folders in it,
each corresponding to a different web application.

e.g.:
website.university.edu/app1
website.university.edu/app2
website.university.edu/app3

And all the pages are stored in separate folders in an html directory.
There is nothing in the main directory other than "This page is left blank"
(all pages are databases that are for internal use only).

How do I get Solr to index website.university.edu/app2 specifically?
I've been searching the docs and Google for a while, but I can't seem to
find where I can specify the root URL that I need to be indexed.

All my Solr instances that don't have this old-world style setup (e.g.
app4.website.university.edu) work perfectly.

Any help would be greatly appreciated. I've only updated old versions of
Solr and migrated a few instances. I've only worked with Solr for about 4
months. I'm not an expert.


-- 

*Jonathan Candelaria*
Business Applications Analyst
Belk Library and Information Commons
Office: (828) 262-2774



LEGAL DISCLAIMER
The information transmitted is intended solely for the individual or entity
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of or
taking action in reliance upon this information by persons or entities
other than the intended recipient is prohibited. If you have received this
email in error, please contact the sender and delete the material from any
computer. By replying to this e-mail, you consent to Appalachian State
University's monitoring activities of all communication that occurs on
Appalachian State University's systems.

Re: How to specify what pages to add to index?

Posted by Jonathan Candelaria <ca...@appstate.edu>.
Thanks. It's actually more like localhost/app2.
The app2 in question is Omeka (a digital publishing platform).
When Omeka is installed on a server, it's usually all alone on the server.

So you *tell* it to index something and which core corresponds to that
index, and it indexes it?

If so, I think I'll have to parse through the indexing components and see
how the command lines fed to Solr are generated.

On Sat, May 26, 2018 at 5:44 PM, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> I think you may have other pieces of software in that equation. Solr does
> not normally pull data from websites; it gets data pushed to it.
>
> That said, the Data Import Handler can pull data. In that case you normally
> start indexing by sending a command to Solr. That command corresponds to a
> request handler in solrconfig.xml, which in turn refers to a separate config
> file with extra parameters. Looking for that may be your next step.
>
> But you may also have something like Nutch, which will spider a site and
> push it into Solr. Check for that.
>
> Regards,
>     Alex
>




Re: How to specify what pages to add to index?

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I think you may have other pieces of software in that equation. Solr does
not normally pull data from websites; it gets data pushed to it.

That said, the Data Import Handler can pull data. In that case you normally
start indexing by sending a command to Solr. That command corresponds to a
request handler in solrconfig.xml, which in turn refers to a separate config
file with extra parameters. Looking for that may be your next step.
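
To make the "command to Solr" concrete, here is a minimal sketch of how such
a request URL could be put together. The host, the core name "omeka", and the
handler path "/dataimport" are assumptions for illustration only; the real
values are whatever the request handler entry in your solrconfig.xml says.
The helper build_dih_url is hypothetical, not part of Solr.

```python
from urllib.parse import urlencode


def build_dih_url(solr_base, core, handler="/dataimport",
                  command="full-import", clean=True, commit=True):
    """Build the URL that asks one core's Data Import Handler to run.

    solr_base, core and handler are assumptions here; check the
    requestHandler name registered in that core's solrconfig.xml.
    """
    params = {
        "command": command,            # e.g. full-import, delta-import, status
        "clean": str(clean).lower(),   # whether to clear the index first
        "commit": str(commit).lower(), # whether to commit when finished
    }
    return f"{solr_base.rstrip('/')}/{core}{handler}?{urlencode(params)}"


# Hypothetical host and core name, shown only to illustrate the URL shape.
print(build_dih_url("http://localhost:8983/solr", "omeka"))
```

Fetching that URL (from a browser, curl, or a script) is what starts the
import; what actually gets imported is decided by the separate config file
the handler points to, not by the URL.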

But you may also have something like Nutch, which will spider a site and
push it into Solr. Check for that.
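
In the push model, the decision about which pages are indexed lives in
whatever does the pushing, not in Solr. As a rough sketch under stated
assumptions, a pusher could keep only URLs under the /app2 folder and turn
them into a JSON document list of the kind a core's /update handler accepts.
The helper names and the "content" field are hypothetical; the field names
have to match your own schema.

```python
import json
from urllib.parse import urlparse


def in_scope(url, host, path_prefix):
    """Return True if url is on host and at or below path_prefix."""
    parts = urlparse(url)
    prefix = path_prefix.rstrip("/")
    on_path = parts.path == prefix or parts.path.startswith(prefix + "/")
    return parts.hostname == host and on_path


def to_update_payload(pages, host, path_prefix):
    """Build a JSON array of documents from (url, text) pairs in scope.

    "id" and "content" are assumed field names, not taken from any
    particular schema.
    """
    docs = [
        {"id": url, "content": text}
        for url, text in pages
        if in_scope(url, host, path_prefix)
    ]
    return json.dumps(docs)


# Made-up pages, used only to show the filtering.
pages = [
    ("http://website.university.edu/app1/index.html", "app1 home"),
    ("http://website.university.edu/app2/items/1", "an app2 item"),
    ("http://website.university.edu/app22/other", "not app2"),
]
print(to_update_payload(pages, "website.university.edu", "/app2"))
```

Only the app2 page ends up in the payload. A crawler such as Nutch has its
own URL filter configuration for the same purpose, so if one is installed,
that is the place to look for the root URL setting.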

Regards,
    Alex

