Posted to users@solr.apache.org by Jay Scott <bi...@gmail.com> on 2022/04/17 02:25:28 UTC

Is solr what I want, or something else?

I apologize if I've gone overboard in asking my question,
but I have had trouble making myself understood, especially
when I don't know the proper terminology.

I have the impression (perhaps mistaken) that I can use
solr to search for ISBN and get a list of all the ISBNs
found in any of the indexed files.  I don't need to do that.

I want to index all the words in all the files in a list
of directories, like the way
the old mnogoSearch used to do.  Basically, this is
a "grep -l" cache-preparer, so to speak.

I hope that the search software lets me search for combinations
of words; I've been assuming that's built in.
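
For readers unfamiliar with the idea, the "grep -l cache" described above is essentially an inverted index: a map from each word to the files that contain it. The sketch below is only an illustration of that concept in plain Python, not how Solr or mnogoSearch is implemented; the function names `build_index` and `files_with_all` are made up for this example, and the tokenizer (lowercase ASCII letters and digits) is a simplifying assumption.

```python
import os
import re
from collections import defaultdict

# Assumption: a "word" is a run of ASCII letters or digits, compared case-insensitively.
WORD_RE = re.compile(r"[a-z0-9]+")


def build_index(directories):
    """Map each lowercased word to the set of file paths that contain it."""
    index = defaultdict(set)
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as handle:
                        text = handle.read().lower()
                except OSError:
                    continue  # unreadable file: skip it rather than abort the run
                for word in set(WORD_RE.findall(text)):
                    index[word].add(path)
    return index


def files_with_all(index, words):
    """Return the sorted paths that contain every given word (like chained `grep -l`)."""
    sets = [index.get(word.lower(), set()) for word in words]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```

A call such as `files_with_all(build_index(["/home/me/notes"]), ["solr", "nutch"])` (the path is hypothetical) would list the files containing both words. A real search engine adds much more on top of this, such as ranking, phrase search, and an on-disk index.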

I want to do all of this locally -- not use the cloud or
anything like that.  mnogoSearch worked okay for me, but
it's dead, and I'd like to move on to something modern.
Apache Nutch is a web crawler -- setting up a web server
solely for the purpose of specifying what files I want
indexed seems -- artificial.  I guess I could do that
but, golly....  Seems like there ought to be something more
direct.

Solr was suggested as a way to do this.  Do I want something
else?

j.

Re: Is solr what I want, or something else?

Posted by Dave <ha...@gmail.com>.
Solr can easily do what you want, if I understand you correctly. The key terminology: a “document” is the item your search is expected to return (in your case, it sounds like the folder with the text files); “fields” are the metadata points for each document (in your case, it sounds like text for the raw text, ISSN/ISBN, and hopefully a title); and “id” is a unique field identifying your “document”, which in your case could simply be the folder name. Hopefully that helps a bit with the basics for describing what you want.
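
To make the terminology concrete, here is a rough sketch of one such "document" as a flat set of fields. The field names `id`, `text`, `title`, and `isbn` follow the description above, but they are assumptions: they would have to match whatever the Solr schema actually defines. The helper `folder_to_document` is a made-up name for illustration only.

```python
import json
import os


def folder_to_document(folder_path, texts, title=None, isbn=None):
    """Build one document (a flat dict of fields) for a folder of text files."""
    doc = {
        # The folder name serves as the unique id, as suggested above.
        "id": os.path.basename(os.path.normpath(folder_path)),
        # All raw text from the folder's files, joined into one searchable field.
        "text": "\n".join(texts),
    }
    # Optional metadata fields are added only when a value is known.
    if title:
        doc["title"] = title
    if isbn:
        doc["isbn"] = isbn
    return doc


def to_update_payload(docs):
    """Serialize documents as a JSON array, the shape commonly sent to Solr's update handler."""
    return json.dumps(docs, indent=2)
```

Whether one document per folder or one per file is the better choice depends on what a search should return; the sketch simply follows the per-folder suggestion.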


Re: Is solr what I want, or something else?

Posted by Thomas Corthals <th...@klascement.net>.
On Mon, 18 Apr 2022 at 02:05, Shawn Heisey <el...@elyograg.org> wrote:

> On 4/17/2022 10:55 AM, Jay Scott wrote:
> > i'm going to watch some tutorials and see if they'll show
> > me what i need.  i still have a feeling solr is much more
> > than what i need, but, oh, well.  it won't hurt me to learn
> > something new.  let me do some homework; if i need
> > more help i'll ask.  thanks to all who replied.
> >
>
> Solr includes a TON of functionality that the vast majority of the
> userbase will never need.
>
> All of that can be a little overwhelming.  We as a project don't have
> something we need -- a config for a very simple use case, that doesn't
> have the kitchen sink in it.
>
> Thanks,
> Shawn
>

I started with the sample config and commented out everything I reckoned I
didn't need, which means I've done more research into some of the
functionality I don't use than into some of the things I do.

Thomas

Re: Is solr what I want, or something else?

Posted by Shawn Heisey <el...@elyograg.org>.
On 4/17/2022 10:55 AM, Jay Scott wrote:
> i'm going to watch some tutorials and see if they'll show
> me what i need.  i still have a feeling solr is much more
> than what i need, but, oh, well.  it won't hurt me to learn
> something new.  let me do some homework; if i need
> more help i'll ask.  thanks to all who replied.
>

Solr includes a TON of functionality that the vast majority of the 
userbase will never need.

All of that can be a little overwhelming.  We as a project don't have 
something we need -- a config for a very simple use case, that doesn't 
have the kitchen sink in it.

Thanks,
Shawn


Re: Is solr what I want, or something else?

Posted by Jay Scott <bi...@gmail.com>.
i'm going to watch some tutorials and see if they'll show
me what i need.  i still have a feeling solr is much more
than what i need, but, oh, well.  it won't hurt me to learn
something new.  let me do some homework; if i need
more help i'll ask.  thanks to all who replied.

j.


On Sun, Apr 17, 2022 at 11:03 AM dmitri maziuk <dm...@gmail.com>
wrote:

> On 2022-04-16 9:25 PM, Jay Scott wrote:
>
> > I want to do all of this locally -- not use the cloud or
> > anything like that.
>
> Can you fire up a docker container? These days it's not that hard to
> spin up an instance of something to play with and see what it does.
>
> Dima
>
>

Re: Is solr what I want, or something else?

Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-04-16 9:25 PM, Jay Scott wrote:

> I want to do all of this locally -- not use the cloud or
> anything like that.

Can you fire up a docker container? These days it's not that hard to 
spin up an instance of something to play with and see what it does.

Dima


Re: Is solr what I want, or something else?

Posted by Nguyen Nguyen <ng...@gmail.com>.
On Sat, Apr 16, 2022 at 9:27 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 4/16/2022 8:25 PM, Jay Scott wrote:
> > I hope that the search software lets me search for combinations
> > of words; I've been assuming that's built in.
>
> Yes, most likely Solr will handle this need.
>
> > I want to do all of this locally -- not use the cloud or
> > anything like that.  mnogoSearch worked okay for me, but
> > it's dead, and I'd like to move on to something modern.
> > Apache nutch is a web crawler -- setting up a web server
> > solely for the purpose of specifying what files I want
> > indexed seems -- artificial.  I guess I could do that
> > but, golly....  Seems like there ought to be something more
> > direct.
>
> As I understand it, Nutch doesn't actually do search.  It's really good
> at crawling a website and gathering all the data it contains, but relies
> on other software for searching what it has gathered. We hear from a lot
> of people that are having Solr handle indexing for Nutch.
>
> > Solr was suggested as a way to do this.  Do I want something
> > else?
>
> That's a tough question to answer and be sure the answer is right.  In
> general, Solr probably meets the needs of just about any kind of
> searching you want to do, but sometimes people manage to find things
> where Solr isn't the right solution.
>
> Based on what little information is here about your needs, I'm going to
> cautiously say Solr is probably a good fit.  To be sure that answer is
> correct, we will need more information.  Exactly what information we
> will need is not completely straightforward. If you start with some high
> level information about the data you want to search, then we will know
> what questions to ask next.
>
> The first thing to nail down ... what do you want to get as the result
> of a search?  Do you want Solr to provide ALL of the information in the
> result grid, or is it enough for Solr to return some kind of unique ID
> that your software can then look up in another system to provide detail
> to the user?  That is the start of defining a "document" for Solr.  In
> one large system that I designed, a Solr document was basically a row in
> a database table.  The table had 160 million rows ... the entire table
> file in MySQL was over a terabyte.  Solr actually did have a lot of
> information stored for each of those documents, so a search result grid
> displayed to the user was populated entirely from Solr.  If the user
> then clicked on one of those results, the database would be consulted
> for full details, using the unique identifier in the search results.
>
> Thanks,
> Shawn
>
>

Re: Is solr what I want, or something else?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/16/2022 8:25 PM, Jay Scott wrote:
> I hope that the search software lets me search for combinations
> of words; I've been assuming that's built in.

Yes, most likely Solr will handle this need.
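
As a rough illustration, a search for a combination of words can be expressed with the standard query parser's AND operator and sent to a core's select handler. The sketch below only builds the request URL and does not contact a server. The core name "files" and the field name "text" are assumptions that depend on your own setup, and `build_select_url` is a hypothetical helper; 8983 is Solr's default port.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, words, field="text", rows=10):
    """Build a select URL that requires every word to appear in the given field."""
    # Note: words are used as-is; query-syntax characters (quotes, colons, etc.)
    # inside a word would need escaping before real use.
    clause = " AND ".join(f"{field}:{word}" for word in words)
    # fl=id asks for only the id field of each match, similar in spirit to `grep -l`.
    params = urlencode({"q": clause, "fl": "id", "rows": rows})
    return f"{base_url}/solr/{core}/select?{params}"


url = build_select_url("http://localhost:8983", "files", ["solr", "nutch"])
print(url)
```

Fetching that URL from a running Solr instance should return the ids of documents whose "text" field contains both words, assuming the field is indexed as tokenized text.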

> I want to do all of this locally -- not use the cloud or
> anything like that.  mnogoSearch worked okay for me, but
> it's dead, and I'd like to move on to something modern.
> Apache nutch is a web crawler -- setting up a web server
> solely for the purpose of specifying what files I want
> indexed seems -- artificial.  I guess I could do that
> but, golly....  Seems like there ought to be something more
> direct.

As I understand it, Nutch doesn't actually do search.  It's really good 
at crawling a website and gathering all the data it contains, but relies 
on other software for searching what it has gathered. We hear from a lot 
of people that are having Solr handle indexing for Nutch.

> Solr was suggested as a way to do this.  Do I want something
> else?

That's a tough question to answer and be sure the answer is right.  In 
general, Solr probably meets the needs of just about any kind of 
searching you want to do, but sometimes people manage to find things 
where Solr isn't the right solution.

Based on what little information is here about your needs, I'm going to 
cautiously say Solr is probably a good fit.  To be sure that answer is 
correct, we will need more information.  Exactly what information we 
will need is not completely straightforward. If you start with some high 
level information about the data you want to search, then we will know 
what questions to ask next.

The first thing to nail down ... what do you want to get as the result 
of a search?  Do you want Solr to provide ALL of the information in the 
result grid, or is it enough for Solr to return some kind of unique ID 
that your software can then look up in another system to provide detail 
to the user?  That is the start of defining a "document" for Solr.  In 
one large system that I designed, a Solr document was basically a row in 
a database table.  The table had 160 million rows ... the entire table 
file in MySQL was over a terabyte.  Solr actually did have a lot of 
information stored for each of those documents, so a search result grid 
displayed to the user was populated entirely from Solr.  If the user 
then clicked on one of those results, the database would be consulted 
for full details, using the unique identifier in the search results.
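
The two-stage pattern described above can be sketched roughly as follows. This is a toy illustration, not the actual system: an in-memory SQLite table stands in for the MySQL database, a hand-written list of dicts stands in for the search results, and all table, column, and function names are made up.

```python
import sqlite3


def open_detail_store(rows):
    """Create an in-memory table standing in for the full database of record."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, title TEXT, body TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    return conn


def result_grid(search_hits):
    """Stage 1: fill the result grid purely from fields stored in the search index."""
    return [(hit["id"], hit["title"]) for hit in search_hits]


def full_details(conn, doc_id):
    """Stage 2: when a result is clicked, fetch the full record by its unique id."""
    row = conn.execute(
        "SELECT id, title, body FROM items WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        return None
    return {"id": row[0], "title": row[1], "body": row[2]}


# Stand-in for what a search would return: only the stored fields, not the full record.
hits = [{"id": "42", "title": "Example item"}]
store = open_detail_store([("42", "Example item", "The complete text lives here.")])
grid = result_grid(hits)
details = full_details(store, grid[0][0])
```

The design choice is a trade-off: storing more fields in the index makes the grid self-sufficient, while storing only the id keeps the index small and leaves the database as the single source of detail.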

Thanks,
Shawn