Posted to dev@community.apache.org by Shane Curcuru <as...@shanecurcuru.org> on 2017/03/28 15:15:42 UTC

Using Solr/Lucene to provide our own site search?

As has been pondered many times (recently by Rich and Sally, among many
others), it would be really nice to better help newcomers find the right
information at the ASF or our projects.  We have one of the industry's
leading search tools right here: why aren't we using it, and even
better, semi-consistently across apache.org sites that want to?

Is there *sustainable* interest here in ComDev to organize this work?
There are a lot of moving bits, most of which need to be done by
volunteers (infra typically only promises to maintain core services).
So the issue is 1) having a great story for why this would be useful, to
attract 2) the volunteers to work on each of the parts.

- Lucene PMC has a VM already; one hopes they could run an index of some
size there already.
-- Requires Lucene/Solr experience to configure, set up, and test.

- Websites we want to search need to have search code integrated.  As
@fluxo points out, it's simple to plop a Solr search box somewhere - but
harder to make it fit site design, use correct config, etc.

- Runbooks and maintenance - this is something that (likely) ComDev or
someone not-infra would need to maintain.  In particular, if we wanted
to add this to apache.org, we'd *really* need to ensure it can get fixed
when it breaks.

I think a good search index including apache.org and community.a.o would
be very valuable to help newcomers find the *right* page where
information is stored.  The question is, do we have the people to do all
the bits to make it happen this year?

-- 

- Shane
  https://www.apache.org/foundation/marks/resources

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@community.apache.org
For additional commands, e-mail: dev-help@community.apache.org


Re: Using Solr/Lucene to provide our own site search?

Posted by Piergiorgio Lucidi <pi...@apache.org>.
Hi,

if we also need other complementary connectors, we could consider using
Apache ManifoldCF for crawling.
We now have a lot of new connectors and a new GUI in the upcoming
release!

^__^



2017-03-28 21:44 GMT+02:00 Grant Ingersoll <gs...@apache.org>:

> https://github.com/lucidworks/searchhub has all the crawlers/setup already
> set up for a number of ASF projects (email, GitHub, websites, wikis, Stack
> Overflow) and a pretty easy framework for specifying others (I looked at
> the FOAF stuff, but it wasn't consistent enough to automate).  Lucidworks
> (my employer/company) is happy to donate licenses of Fusion, our commercial
> product on top of Solr and Spark, if the ASF will provide hardware.  Or, if
> someone will put up the Pull Request to add all the projects, we can host
> it, as we already have a multinode cluster set up and we have read-only APIs
> available, so it would just take UI integration.
>
> -Grant



-- 
Piergiorgio Lucidi
Technology Evangelist @ Sourcesense
Author and Technical Reviewer @ Packt Publishing
Mentor / PMC Member / Committer @ Apache Software Foundation
Wiki Gardener / Forum Moderator / Certified Instructor, Engineer and
Administrator @ Alfresco
Top Community Contributor @ Crafter
Project Leader / Committer @ JBoss
Technology Advisory Team Member @ Microsoft
http://www.open4dev.com

Re: Using Solr/Lucene to provide our own site search?

Posted by Grant Ingersoll <gs...@apache.org>.
https://github.com/lucidworks/searchhub has all the crawlers/setup already
set up for a number of ASF projects (email, GitHub, websites, wikis, Stack
Overflow) and a pretty easy framework for specifying others (I looked at
the FOAF stuff, but it wasn't consistent enough to automate).  Lucidworks
(my employer/company) is happy to donate licenses of Fusion, our commercial
product on top of Solr and Spark, if the ASF will provide hardware.  Or, if
someone will put up the Pull Request to add all the projects, we can host
it, as we already have a multinode cluster set up and we have read-only APIs
available, so it would just take UI integration.

-Grant


On Tue, Mar 28, 2017 at 1:16 PM Dave Fisher <da...@comcast.net> wrote:

> Hi -
>
> I’ve got knowledge too and I also have some ideas I am thinking about. I
> also have some bandwidth now that I am going into job search mode.
>
> I think an important step is to think through what the taxonomy should be
> as that will help inform the common schema.
>
> Regards,
> Dave

Re: Using Solr/Lucene to provide our own site search?

Posted by Dave Fisher <da...@comcast.net>.
Hi -

I’ve got knowledge too and I also have some ideas I am thinking about. I also have some bandwidth now that I am going into job search mode.

I think an important step is to think through what the taxonomy should be as that will help inform the common schema.

Regards,
Dave

> On Mar 28, 2017, at 9:34 AM, Alexandre Rafalovitch <ar...@gmail.com> wrote:
> 
> Just to provide links:
> http://jirasearch.mikemccandless.com/search.py?index=jira - Lucene
> (not Solr) based search of issues for several projects. Very deep
> understanding of the domain. Adding more is probably not that hard.
> http://search-lucene.com/ - Solr-based, search over mailing lists,
> wikis, issues, etc for a bunch (a larger number) of projects. Run by
> Sematext (Otis' company)
> http://find.searchhub.org/ - commercial LucidWorks' Fusion-based IIRC
> (though some bits are open-source). Lots of projects and sources. But
> it feels a bit dogfoody, so the attention it gets is uneven.
> 
> So, I think Nick/Chris' point is valid that the definition of the
> project may need to take this into account and it is entirely possible
> that expanding these (if the project owners would agree) might be
> actually the easiest path forward.
> 
> 
> Regards,
>   Alex.
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced

Re: Using Solr/Lucene to provide our own site search?

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Just to provide links:
http://jirasearch.mikemccandless.com/search.py?index=jira - Lucene
(not Solr) based search of issues for several projects. Very deep
understanding of the domain. Adding more is probably not that hard.
http://search-lucene.com/ - Solr-based, search over mailing lists,
wikis, issues, etc for a bunch (a larger number) of projects. Run by
Sematext (Otis' company)
http://find.searchhub.org/ - commercial LucidWorks' Fusion-based IIRC
(though some bits are open-source). Lots of projects and sources. But
it feels a bit dogfoody, so the attention it gets is uneven.

So, I think Nick/Chris' point is valid that the definition of the
project may need to take this into account and it is entirely possible
that expanding these (if the project owners would agree) might be
actually the easiest path forward.


Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 28 March 2017 at 12:20, Chris Mattmann <ma...@apache.org> wrote:
> +1. I think that minimizing the requirement to run specific infrastructure, and trying
> to convince those already running such services (like Otis and Grant/others
> from Lucid, I believe), are the optimal choices.
>
> Cheers,
> Chris

Re: Using Solr/Lucene to provide our own site search?

Posted by Chris Mattmann <ma...@apache.org>.
+1. I think that minimizing the requirement to run specific infrastructure, and trying
to convince those already running such services (like Otis and Grant/others
from Lucid, I believe), are the optimal choices.

Cheers,
Chris




On 3/28/17, 12:19 PM, "Nick Burch" <ni...@apache.org> wrote:

    On Tue, 28 Mar 2017, Shane Curcuru wrote:
    > As has been pondered many times (recently by Rich and Sally, among many
    > others), it would be really nice to better help newcomers find the right
    > information at the ASF or our projects.  We have one of the industry's
    > leading search tools right here: why aren't we using it, and even
    > better, semi-consistently across apache.org sites that want to?
    
    Some Apache projects do have externally hosted instances of SOLR indexing 
    and searching their project sites. Tika and Lucene are two such sites, off 
    the top of my head. Would asking the committers maintaining those about 
    adding some more sites be an option?
    
    Nick
    

Re: Using Solr/Lucene to provide our own site search?

Posted by Nick Burch <ni...@apache.org>.
On Tue, 28 Mar 2017, Shane Curcuru wrote:
> As has been pondered many times (recently by Rich and Sally, among many
> others), it would be really nice to better help newcomers find the right
> information at the ASF or our projects.  We have one of the industry's
> leading search tools right here: why aren't we using it, and even
> better, semi-consistently across apache.org sites that want to?

Some Apache projects do have externally hosted instances of SOLR indexing 
and searching their project sites. Tika and Lucene are two such sites, off 
the top of my head. Would asking the committers maintaining those about 
adding some more sites be an option?

Nick


Re: Using Solr/Lucene to provide our own site search?

Posted by Ted Dunning <te...@gmail.com>.
Sounds like a roadmap more than a focus, but otherwise good.

On Mar 29, 2017 6:46 AM, "Shane Curcuru" <as...@shanecurcuru.org> wrote:

> Wow, so there are folks interested!  Great discussion, and good point:
> projects that span projects or are working on broader community aspects
> need clear definitions to keep disparate volunteers focused.
>
> There are two things *I* would be interested in helping with:
>
>
> 1- (Small concept) Super easy to use and maintain search widget and
> portal for apache.org and community.a.o sites, focused on helping
> newcomers find their way to the right "how do I do X" pages.  Value:
> Make ComDev efforts more useful by helping new users - who might not
> know what terminology we use, or where our docs are traditionally stored
> - find the how-to information they need.
>
> In terms of structure: it feels like there might be three general kinds
> of people searching on a.o/c.a.o:
>
> - General information about Apache (what it is, structure, donate, etc.)
> - How To information, like where SVN is, how to change websites, where
> different email lists are, etc.
> - Project information: either about a specific project, or more likely
> "Which projects might I want to contribute to".
>
>
> 2- (Big concept) Creating a simple, drop-in, Apache branded search
> widget that we encourage projects to consider adopting on their
> homepages.  Value: better search for individual project's users; plus
> showcasing some cross-project links and interest.
>
> Obviously, there are several existing setups that provide a lot of this
> functionality already (more than I realized).  But it would be really
> nice to have a solution that is:
>
> -- Super easy to drop into any existing site's design.
> Maven/Forrest/ghpages style sites would be the obvious targets.  (Do we
> have stats as to how many projects use each site builder tool?)
>
> -- Branded (primarily) as an Apache Lucene/Solr powered search tool.
> The Lucidworks site looks pretty powerful, and does cross-project stuff,
> but for me, the interest is providing something that is primarily
> powered by Apache tools (even if it uses someone's hosting or extra
> code, as a secondary thing).
>
> The search results/refinement pages in particular should match Apache
> site designs in general.  The Google custom search results are kinda
> ugly, now that I look at them.  The Lucidworks site is nice, but if
> we're going to feature this prominently, it would be nice to use Apache
> site design, with a text note "powered by LucidWorks Fusion, more
> details here" or the like.
>
> -- Maintenance documented.  ComDev already has a lot of code this PMC
> owns, and since we're not focused on code daily, we need to ensure
> whatever maintenance is needed is easy to do.
>
>
> Is that a good focus for now?
>
> - Shane
>
>
> --
>
> - Shane
>   https://www.apache.org/foundation/marks/resources
>

Re: Using Solr/Lucene to provide our own site search?

Posted by Grant Ingersoll <gs...@apache.org>.
On Fri, May 12, 2017 at 7:24 AM Shane Curcuru <as...@shanecurcuru.org> wrote:

> Grant Ingersoll wrote on 5/11/17 10:41 AM:
> > On Wed, May 10, 2017 at 9:01 AM Shane Curcuru <as...@shanecurcuru.org> wrote:
> ...snip...
> > I'll be at AC on Thursday (work schedule got screwed up, so I have to scale
> > back my participation).  Happy to discuss some there.  I can demo what we
> > have if that helps.
>
> Yes, please.  If I don't find you, I'm probably hanging out at the infra
> table in whatever hackathon space there is.
>
> At this point, either you could host (but we'd need to know how to keep
> the service up, or degrade nicely!) or ComDev could ask infra for a VM,
> and we could host there.  Will need to explore later if we can get this
> as an official infra SLA service or not (someone needs to formally
> request that).
>

Cool, I'll track you down on Thursday after my talk.


>
> >> I'm happy for the simplest tool to get working/maintained.  The only
> >> issue for me would be on the www.a.o site itself to keep the branding
> >> image focused on Apache on the page you start searching from.
> >>
> >
> > Of course.  We can host the backend and serve up the JSON payload or ASF
> > can host.  Overall branding is up to the ASF, but just like Atlassian for
> > JIRA, et al., we'd at least ask for a Powered By badge somewhere on the
> > page, likely in the footer.
>
> OK, that should be fine.  My concern is the optics of anything we host
> on the www.a.o homepage itself.  When you get to search results, having
> the powered by footer and text is just fine, and makes sense.
>

Yep.  We'll just be sending back a JSON payload w/ the results.  Then it's
up to us (w/ me having my ASF hat on) to format/display as we see fit.
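
To illustrate the "JSON payload in, our own formatting out" split, here is a
minimal sketch. It assumes the payload follows the standard Solr JSON response
shape (a "response" object holding a "docs" list); the actual hosted API is not
described in this thread, and the "title" and "url" field names are
hypothetical.

```python
import json
from html import escape


def render_results(payload_text):
    """Turn a Solr-style JSON search response into a plain HTML list.

    Assumes the standard Solr layout {"response": {"docs": [...]}}.
    The "title" and "url" fields are hypothetical; a real index would
    define its own schema.
    """
    data = json.loads(payload_text)
    docs = data.get("response", {}).get("docs", [])
    items = []
    for doc in docs:
        # Escape everything that came from the index before it reaches HTML.
        title = escape(str(doc.get("title", "(untitled)")))
        url = escape(str(doc.get("url", "#")))
        items.append(f'<li><a href="{url}">{title}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Because the backend only returns data, the markup above could be swapped for
whatever the Apache site design calls for, with the "Powered By" note added in
the page footer.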

Re: Using Solr/Lucene to provide our own site search?

Posted by Shane Curcuru <as...@shanecurcuru.org>.
Grant Ingersoll wrote on 5/11/17 10:41 AM:
> On Wed, May 10, 2017 at 9:01 AM Shane Curcuru <as...@shanecurcuru.org> wrote:
...snip...
> I'll be at AC on Thursday (work schedule got screwed up, so I have to scale
> back my participation).  Happy to discuss some there.  I can demo what we
> have if that helps.

Yes, please.  If I don't find you, I'm probably hanging out at the infra
table in whatever hackathon space there is.

At this point, either you could host (but we'd need to know how to keep
the service up, or degrade nicely!) or ComDev could ask infra for a VM,
and we could host there.  Will need to explore later if we can get this
as an official infra SLA service or not (someone needs to formally
request that).

>> I'm happy for the simplest tool to get working/maintained.  The only
>> issue for me would be on the www.a.o site itself to keep the branding
>> image focused on Apache on the page you start searching from.
>>
> 
> Of course.  We can host the backend and serve up the JSON payload or ASF
> can host.  Overall branding is up to the ASF, but just like Atlassian for
> JIRA, et al., we'd at least ask for a Powered By badge somewhere on the
> page, likely in the footer.

OK, that should be fine.  My concern is the optics of anything we host
on the www.a.o homepage itself.  When you get to search results, having
the powered by footer and text is just fine, and makes sense.

- Shane


-- 

- Shane
  https://www.apache.org/foundation/marks/resources


Re: Using Solr/Lucene to provide our own site search?

Posted by Grant Ingersoll <gs...@apache.org>.
On Wed, May 10, 2017 at 9:01 AM Shane Curcuru <as...@shanecurcuru.org> wrote:

> > So, just over another month has gone by w/ no movement.  Anyone still
> > interested in this?
>
>
> Yes, but not until after ApacheCon.  Hopefully we can find a few
> like-minded people to figure out a project plan in Miami?
>

I'll be at AC on Thursday (work schedule got screwed up, so I have to scale
back my participation).  Happy to discuss some there.  I can demo what we
have if that helps.


>
> I'm happy for the simplest tool to get working/maintained.  The only
> issue for me would be on the www.a.o site itself to keep the branding
> image focused on Apache on the page you start searching from.
>

Of course.  We can host the backend and serve up the JSON payload or ASF
can host.  Overall branding is up to the ASF, but just like Atlassian for
JIRA, et al., we'd at least ask for a Powered By badge somewhere on the
page, likely in the footer.

Re: Using Solr/Lucene to provide our own site search?

Posted by Shane Curcuru <as...@shanecurcuru.org>.
Grant Ingersoll wrote on 5/10/17 8:49 AM:
> On Thu, Mar 30, 2017 at 2:36 PM Grant Ingersoll <gs...@apache.org> wrote:
> 
>> On Wed, Mar 29, 2017 at 9:46 AM Shane Curcuru <as...@shanecurcuru.org>
>> wrote:
>>
>>>
>>>
>>> Totally understand, although that will take a fair bit of custom coding
>> that people have to maintain, even if you are "just" wiring together all
>> ASF projects.  It's a great goal, but just like we didn't build our own bug
>> tracker even though all the "bits" are there at the ASF, we may not want to
>> build all of this either, esp. since there is an OOTB solution that is
>> already built and powered at its core by ASF tech (and not hidden,
>> either).  If you look at how much our custom, Solr based, internal members
>> search site has lagged due to lack of maintenance (not even sure if it is
>> even maintained anymore), you will know this is no small task.  Given the
>> number of times this has come up over the years at the ASF, methinks the
>> shorter, easier to do path is the most likely to be successful.
>>
>> That being said, I'm happy to support at whatever level, as I think it's
>> long overdue.
>>
>>
> 
> So, just over another month has gone by w/ no movement.  Anyone still
> interested in this?


Yes, but not until after ApacheCon.  Hopefully we can find a few
like-minded people to figure out a project plan in Miami?

I'm happy for the simplest tool to get working/maintained.  The only
issue for me would be on the www.a.o site itself to keep the branding
image focused on Apache on the page you start searching from.
-- 

- Shane
  https://www.apache.org/foundation/marks/resources


Re: Using Solr/Lucene to provide our own site search?

Posted by Grant Ingersoll <gs...@apache.org>.
On Thu, Mar 30, 2017 at 2:36 PM Grant Ingersoll <gs...@apache.org> wrote:

> On Wed, Mar 29, 2017 at 9:46 AM Shane Curcuru <as...@shanecurcuru.org>
> wrote:
>
>>
>>
>> Totally understand, although that will take a fair bit of custom coding
> that people have to maintain, even if you are "just" wiring together all
> ASF projects.  It's a great goal, but just like we didn't build our own bug
> tracker even though all the "bits" are there at the ASF, we may not want to
> build all of this either, esp. since there is an OOTB solution that is
> already built and powered at its core by ASF tech (and not hidden,
> either).  If you look at how much our custom, Solr based, internal members
> search site has lagged due to lack of maintenance (not even sure if it is
> even maintained anymore), you will know this is no small task.  Given the
> number of times this has come up over the years at the ASF, methinks the
> shorter, easier to do path is the most likely to be successful.
>
> That being said, I'm happy to support at whatever level, as I think it's
> long overdue.
>
>

So, just over another month has gone by w/ no movement.  Anyone still
interested in this?

Re: Using Solr/Lucene to provide our own site search?

Posted by Grant Ingersoll <gs...@apache.org>.
On Wed, Mar 29, 2017 at 9:46 AM Shane Curcuru <as...@shanecurcuru.org> wrote:

>
>
> -- Branded (primarily) as an Apache Lucene/Solr powered search tool.
>

Fusion is also powered by Spark, Tika and a number of other ASF projects,
FWIW.  You have direct, secured access to Solr and all of the usual Solr
constructs are available (although we have some higher order functions that
can make that stuff point and click, too)


> The Lucidworks site looks pretty powerful, and does cross-project stuff,
> but for me, the interest is providing something that is primarily
> powered by Apache tools (even if it uses someone's hosting or extra
> code, as a secondary thing).
>

Totally understand, although that will take a fair bit of custom coding
that people have to maintain, even if you are "just" wiring together all
ASF projects.  It's a great goal, but just like we didn't build our own bug
tracker even though all the "bits" are there at the ASF, we may not want to
build all of this either, esp. since there is an OOTB solution that is
already built and powered at its core by ASF tech (and not hidden,
either).  If you look at how much our custom, Solr based, internal members
search site has lagged due to lack of maintenance (not even sure if it is
even maintained anymore), you will know this is no small task.  Given the
number of times this has come up over the years at the ASF, methinks the
shorter, easier to do path is the most likely to be successful.

That being said, I'm happy to support at whatever level, as I think it's
long overdue.


>
> The search results/refinement pages in particular should match Apache
> site designs in general.  The Google custom search results are kinda
> ugly, now that I look at them.  The Lucidworks site is nice, but if
> we're going to feature this prominently, it would be nice to use Apache
> site design,


Yeah, you can put whatever front end you want on it.  We just present a set
of REST APIs that you can hit via whatever language you want.  We've used
it in both Python Flask/Angular-based apps (our simple demo site:
http://searchhub.lucidworks.com/search?query=(q:%27*%27,rows:10,start:0,wt:json))
and WordPress/Backbone apps (our company website search is done
this way: https://lucidworks.com/resources/#all/solr%20and%20lucene).
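
As a rough illustration of "hit it from whatever language you want", here
is a minimal Python sketch that builds a search URL using the standard Solr
query parameters (q, rows, start, wt) and reads a response in the usual
Solr JSON shape.  The endpoint URL and the "id" values are made up for the
example; this is not the Fusion API, nor any real ASF service.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not a real ASF or vendor URL).
BASE_URL = "https://search.example.org/solr/site/select"


def build_search_url(query, rows=10, start=0, base_url=BASE_URL):
    """Return a Solr-style select URL for the given query string."""
    params = {"q": query, "rows": rows, "start": start, "wt": "json"}
    return base_url + "?" + urlencode(params)


def summarize(response_text):
    """Pull the hit count and document ids out of a Solr JSON response."""
    body = json.loads(response_text)["response"]
    return body["numFound"], [doc["id"] for doc in body["docs"]]


# A canned response in the standard Solr JSON layout, so no network is needed.
SAMPLE = json.dumps({
    "responseHeader": {"status": 0},
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [{"id": "community/newcomers"}, {"id": "dev/svn"}],
    },
})

if __name__ == "__main__":
    print(build_search_url("how do I contribute"))
    print(summarize(SAMPLE))
```

Any front end (a static site's JavaScript, a Flask app, a WordPress
plugin) would do the same two steps: build the query string, then render
the "docs" list in the site's own design.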

If someone gets me access to some hardware or a VM, I can have a POC up and
indexing/searching in a few hours.  If you want to do it in AWS, it's a few
minutes.


 -Grant

Re: Using Solr/Lucene to provide our own site search?

Posted by Shane Curcuru <as...@shanecurcuru.org>.
Wow, so there are folks interested!  Great discussion, and good point:
efforts that span projects or work on broader community aspects
need clear definitions to keep disparate volunteers focused.

There are two things *I* would be interested in helping with:


1- (Small concept) Super easy to use and maintain search widget and
portal for apache.org and community.a.o sites, focused on helping
newcomers find their way to the right "how do I do X" pages.  Value:
Make ComDev efforts more useful by ensuring new users - who might not
know what terminology we use, or where our docs are traditionally stored
- can find the how-to information they need.

In terms of structure: it feels like there might be three general kinds
of information people search for on a.o/c.a.o:

- General information about Apache (what it is, structure, donate, etc.)
- How-to information, like where SVN is, how to change websites, where
different email lists are, etc.
- Project information: either about a specific project, or more likely
"Which projects might I want to contribute to".

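One way the three kinds of search listed above could be exposed, sketched
under assumptions: if each indexed page carried a "category" field (a
hypothetical field name, as are the three labels below), a search box could
narrow results with Solr's standard fq (filter query) parameter.

```python
from urllib.parse import urlencode

# Hypothetical category labels, one per kind of search described above.
CATEGORIES = ("general", "howto", "project")


def category_params(query, category):
    """Build a Solr query string restricted to one category of page.

    Assumes the index has a "category" field; that field is an
    illustration, not something defined in this thread.
    """
    if category not in CATEGORIES:
        raise ValueError("unknown category: " + category)
    return urlencode({"q": query, "fq": "category:" + category, "wt": "json"})


if __name__ == "__main__":
    print(category_params("svn checkout", "howto"))
```

Keeping the category as a filter rather than part of the main query means
the same user-typed text can be re-run against each of the three groups.
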

2- (Big concept) Creating a simple, drop-in, Apache-branded search
widget that we encourage projects to consider adopting on their
homepages.  Value: better search for individual projects' users; plus
showcasing some cross-project links and interest.

Obviously, there are several existing setups that provide a lot of this
functionality already (more than I realized).  But it would be really
nice to have a solution that is:

-- Super easy to drop into any existing site's design.
Maven/Forrest/ghpages style sites would be the obvious targets.  (Do we
have stats as to how many projects use each site builder tool?)

-- Branded (primarily) as an Apache Lucene/Solr powered search tool.
The Lucidworks site looks pretty powerful, and does cross-project stuff,
but for me, the interest is providing something that is primarily
powered by Apache tools (even if it uses someone's hosting or extra
code, as a secondary thing).

The search results/refinement pages in particular should match Apache
site designs in general.  The Google custom search results are kinda
ugly, now that I look at them.  The Lucidworks site is nice, but if
we're going to feature this prominently, it would be nice to use Apache
site design, with a text note "powered by Lucidworks Fusion, more
details here" or the like.

-- Maintenance documented.  ComDev already has a lot of code this PMC
owns, and since we're not focused on code daily, we need to ensure
whatever maintenance is needed is easy to do.


Is that a good focus for now?

- Shane


-- 

- Shane
  https://www.apache.org/foundation/marks/resources



Re: Using Solr/Lucene to provide our own site search?

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I am an Apache Solr committer and would be happy to be part of such a project.

My primary focus within the Solr community is onboarding, training and
general beginner-orientation. I would be happy to do a Solr intro course
for any Apache Solr committer and contribute to this project on Solr
issues such as schema design, search tuning, troubleshooting, and
other usual issues.

I am not a very good sysadmin though, even for Solr :-(

Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


