Posted to solr-user@lucene.apache.org by Gabriele Kahlout <ga...@mysimpatico.com> on 2011/05/03 12:41:37 UTC

Re: Nutch Web Interface - not anymore in 1.3

Hello,

I'm also in favor of maintaining a web interface that ships with Nutch. As
has been mentioned, it may well serve as a bridge to Solr. If I find the time
to contribute my solution (and make it general enough), I'll happily do it.

Earlier I was wondering about reusing the previous Nutch web interface (not
Solritas/Velocity) and integrating it with the Solr index. I still find this
tempting; what's the motivation against it?


I evaluated Ajax Solr but didn't get it to work. Following Markus's advice I
tried Solritas; I got it to work, but without highlighting. Why?
These are the relevant solrconfig.xml sections:

    <queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/>

 <requestHandler name="/itas" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="v.template">browse</str>
       <str name="v.properties">velocity.properties</str>
       <str name="v.contentType">text/html;charset=UTF-8</str>
       <str name="title">Solritas</str>

       <str name="hl.fl">*</str>
       <str name="qt">standard</str>
       <str name="wt">velocity</str>
       <str name="fq"/>
       <str name="rows">10000</str>
       <str name="hl">on</str>

       <str name="defType">dismax</str>
       <str name="q.alt">*:*</str>
       <str name="fl">*,score</str>
       <str name="facet">on</str>
       <str name="facet.field">title</str>
       <str name="facet.mincount">1</str>
     </lst>
     <!--<lst name="invariants">-->
       <!--<str name="v.base_dir">/solr/contrib/velocity/src/main/templates</str>-->
     <!--</lst>-->
  </requestHandler>
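One way to narrow down whether the missing highlighting comes from the handler or from the Velocity template is to send the same /itas request with wt=xml: values passed on the request override those listed under "defaults" (unlike "invariants"). The Java sketch below only builds such a URL. The host and port (the Solr example Jetty default) and the field name content (assumed to be the page-text field in Nutch's Solr schema) are assumptions, not taken from the configuration above; the class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class ItasDebugUrl {
    // Assumption: Solr's example Jetty on localhost:8983; adjust to your install.
    static final String ITAS = "http://localhost:8983/solr/itas";

    // URL-encodes a single key or value as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Appends params to base in insertion order.
    static String build(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("q", "nutch solr");
        p.put("wt", "xml");        // overrides the default wt=velocity for this request only
        p.put("hl", "true");
        p.put("hl.fl", "content"); // assumed field name; overrides the default hl.fl=*
        p.put("rows", "10");       // far fewer than the default rows=10000
        System.out.println(build(ITAS, p));
    }
}
```

If the highlighting section of the XML response contains fragments, the handler is doing its job and the template is the place to look; if it is empty, keep in mind that Solr's highlighter only produces fragments for stored fields.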


The highlighting section was already there:

  <highlighting>
   <!-- Configure the standard fragmenter -->
   <!-- This could most likely be commented out in the "default" case -->
   <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
    <lst name="defaults">
     <int name="hl.fragsize">100</int>
    </lst>
   </fragmenter>

   <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
   </fragmenter>

   <!-- Configure the standard formatter -->
   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
     <str name="hl.simple.pre"><![CDATA[<em>]]></str>
     <str name="hl.simple.post"><![CDATA[</em>]]></str>
    </lst>
   </formatter>
  </highlighting>
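The "html" formatter above wraps each matched term in the hl.simple.pre and hl.simple.post strings, here <em> and </em>. The toy Java sketch below only illustrates the resulting markup; it is not Solr's highlighter, which works on analyzed tokens and fragments, so real results can differ (for example, for matches inside longer words). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleHighlightDemo {
    // Same markers as hl.simple.pre / hl.simple.post in the "html" formatter.
    static final String PRE = "<em>";
    static final String POST = "</em>";

    // Toy model: wraps every case-insensitive literal match of term in PRE/POST.
    static String highlight(String text, String term) {
        Pattern p = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        // $0 is the matched text; the markers are quoted so '$' or '\' in them stay literal.
        String replacement = Matcher.quoteReplacement(PRE) + "$0" + Matcher.quoteReplacement(POST);
        return p.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // prints: <em>Nutch</em> crawls; Solr indexes what <em>nutch</em> fetched.
        System.out.println(highlight("Nutch crawls; Solr indexes what nutch fetched.", "nutch"));
    }
}
```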


Pointers:
http://stackoverflow.com/questions/5071675/ajax-solr-how-to-make-an-ajax-page-readable-by-google

On Mon, May 2, 2011 at 7:43 PM, Mattmann, Chris A (388J) <chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi Gabriele,
>
> I would have loved to have done this myself but haven't had the time. I
> also favored keeping a web interface included.
>
> If you find time to port it to the 1.3 branch/framework, I can tell you I'd
> happily devote my time towards a 1.4 release that includes it.
>
> Cheers,
> Chris
>
> On May 2, 2011, at 10:54 AM, Gabriele Kahlout wrote:
>
> > The reason I'm asking is that I had found the Nutch webapp pretty neat
> > for a prototype interface (it even did highlighting).
> > I'm thinking of changing it so that it pulls the data from the Solr index,
> > updating this part in search.jsp:
> >
> > // perform query
> > // NOTE by Dawid Weiss:
> > // The 'clustering' window actually moves with the start
> > // position.... this is good, bad?... ugly?....
> > Hits hits;
> > try {
> >   query.getParams().initFrom(start + hitsToRetrieve, hitsPerSite,
> >       "site", sort, reverse);
> >   hits = bean.search(query);
> > } catch (IOException e) {
> >   hits = new Hits(0, new Hit[0]);
> > }
> >
> >
> > Has anyone gone through that already? Are there other alternatives you
> > have taken? I stumbled upon (w/o stumbledupon.com)
> > http://evolvingweb.github.com/ajax-solr/examples/reuters/index.html which
> > is quite sophisticated but doesn't do highlighting!
> >
> >
> > On Mon, May 2, 2011 at 4:45 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
> >
> >> Yes. It was removed. Indexing and searching are delegated to Solr for now.
> >>
> >> On Monday 02 May 2011 16:41:32 Gabriele Kahlout wrote:
> >>> Hello,
> >>>
> >>> Some time ago I was trying to use nutch/search.jsp to search my Solr
> >>> indexes. Trying to do that again, I've noticed that in nutch-1.3 there is
> >>> no support for a Nutch web querying interface (presumably in favor of
> >>> Solr's own). Is that right?
> >>
> >> --
> >> Markus Jelsma - CTO - Openindex
> >> http://www.linkedin.com/in/markus17
> >> 050-8536620 / 06-50258350
> >>
> >
> >
> >
> > --
> > Regards,
> > K. Gabriele
> >
> > --- unchanged since 20/9/10 ---
> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> > receipt within 48 hours then I don't resend the email.
> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
> > < Now + 48h) ⇒ ¬resend(I, this).
> >
> > If an email is sent by a sender that is not a trusted contact or the email
> > does not contain a valid code then the email is not received. A valid code
> > starts with a hyphen and ends with "X".
> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> > L(-[a-z]+[0-9]X)).
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
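The search.jsp snippet quoted above requests start + hitsToRetrieve hits from the Nutch search bean. If that call were replaced by an HTTP request to Solr, the same inputs could map onto Solr's start, rows and sort parameters roughly as in the Java sketch below. It assumes hitsToRetrieve is the number of hits wanted for the current page and that reverse == true means descending order; the class name, method name and the sort field "date" are hypothetical. Nutch's per-site limit (hitsPerSite on "site") is not reproduced.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class NutchToSolrParams {
    // URL-encodes a value as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Hypothetical mapping of search.jsp's paging/sorting inputs onto a Solr query string.
    static String toSolrQuery(String q, int start, int hitsToRetrieve, String sort, boolean reverse) {
        StringBuilder sb = new StringBuilder();
        sb.append("q=").append(enc(q));
        // Solr pages server-side: skip 'start' hits and return the next 'hitsToRetrieve'.
        sb.append("&start=").append(start);
        sb.append("&rows=").append(hitsToRetrieve);
        if (sort != null && sort.length() > 0) {
            // Assumption: Nutch's reverse flag means descending order.
            sb.append("&sort=").append(enc(sort + (reverse ? " desc" : " asc")));
        }
        sb.append("&hl=true"); // ask Solr for highlighted snippets, as the old webapp showed
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: q=web+crawler&start=10&rows=10&sort=date+desc&hl=true
        System.out.println(toSolrQuery("web crawler", 10, 10, "date", true));
    }
}
```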


-- 
Regards,
K. Gabriele
