Posted to solr-user@lucene.apache.org by Gabriele Kahlout <ga...@mysimpatico.com> on 2011/05/03 12:41:37 UTC
Re: Nutch Web Interface - not anymore in 1.3
Hello,
I'm also in favor of maintaining a web interface that ships with Nutch. As has
been mentioned, it may well be a bridge to Solr. If I find the time to
contribute my solution (and make it general enough), I'll happily do it.
Earlier I was wondering about actually using the previous Nutch web interface
(not Solritas/Velocity) and integrating it with the Solr index. I still find this
tempting; what's the motivation against it?
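A minimal sketch of that integration, under stated assumptions: rather than calling `bean.search(query)` (as in the search.jsp fragment quoted further down), the JSP could send the query to Solr with `wt=xml` and read hits and highlight snippets out of Solr's standard XML response. The classes `SolrHit` and `SolrHits` are hypothetical, not part of Nutch or SolrJ; the code assumes the schema's uniqueKey field is `id` and covers only the parsing step, using the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical stand-in for Nutch's Hit: stored fields plus highlight snippets.
final class SolrHit {
    final Map<String, String> fields = new HashMap<>();
    final List<String> snippets = new ArrayList<>();
}

final class SolrHits {
    // Parses a Solr wt=xml response; assumes the schema's uniqueKey is "id".
    static List<SolrHit> parse(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = dom.getDocumentElement();
            List<SolrHit> hits = new ArrayList<>();
            Map<String, SolrHit> byId = new HashMap<>();
            // Pass 1: <result name="response"> holds one <doc> per hit.
            for (Element section : children(root)) {
                if (!"result".equals(section.getTagName())) continue;
                for (Element doc : children(section)) {
                    SolrHit hit = new SolrHit();
                    // Multi-valued <arr> fields are flattened to their text content.
                    for (Element field : children(doc)) {
                        hit.fields.put(field.getAttribute("name"), field.getTextContent());
                    }
                    hits.add(hit);
                    byId.put(hit.fields.get("id"), hit);
                }
            }
            // Pass 2: <lst name="highlighting"><lst name="{id}"><arr name="{field}"><str>...
            for (Element section : children(root)) {
                if (!"highlighting".equals(section.getAttribute("name"))) continue;
                for (Element perDoc : children(section)) {
                    SolrHit hit = byId.get(perDoc.getAttribute("name"));
                    if (hit == null) continue;
                    for (Element field : children(perDoc)) {
                        for (Element snippet : children(field)) {
                            hit.snippets.add(snippet.getTextContent());
                        }
                    }
                }
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable Solr XML response", e);
        }
    }

    // Element children only, skipping whitespace text nodes.
    private static List<Element> children(Node parent) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) out.add((Element) n);
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "<response><result name=\"response\" numFound=\"1\" start=\"0\">"
                + "<doc><str name=\"id\">http://example.com/</str><str name=\"title\">Example</str></doc>"
                + "</result><lst name=\"highlighting\"><lst name=\"http://example.com/\">"
                + "<arr name=\"content\"><str>an &lt;em&gt;example&lt;/em&gt; page</str></arr>"
                + "</lst></lst></response>";
        for (SolrHit hit : parse(sample)) {
            System.out.println(hit.fields.get("title") + ": " + hit.snippets);
        }
    }
}
```

In a JSP the XML string would come from a GET request such as `http://localhost:8983/solr/select?q=...&hl=on&wt=xml` (host, port and path assumed for a default single-core Solr), and the resulting list would take the place of the `Hits` object when rendering results.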
I've evaluated Ajax Solr, but I didn't get it to work. Following Markus's advice,
I tried Solritas and got it to work, but without highlighting. Why?
These are the relevant solrconfig.xml sections:
<queryResponseWriter name="velocity"
    class="org.apache.solr.request.VelocityResponseWriter"/>
<requestHandler name="/itas" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="v.template">browse</str>
    <str name="v.properties">velocity.properties</str>
    <str name="v.contentType">text/html;charset=UTF-8</str>
    <str name="title">Solritas</str>
    <str name="hl.fl">*</str>
    <str name="qt">standard</str>
    <str name="wt">velocity</str>
    <str name="fq"/>
    <str name="rows">10000</str>
    <str name="hl">on</str>
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
    <str name="fl">*,score</str>
    <str name="facet">on</str>
    <str name="facet.field">title</str>
    <str name="facet.mincount">1</str>
  </lst>
  <!--<lst name="invariants">-->
  <!--<str name="v.base_dir">/solr/contrib/velocity/src/main/templates</str>-->
  <!--</lst>-->
</requestHandler>
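One way to narrow down the missing highlighting, offered as an assumption rather than a known fix: because these parameters sit in `defaults`, not `invariants`, request parameters override them, so the same `/itas` handler can be asked for `wt=xml` and the raw response checked for a `<lst name="highlighting">` section. If that section is populated, the problem is more likely in the Velocity template; if it is empty, in the highlighting parameters. The hypothetical helper below only builds such a URL; the base URL `http://localhost:8983/solr` and the field name `content` are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class ItasDebugUrl {
    // Builds a GET URL for a Solr request handler. Parameters given here
    // override the handler's <lst name="defaults"> values (not its invariants).
    static String build(String solrBase, String handler, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(enc(p.getKey()) + "=" + enc(p.getValue()));
        }
        return solrBase + handler + "?" + query;
    }

    // Form-encodes a key or value (spaces become '+', ':' becomes %3A).
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "nutch");
        p.put("wt", "xml");        // override wt=velocity to see the raw response
        p.put("hl", "on");
        p.put("hl.fl", "content"); // assumed field name; the handler default is *
        p.put("rows", "10");       // keep the check small
        System.out.println(build("http://localhost:8983/solr", "/itas", p));
    }
}
```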
This highlighting section was already in solrconfig.xml:
<highlighting>
  <!-- Configure the standard fragmenter -->
  <!-- This could most likely be commented out in the "default" case -->
  <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
    <lst name="defaults">
      <int name="hl.fragsize">100</int>
    </lst>
  </fragmenter>
  <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
  </fragmenter>
  <!-- Configure the standard formatter -->
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
      <str name="hl.simple.pre"><![CDATA[<em>]]></str>
      <str name="hl.simple.post"><![CDATA[</em>]]></str>
    </lst>
  </formatter>
</highlighting>
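To see what the regex fragmenter's `hl.regex.pattern` actually matches, the toy below (a hypothetical `FragmentPatternDemo` class) runs the same pattern with `java.util.regex` and wraps a term in the configured `hl.simple.pre`/`hl.simple.post` markup. It is not Solr's highlighter: RegexFragmenter also applies `hl.fragsize` and `hl.regex.slop`, which this sketch ignores, and it is only used when a request selects it (for example with `hl.fragmenter=regex`), since `gap` is marked as the default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FragmentPatternDemo {
    // hl.regex.pattern from the config above, written as a Java string literal.
    static final Pattern SENTENCE = Pattern.compile("[-\\w ,/\\n\\\"']{20,200}");

    // Returns every run of text the pattern matches (greedy, 20 to 200 chars).
    static List<String> candidates(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Wraps each case-insensitive occurrence of term in <em>...</em>,
    // mimicking hl.simple.pre/hl.simple.post; a toy, not HtmlFormatter itself.
    static String mark(String fragment, String term) {
        return fragment.replaceAll("(?i)(" + Pattern.quote(term) + ")", "<em>$1</em>");
    }

    public static void main(String[] args) {
        String text = "Nutch crawls the web, Solr indexes it. Short. Another sentence about Nutch and Solr.";
        for (String fragment : candidates(text)) {
            System.out.println(mark(fragment.trim(), "solr"));
        }
    }
}
```

Runs shorter than 20 characters (such as "Short" above) never become candidates, and punctuation outside the character class, like the period, ends a run.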
Pointers:
http://stackoverflow.com/questions/5071675/ajax-solr-how-to-make-an-ajax-page-readable-by-google
On Mon, May 2, 2011 at 7:43 PM, Mattmann, Chris A (388J)
<chris.a.mattmann@jpl.nasa.gov> wrote:
> Hi Gabriele,
>
> I would have loved to do this myself but haven't had the time. I
> also favored keeping a web interface included.
>
> If you find time to port it to the 1.3 branch/framework I can tell you I'd
> happily devote my time towards a 1.4 release that includes it.
>
> Cheers,
> Chris
>
> On May 2, 2011, at 10:54 AM, Gabriele Kahlout wrote:
>
> > The reason I'm asking is that I had found the Nutch webapp pretty neat
> > for a prototype interface (it even did highlighting).
> > I'm thinking of changing it so that it pulls the data from the Solr index,
> > updating this part in search.jsp:
> >
> > // perform query
> > // NOTE by Dawid Weiss:
> > // The 'clustering' window actually moves with the start
> > // position.... this is good, bad?... ugly?....
> > Hits hits;
> > try {
> >   query.getParams().initFrom(start + hitsToRetrieve, hitsPerSite,
> >       "site", sort, reverse);
> >   hits = bean.search(query);
> > } catch (IOException e) {
> >   hits = new Hits(0, new Hit[0]);
> > }
> >
> >
> > Has someone gone through that already? Are there other alternatives you
> > have taken? I stumbled upon (w/o stumbledupon.com)
> > http://evolvingweb.github.com/ajax-solr/examples/reuters/index.html, which
> > is quite sophisticated but doesn't do the highlighting!
> >
> >
> > On Mon, May 2, 2011 at 4:45 PM, Markus Jelsma
> > <markus.jelsma@openindex.io> wrote:
> >
> >> Yes. It was removed. Indexing and searching are delegated to Solr for
> >> now.
> >>
> >> On Monday 02 May 2011 16:41:32 Gabriele Kahlout wrote:
> >>> Hello,
> >>>
> >>> Some time ago I was trying to use nutch/search.jsp to search my Solr
> >>> indexes. Trying to do that again, I've noticed that in nutch-1.3 there is
> >>> no support for a Nutch web querying interface (presumably in favor of
> >>> Solr's own). Is that right?
> >>
> >> --
> >> Markus Jelsma - CTO - Openindex
> >> http://www.linkedin.com/in/markus17
> >> 050-8536620 / 06-50258350
> >>
> >
> >
> >
> > --
> > Regards,
> > K. Gabriele
> >
> > --- unchanged since 20/9/10 ---
> > P.S. If the subject contains "[LON]" or the addressee acknowledges the
> > receipt within 48 hours then I don't resend the email.
> > subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> > time(x) < Now + 48h) ⇒ ¬resend(I, this).
> >
> > If an email is sent by a sender that is not a trusted contact or the
> > email does not contain a valid code then the email is not received. A
> > valid code starts with a hyphen and ends with "X".
> > ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
> > L(-[a-z]+[0-9]X)).
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
--
Regards,
K. Gabriele