Posted to dev@nutch.apache.org by lu...@uol.com.br on 2005/04/10 17:06:29 UTC

XML OUTPUT

Hi!

 Does anybody know how to output search results in XML format?
 I would like to provide my data the way Google/Yahoo do with their APIs.

Thanks!

Re: XML OUTPUT

Posted by Jack Tang <hi...@gmail.com>.
Guys

Please join the "Nutch-39 issue" thread on the nutch-dev mailing list. Thanks

/Jack 

Re: XML OUTPUT

Posted by zhang jin <pr...@gmail.com>.
Thanks very much, that's very good!

Re: XML OUTPUT

Posted by Orlando Tempobono - AtlasVision <or...@atlasvision.com>.
Hi,

    We are working on a network of search websites here in Brazil called
www.sitedebusca.com (the complete list is at
http://www.servicodebusca.com/sitesdebusca.php), and we added some patches to
search.jsp to show the results in a simple XML format, so that they can be
read by our own application, which is currently written in PHP.
    We are already using Nutch in a "beta" environment. We plan to use only
Nutch on a network of more than 50 regional search web sites.
    The code of search.jsp follows. I hope you can understand my email and
that this code is useful to you.

Regards,
AtlasVision - Team

<%@ page
contentType="text/xml; charset=ISO-8859-1"
pageEncoding="ISO-8859-1"

import="javax.servlet.*"
import="javax.servlet.http.*"
import="java.io.*"
import="java.util.*"
import="java.net.*"

import="net.nutch.html.Entities"
import="net.nutch.searcher.*"
%><%

NutchBean bean = NutchBean.get(application);

// set the character encoding to use when interpreting request values
request.setCharacterEncoding("ISO-8859-1");

bean.LOG.info("query request from " + request.getRemoteAddr());

// get query from request
String queryString = request.getParameter("query");
if (queryString == null) queryString = "";

// first hit to display
int start = 0;
String startString = request.getParameter("start");
if (startString != null) start = Integer.parseInt(startString);

// number of hits to display
int hitsPerPage = 10;
String hitsString = request.getParameter("hitsPerPage");
if (hitsString != null) hitsPerPage = Integer.parseInt(hitsString);

// max hits per site
int hitsPerSite = 2;
String hitsPerSiteString = request.getParameter("hitsPerSite");
if (hitsPerSiteString != null)
    hitsPerSite = Integer.parseInt(hitsPerSiteString);

Query query = Query.parse(queryString);
bean.LOG.info("query: " + queryString);

// perform query
// Hits hits = bean.search(query, start + 1000, hitsPerSite);
// FIXME: the line above was causing errors for the query: linux
Hits hits = bean.search(query, start + hitsPerPage, hitsPerSite);

// Last hit in the page
int end = start + hitsPerPage - 1;
if (end > hits.getLength() - 1) end = hits.getLength() - 1;

// Total length in the page
int length = 0;

// use <= so that a page holding exactly one hit (start == end) is not dropped
if (start <= end)
    length = end - start + 1;

bean.LOG.info("total hits: " + hits.getTotal());

%><?xml version="1.0" encoding="ISO-8859-1"?>
<%
  // To prevent the character encoding declared with the 'contentType' page
  // directive from being overridden by JSTL (apache i18n), we freeze it
  // by flushing the output buffer.
  // see http://java.sun.com/developer/technicalArticles/Intl/MultilingualJSP/
  out.flush();
%>
<nutchSearch>
    <querystring><%=Entities.encode(queryString)%></querystring>

    <hitsInfo>
        <hitsPerPage><%=hitsPerPage%></hitsPerPage>
        <hitsPerSite><%=hitsPerSite%></hitsPerSite>
        <start><%=new Long(start)%></start>
        <end><%=new Long(end)%></end>
        <total><%=new Long(hits.getTotal())%></total>
        <totalIsExact><%=new Boolean(hits.totalIsExact())%></totalIsExact>
        <length><%=new Integer(hits.getLength())%></length>
        <lengthInPage><%=length%></lengthInPage>
    </hitsInfo>

<%
if (length > 0) {
%>
    <hitsData>
<%
    Hit[] show = hits.getHits(start, length);
    HitDetails[] details = bean.getDetails(show);
    String[] summaries = bean.getSummary(details, query);

    // display the hits
    for (int i = 0; i < length; i++) {

        Hit hit = show[i];
        HitDetails detail = details[i];
        String title = detail.getValue("title");
        String url = detail.getValue("url");
        // collapse runs of whitespace and &nbsp; into a single space
        String summary = summaries[i].replaceAll("([ \t\n\r]|&nbsp;){2,}", " ");
        String id = "idx=" + hit.getIndexNo() + "&id=" + hit.getIndexDocNo();

        // use url for docs w/o title
        if (title == null || title.equals("")) title = url;
%>
        <hit>
            <title><![CDATA[<%=title%>]]></title>
            <summary><![CDATA[<%=summary%>]]></summary>
            <url><![CDATA[<%=url%>]]></url>
            <indexNo><%=hit.getIndexNo()%></indexNo>
            <docNo><%=hit.getIndexDocNo()%></docNo>
            <moreFromSite><%=(hit.moreFromSiteExcluded())%></moreFromSite>
            <site><![CDATA[<%=hit.getSite()%>]]></site>
        </hit>
<%
    }
%>
    </hitsData>
<%
}
%>

</nutchSearch>
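
Two details of the page above are easy to get wrong at the boundaries: the number of hits on the current page (for example a page that holds exactly one hit) and text that is wrapped in CDATA but happens to contain "]]>". The following is only a sketch of how both could be factored into plain Java helpers. The class and method names (SearchPageHelpers, lengthInPage, cdata) are hypothetical and are not part of Nutch or of the original search.jsp.

```java
// Hypothetical helpers, not part of Nutch: they only restate the paging
// arithmetic and the CDATA wrapping used by the JSP in testable form.
final class SearchPageHelpers {
    private SearchPageHelpers() {}

    /**
     * Number of hits to render on the current page.
     *
     * @param start       zero-based index of the first hit to display
     * @param hitsPerPage requested page size
     * @param available   number of hits actually returned by the search
     *                    (hits.getLength() in the JSP)
     */
    static int lengthInPage(int start, int hitsPerPage, int available) {
        if (start < 0 || hitsPerPage <= 0 || start >= available) {
            return 0; // nothing to show on this page
        }
        // start < available here, so the subtraction cannot overflow
        return Math.min(hitsPerPage, available - start);
    }

    /**
     * Wraps text in a CDATA section. Any "]]>" inside the text is split
     * across two sections so that it cannot close the section early.
     */
    static String cdata(String text) {
        String safe = (text == null) ? "" : text;
        return "<![CDATA[" + safe.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }
}
```

In the JSP, `length` could then be computed as `SearchPageHelpers.lengthInPage(start, hitsPerPage, hits.getLength())`, and the title, summary, url and site values could be emitted through `cdata(...)` instead of being wrapped by hand.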






----- Original Message -----
From: <lu...@uol.com.br>
To: <nu...@incubator.apache.org>
Sent: Sunday, April 10, 2005 12:06 PM
Subject: XML OUTPUT


Hi!

 Does anybody know how to output search results in XML format?
 I would like to provide my data the way Google/Yahoo do with their APIs.

Thanks!
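
On the consuming side, the XML produced by the search.jsp above can be read with the standard JAXP DOM parser. The sketch below is an assumption about how a Java client might do it (the AtlasVision client is written in PHP and is not shown in this thread). The element names hit, title, url and summary come from the JSP; the class NutchXmlClient and its Result type are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical client, not part of Nutch: reads the <hit> elements
// emitted by the XML version of search.jsp.
final class NutchXmlClient {
    private NutchXmlClient() {}

    /** One <hit> element reduced to the fields a result list needs. */
    static final class Result {
        final String title;
        final String url;
        final String summary;

        Result(String title, String url, String summary) {
            this.title = title;
            this.url = url;
            this.summary = summary;
        }
    }

    /** Parses a complete <nutchSearch> document held in a string. */
    static List<Result> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The format needs no DTD, so refuse DOCTYPE declarations outright.
            factory.setFeature(
                "http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Reading from a character stream: the encoding named in the
            // XML declaration is not used to decode the input.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            NodeList hits = doc.getElementsByTagName("hit");
            List<Result> results = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                Element hit = (Element) hits.item(i);
                results.add(new Result(
                    text(hit, "title"), text(hit, "url"), text(hit, "summary")));
            }
            return results;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable nutchSearch document", e);
        }
    }

    /** Text of the first child element with the given tag, CDATA included. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent();
    }
}
```

A real client would fetch the document over HTTP from wherever search.jsp is deployed and pass the response body to `parse`; that URL depends on the installation and is left out here.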