Posted to user@nutch.apache.org by ".: Abhishek :." <ab...@gmail.com> on 2011/01/28 06:56:12 UTC

Difference in hit count in nutch webapp and java code using nutch bean

Hi all,

 I ran a Nutch crawl on a website and I am using the Nutch WAR file to set
up a simple search interface on a Tomcat server. When I search for a keyword
in the Nutch web interface, it first shows two results and says
"1-2(out of about 400 total matching pages)". When I click "show all
hits", it shows all the results in paginated form.

 Now, when I use NutchBean to query for the same keyword, it reports only
2 hits. The code is as follows:

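 import java.io.IOException;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.nutch.searcher.Hits;
 import org.apache.nutch.searcher.NutchBean;
 import org.apache.nutch.searcher.Query;
 import org.apache.nutch.util.NutchConfiguration;

 // The class name is a placeholder; the original post showed only main().
 public class HitCount {

     public static void main(String[] args) {
         String searchString = "food";
         NutchBean nutchBean = null;
         try {
             // Load the Nutch configuration from the classpath.
             Configuration nutchConfig = NutchConfiguration.create();
             nutchBean = new NutchBean(nutchConfig);
             Query nutchQuery = Query.parse(searchString, nutchConfig);
             Hits nutchHits = nutchBean.search(nutchQuery);
             // getLength() is the number of hits in the returned listing.
             System.out.println("Hits : " + nutchHits.getLength());
         } catch (IOException e) {
             e.printStackTrace();
         } finally {
             // Release the searcher even if the query fails.
             if (nutchBean != null) {
                 try {
                     nutchBean.close();
                 } catch (IOException e) {
                     e.printStackTrace();
                 }
             }
         }
     }
 }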

 Is there something wrong with the above code? Why does it report 2 hits
instead of 400?

Thanks,
Abhishek

Re: Difference in hit count in nutch webapp and java code using nutch bean

Posted by ".: Abhishek :." <ab...@gmail.com>.
Hi Alexander,

 Thanks a bunch. By the way, what is dedup supposed to be?

./Abhi

On Fri, Jan 28, 2011 at 5:02 PM, Alexander Aristov <
alexander.aristov@gmail.com> wrote:

> Check your dedup params.
>
> It's ON by default, so it reduces the number of total hits.
> Consider the other nutchBean.search functions.
>
>
> Best Regards
> Alexander Aristov

Re: Difference in hit count in nutch webapp and java code using nutch bean

Posted by Alexander Aristov <al...@gmail.com>.
Check your dedup params.

It's ON by default, so it reduces the number of total hits.
Consider the other nutchBean.search functions.
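
To illustrate what dedup does to a hit count, here is a minimal sketch. It is
not Nutch code: the DedupSketch class, its Hit type, and the example.com URLs
are all hypothetical, and it assumes that dedup groups hits by site and keeps
only a fixed number per site. Under that assumption, a crawl of a single
website with a per-site limit of 2 would show 2 hits even though 400 pages
match, which is consistent with the numbers reported above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these types are hypothetical and are not Nutch classes.
final class DedupSketch {

    /** A search hit reduced to the two fields this sketch needs. */
    static final class Hit {
        final String site;
        final String url;

        Hit(String site, String url) {
            this.site = site;
            this.url = url;
        }
    }

    /**
     * Keeps at most maxHitsPerSite hits for each site, preserving rank order.
     * A limit of 0 or less disables deduplication and returns every hit.
     */
    static List<Hit> dedupBySite(List<Hit> ranked, int maxHitsPerSite) {
        if (maxHitsPerSite <= 0) {
            return new ArrayList<>(ranked);
        }
        Map<String, Integer> seenPerSite = new HashMap<>();
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : ranked) {
            // merge() returns the updated count for this site.
            int seen = seenPerSite.merge(hit.site, 1, Integer::sum);
            if (seen <= maxHitsPerSite) {
                kept.add(hit);
            }
        }
        return kept;
    }

    /** Builds 'count' hits that all come from the same (made-up) site. */
    static List<Hit> singleSiteHits(String site, int count) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            hits.add(new Hit(site, "http://" + site + "/page" + i));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Hit> all = singleSiteHits("example.com", 400);
        System.out.println("Total matches      : " + all.size());
        System.out.println("Shown with limit 2 : " + dedupBySite(all, 2).size());
        System.out.println("Shown with limit 0 : " + dedupBySite(all, 0).size());
    }
}
```

Running main prints 400, 2, and 400 for the three lines. In the sketch, a
limit of 0 plays the role of "show all hits". Whether the real NutchBean
treats its dedup parameters the same way should be checked against the
Javadoc of the Nutch version in use.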


Best Regards
Alexander Aristov

