Posted to user@nutch.apache.org by ".: Abhishek :." <ab...@gmail.com> on 2011/01/28 06:56:12 UTC
Difference in hit count in nutch webapp and java code using nutch bean
Hi all,
I crawled a website with Nutch and am using the Nutch WAR file to set up
a simple search interface on a Tomcat server. When I search for a keyword
through the Nutch web interface, it first shows two results and says
"1-2 (out of about 400 total matching pages)". When I click "show all
hits", it shows all the results, paginated.
When I run the same query through NutchBean, however, it reports only
2 hits. The code is as follows:
public static void main(String[] args) {
    String searchString = "food";
    Configuration nutchConfig = null;
    NutchBean nutchBean = null;
    Query nutchQuery = null;
    Hits nutchHits = null;
    try {
        nutchConfig = NutchConfiguration.create();
        nutchBean = new NutchBean(nutchConfig);
        nutchQuery = Query.parse(searchString, nutchConfig);
        nutchHits = nutchBean.search(nutchQuery);
        System.out.println("Hits : " + nutchHits.getLength());
        nutchBean.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Is there something wrong with the above code? Why does it report 2 hits
instead of 400?
Thanks,
Abhishek
Re: Difference in hit count in nutch webapp and java code using nutch bean
Posted by ".: Abhishek :." <ab...@gmail.com>.
Hi Alexander,
Thanks a bunch. By the way, what are the dedup params supposed to be?
./Abhi
On Fri, Jan 28, 2011 at 5:02 PM, Alexander Aristov <alexander.aristov@gmail.com> wrote:
> Check your dedup params.
>
> It's ON by default and so it reduces number of total hits.
> Consider other nutchBean.search functions.
>
>
> Best Regards
> Alexander Aristov
Re: Difference in hit count in nutch webapp and java code using nutch bean
Posted by Alexander Aristov <al...@gmail.com>.
Check your dedup params.
Dedup is ON by default, so it reduces the number of hits returned.
Consider the other nutchBean.search functions.
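To illustrate why dedup can turn about 400 matches into 2 returned hits, here is a toy model in plain Java. It is only a sketch and not Nutch's implementation: the class DedupSketch, the method limitPerSite and the host name example.com are made up for the illustration, and it assumes that the dedup in question keeps at most a fixed number of hits per site and that every match comes from the single site that was crawled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a toy model of per-site hit limiting, not Nutch code. */
public class DedupSketch {

    /**
     * Keeps at most maxHitsPerSite hits from each site, preserving rank order.
     * A value of 0 or less disables the limit (hypothetical convention for this sketch).
     */
    public static List<String> limitPerSite(List<String> rankedSites, int maxHitsPerSite) {
        if (maxHitsPerSite <= 0) {
            return new ArrayList<>(rankedSites);
        }
        Map<String, Integer> seenPerSite = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String site : rankedSites) {
            // Count this hit against its site and keep it only while under the limit.
            int count = seenPerSite.merge(site, 1, Integer::sum);
            if (count <= maxHitsPerSite) {
                kept.add(site);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // 400 matching pages, all from the one crawled site (assumption).
        List<String> ranked = new ArrayList<>();
        for (int i = 0; i < 400; i++) {
            ranked.add("example.com");
        }
        System.out.println("total matches: " + ranked.size());
        System.out.println("returned with limit 2: " + limitPerSite(ranked, 2).size());
        System.out.println("returned with limit off: " + limitPerSite(ranked, 0).size());
    }
}
```

With a per-site limit of 2 the sketch returns 2 hits even though 400 pages match, and with the limit switched off it returns all 400, which mirrors the difference between the first results page and "show all hits".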
Best Regards
Alexander Aristov