Posted to user@nutch.apache.org by xu nutch <nu...@gmail.com> on 2006/10/11 04:26:30 UTC

I can not query myplugin in field category:test

I have a question about my plugin's index filter and query filter.
Can you help me?
-------------------------------------
In MoreIndexingFilter.java I added:
doc.add(new Field("category", "test", false, true, false));
-------------------------------------

--------------------------------------


package org.apache.nutch.searcher.more;

import org.apache.nutch.searcher.RawFieldQueryFilter;

/** Handles "category:" query clauses, causing them to search the
 * "category" field added in MoreIndexingFilter. */
public class CategoryQueryFilter extends RawFieldQueryFilter {
 public CategoryQueryFilter() {
   super("category");
 }
}
-----------------------------------------------
-----------------------------------------------

<property>
 <name>plugin.includes</name>
 <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|more)|query-(basic|site|url|more)</value>
 <description>Regular expression naming plugin directory names to
 include.  Any plugin not matching this expression is excluded.
 In any case you need at least include the nutch-extensionpoints plugin. By
 default Nutch includes crawling just HTML and plain text via HTTP,
 and basic indexing and search plugins.
 </description>
</property>

-----------------------------------------------

Querying "category:test" with Luke works fine,
but the same query through the Tomcat web application
returns no results.

Re: I can not query myplugin in field category:test

Posted by Stefan Neufeind <ap...@stefan-neufeind.de>.
Please do share it. I'd appreciate it, and I guess a lot of others as
well. And I bet it could even be enhanced by the community. :-)


Regards,
 Stefan


Re: I can not query myplugin in field category:test

Posted by Ernesto De Santis <de...@yahoo.com.ar>.
Hi Alvaro,

Very good. It seems it could be extended to support both strategies, the
current substring matching and regular expressions,
perhaps by using another tag to define URL expressions:

                .....
               <whitelistexp>
                       http://www.categorySite.es/setcionA(.)*sectionB(.)*
               </whitelistexp>


Thanks,
Ernesto.



Re: I can not query myplugin in field category:test

Posted by Alvaro Cabrerizo <to...@gmail.com>.
You can use the existing "subcollection" plugin in Nutch 0.8.x and extend it
to use regular expressions. Basically, you have to modify the class
org.apache.nutch.collection.Subcollection: in the filter method (lines
146-154), replace if(urlString.indexOf(row) != -1) with something like
if(Pattern.matches(row, urlString)).
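
A small standalone sketch of the difference between the two conditions. This is
not the actual Subcollection code; the class and method names are made up for
illustration. Note that Pattern.matches only succeeds when the expression
matches the entire URL, so a plain prefix that passed the indexOf test will no
longer match unless it ends in something like (.)*.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Nutch.
class UrlMatchDemo {
    // Substring test, equivalent to the existing indexOf condition.
    static boolean containsRow(String urlString, String row) {
        return urlString.indexOf(row) != -1;
    }

    // Regex test: the whole URL must match the expression in 'row'.
    static boolean matchesRow(String urlString, String row) {
        return Pattern.matches(row, urlString);
    }

    public static void main(String[] args) {
        String url = "http://www.categorySite.es/setcionA/x/sectionB/page.html";
        String prefix = "http://www.categorySite.es/setcionA";
        String regex = "http://www.categorySite.es/setcionA(.)*sectionB(.)*";

        System.out.println(containsRow(url, prefix)); // true: prefix is a substring
        System.out.println(matchesRow(url, prefix));  // false: regex must cover the whole URL
        System.out.println(matchesRow(url, regex));   // true
        System.out.println(containsRow(url, regex));  // false: regex text is not a literal substring
    }
}
```

In practice this means existing substring entries in subcollection.xml would
need to be rewritten as full expressions after the change.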

This approach lets you:

 - Use the existing file subcollection.xml to define your URL expressions/categories
 - Use the package java.util.regex to define matching URLs


Here is a sample subcollection.xml for use after modifying the subcollection plugin.

<subcollection>
                <name>myCategory</name>
                <id>myCategory</id>
                <whitelist>
                        http://www.categorySite.es/setcionA(.)*sectionB(.)*
                </whitelist>
                <blacklist>
                        http://www.categorySite.es/setcionA(.)*sectionB(.)*sectionC(.)*
                </blacklist>
 </subcollection>
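
How the two lists in this sample could interact can be sketched in plain Java.
This is only an illustration under assumptions: the class name is hypothetical,
and the rule "a URL belongs to the category if it matches a whitelist
expression and no blacklist expression" is my reading of the sample, not code
taken from the plugin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the Nutch Subcollection class.
class RegexSubcollectionSketch {
    private final List<Pattern> whitelist = new ArrayList<>();
    private final List<Pattern> blacklist = new ArrayList<>();

    RegexSubcollectionSketch(List<String> white, List<String> black) {
        // Compile once so each URL is not re-parsing the expressions.
        for (String w : white) whitelist.add(Pattern.compile(w.trim()));
        for (String b : black) blacklist.add(Pattern.compile(b.trim()));
    }

    private static boolean matchesAny(List<Pattern> patterns, String url) {
        for (Pattern p : patterns) {
            if (p.matcher(url).matches()) return true; // full-URL match
        }
        return false;
    }

    // Assumed rule: whitelisted and not blacklisted.
    boolean accepts(String url) {
        return matchesAny(whitelist, url) && !matchesAny(blacklist, url);
    }

    public static void main(String[] args) {
        RegexSubcollectionSketch myCategory = new RegexSubcollectionSketch(
            Arrays.asList("http://www.categorySite.es/setcionA(.)*sectionB(.)*"),
            Arrays.asList("http://www.categorySite.es/setcionA(.)*sectionB(.)*sectionC(.)*"));

        System.out.println(myCategory.accepts(
            "http://www.categorySite.es/setcionA/1/sectionB/2"));          // true
        System.out.println(myCategory.accepts(
            "http://www.categorySite.es/setcionA/1/sectionB/2/sectionC")); // false
    }
}
```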





Re: I can not query myplugin in field category:test

Posted by Ernesto De Santis <de...@yahoo.com.ar>.
Hi Chad

The link was a configuration example.

A more detailed example:
http://www.misite.com/videos/.*=videos  (rule A)

If the fetched URL matches rule A, the plugin indexes a field named
'category' with the value 'videos'.

Later you can search on this category field to filter your searches.
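
The mapping step for a rule like rule A might look roughly like the sketch
below. It is not the plugin itself: the class and method names are
hypothetical, and it parses "regex=category" lines by hand, splitting at the
last '=', because java.util.Properties would treat the ':' in "http:" as the
end of the key unless it is escaped. Adding the resulting value to the index as
the 'category' field is left out, since that part depends on the Nutch
indexing filter API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of mapping URLs to categories via regex rules.
class UrlCategoryRules {
    // One rule: a compiled URL expression and the category it maps to.
    private static final class Rule {
        final Pattern pattern;
        final String category;
        Rule(Pattern pattern, String category) {
            this.pattern = pattern;
            this.category = category;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parses one "regex=category" line, splitting at the LAST '='
    // (assumed format; category names must not contain '=').
    void addRule(String line) {
        int eq = line.lastIndexOf('=');
        if (eq <= 0 || eq == line.length() - 1) {
            throw new IllegalArgumentException("Expected regex=category: " + line);
        }
        rules.add(new Rule(Pattern.compile(line.substring(0, eq).trim()),
                           line.substring(eq + 1).trim()));
    }

    // Returns the category of the first rule whose expression matches
    // the whole URL, or null if no rule matches.
    String categoryFor(String url) {
        for (Rule r : rules) {
            if (r.pattern.matcher(url).matches()) return r.category;
        }
        return null;
    }

    public static void main(String[] args) {
        UrlCategoryRules rules = new UrlCategoryRules();
        rules.addRule("http://www.misite.com/videos/.*=videos"); // rule A
        System.out.println(rules.categoryFor("http://www.misite.com/videos/clip1.html")); // videos
        System.out.println(rules.categoryFor("http://www.misite.com/news/today.html"));   // null
    }
}
```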

I will send the plugin in a new thread. I will post it
here on the list, since I don't know of another way to share it with you.

Regards
Ernesto.







Re: I can not query myplugin in field category:test

Posted by "csavage@activeathletemedia.com" <cs...@activeathletemedia.com>.
I couldn't get the link to work, but yes, if you could share it that would be
great.

Chad Savage





Re: I can not query myplugin in field category:test

Posted by Ernesto De Santis <de...@yahoo.com.ar>.
I wrote a url-category-indexer.

It works with a .properties file that maps URLs, written as regular
expressions, to categories.
For example:

http://www.misite.com/videos/.*=videos

If it seems useful, I can share it.

It might be better to configure it in an .xml file.

Regards,
Ernesto.



Re: I can not query myplugin in field category:test

Posted by Stefan Neufeind <ap...@stefan-neufeind.de>.

In case you get the search working:
How do you plan to categorize URLs/sites? I'm looking for a solution
there myself, since I haven't yet managed to implement something
based on a URL-prefix filter to map URLs to categories.


Regards,
 Stefan

Re: I can not query myplugin in field category:test

Posted by Alvaro Cabrerizo <to...@gmail.com>.
Have you added a node describing your new query filter to your plugin's plugin.xml?
