Posted to user@jspwiki.apache.org by Rolf Schumacher <ma...@august.de> on 2011/01/10 17:16:47 UTC

searching

I am using JSPWiki 2.8.4

Is it possible to extend the search to attachments of certain MIME types, 
e.g. PDF?

Is it possible to extend the search to the comments given to an attachment?

kind regards

Rolf

Re: searching

Posted by Harry Metske <ha...@gmail.com>.
fixed in 3.0.0-svn-224 and 2.8.5-svn-5



2011/1/18 Rolf Schumacher <ma...@august.de>

> Yes, sounds great, Harry.
>
> The function getAttachmentContent(Attachment) is called whenever setupTask
> is executed.
>
> Feeding Lucene as soon as an attachment has been added would be a separate
> feature, and a good idea.
>
> What I meant is to make the text conversion depend on the MIME type of
> the attachment instead of the filename extension; however, this is not
> really important for a start.
>
> I would like to work on this immediately; however, due to overload in
> other areas, it will take a while. I will come back to it asap, because
> accumulated knowledge is not only in wiki pages but in attachments as well.
>
> Rolf

Re: searching

Posted by Rolf Schumacher <ma...@august.de>.
Yes, sounds great, Harry.

The function getAttachmentContent(Attachment) is called whenever 
setupTask is executed.

Feeding Lucene as soon as an attachment has been added would be a 
separate feature, and a good idea.

What I meant is to make the text conversion depend on the MIME type 
of the attachment instead of the filename extension; however, this is 
not really important for a start.

I would like to work on this immediately; however, due to overload in 
other areas, it will take a while. I will come back to it asap, because 
accumulated knowledge is not only in wiki pages but in attachments as well.

Rolf
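
To illustrate the idea of choosing the text conversion by MIME type rather than by filename extension, here is a minimal sketch in Java. It is not JSPWiki code: the class TextExtractorRegistry and its methods are hypothetical names made up for this example, and a real PDF or office-document extractor would have to wrap a suitable library, which is left out here.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of JSPWiki: maps a MIME type to a
// function that turns the raw attachment bytes into searchable text.
public class TextExtractorRegistry {
    private final Map<String, Function<byte[], String>> extractors = new HashMap<>();

    // Drop parameters such as "; charset=UTF-8" and compare case-insensitively.
    private static String normalize(String mimeType) {
        int semicolon = mimeType.indexOf(';');
        String base = semicolon >= 0 ? mimeType.substring(0, semicolon) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    public void register(String mimeType, Function<byte[], String> extractor) {
        extractors.put(normalize(mimeType), extractor);
    }

    // Returns the extracted text, or an empty Optional when no extractor
    // is registered for the type (the attachment is then simply not indexed).
    public Optional<String> extract(String mimeType, byte[] content) {
        Function<byte[], String> extractor = extractors.get(normalize(mimeType));
        return extractor == null
                ? Optional.empty()
                : Optional.ofNullable(extractor.apply(content));
    }

    public static void main(String[] args) {
        TextExtractorRegistry registry = new TextExtractorRegistry();
        // Plain text needs no conversion beyond decoding the bytes.
        registry.register("text/plain", bytes -> new String(bytes, StandardCharsets.UTF_8));

        byte[] note = "meeting minutes".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.extract("text/plain; charset=UTF-8", note).orElse("(not indexed)"));
        System.out.println(registry.extract("application/pdf", note).orElse("(not indexed)"));
    }
}
```

With such a registry, the indexing code would only need to ask for the text of an attachment and hand the result to Lucene; adding a new file type means registering one more extractor.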

On 14.01.2011 20:30, Harry Metske wrote:
> Making a filter that processes "non plain text" files like the ones you
> mentioned sounds good.
> If I understand it correctly, it should be called when adding an attachment;
> it should process the file, create searchable text, and hand that off to
> Lucene for indexing, right?
> Please also consider a unit test for it.
>
> Adding a few more file types for pure text files is a good quick win,
> starting with .mm .htm .xhtml .java .c .cpp .php .asm .sh .properties .kml
> .gpx .loc
>
> Does anyone else have opinions or suggestions?
>
> regards,
> Harry

Re: searching

Posted by Harry Metske <ha...@gmail.com>.
Making a filter that processes "non plain text" files like the ones you
mentioned sounds good.
If I understand it correctly, it should be called when adding an attachment;
it should process the file, create searchable text, and hand that off to
Lucene for indexing, right?
Please also consider a unit test for it.

Adding a few more file types for pure text files is a good quick win,
starting with .mm .htm .xhtml .java .c .cpp .php .asm .sh .properties .kml
.gpx .loc

Does anyone else have opinions or suggestions?

regards,
Harry

2011/1/13 Rolf Schumacher <ma...@august.de>

> OK, Harry, thank you for the link.
>
> My suggestions, please correct me:
>
> - hard-coding the file types does not seem to be a problem to me: everything
> should be searched
> - the list is too short; important types such as .doc, .odt, .pdf, .ppt,
> .odp are missing
> - am I right here? If I can provide a filter that extracts text from these
> files, it should not be that tough to add them
> - we may be better off if each attachment has an attribute telling its
> MIME type, as far as detectable at attachment time; that way we are
> less dependent on correct file extensions
>
> - a quick suggestion: please add .mm as another XML type. The FreeMind
> plugin is of great value.
>
> kind regards
>
>
> Rolf

Re: searching

Posted by Rolf Schumacher <ma...@august.de>.
OK, Harry, thank you for the link.

My suggestions, please correct me:

- hard-coding the file types does not seem to be a problem to me: 
everything should be searched
- the list is too short; important types such as .doc, .odt, .pdf, .ppt, 
.odp are missing
- am I right here? If I can provide a filter that extracts text from 
these files, it should not be that tough to add them
- we may be better off if each attachment has an attribute telling its 
MIME type, as far as detectable at attachment time; that way we are 
less dependent on correct file extensions

- a quick suggestion: please add .mm as another XML type. The FreeMind 
plugin is of great value.

kind regards


Rolf
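
As a rough illustration of detecting the type at attachment time without trusting the file extension, the following Java sketch looks at the first bytes of the upload. The class MimeSniffer is hypothetical and not part of JSPWiki. The PDF, ZIP and OLE2 signatures are the well-known ones; the label returned for OLE2 files is only a placeholder chosen for this example, and a ZIP match alone does not tell .odt from .odp or from any other ZIP-based format.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JSPWiki: guesses a coarse type from
// the leading "magic" bytes of an uploaded file.
public class MimeSniffer {

    // True if data begins with exactly the given byte values.
    static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static String sniff(byte[] head) {
        if (startsWith(head, '%', 'P', 'D', 'F', '-')) {
            return "application/pdf";
        }
        // ZIP container: used by OpenDocument files (.odt, .odp) among others.
        if (startsWith(head, 'P', 'K', 0x03, 0x04)) {
            return "application/zip";
        }
        // OLE2 compound file: used by legacy MS Office formats (.doc, .ppt).
        // The returned label is a placeholder for this sketch.
        if (startsWith(head, 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)) {
            return "application/x-ole-storage";
        }
        if (startsWith(head, '<', '?', 'x', 'm', 'l')) {
            return "application/xml";
        }
        // Unknown: the caller can still fall back to the file extension.
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] pdfHead = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] xmlHead = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(pdfHead));
        System.out.println(sniff(xmlHead));
        System.out.println(sniff(new byte[] {1, 2, 3}));
    }
}
```

The result could be stored as an attribute of the attachment when it is uploaded, so that indexing does not depend on a correct extension.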


On 11.01.2011 18:42, Harry Metske wrote:
> Rolf,
>
> see the source
> https://github.com/apache/jspwiki/blob/jspwiki_2_8_5/src/com/ecyrd/jspwiki/search/LuceneSearchProvider.java#L328
>
>
> As you can see, the file types are currently hardcoded to just 4 types.
> We could make this a configurable option; patches are welcome.
>
> You say "comments given to an Attachment"; I assume you mean the Change Notes
> entered while uploading an attachment (or saving a normal Wiki Page).
> That is a bit more work, I think.
> I am a complete Lucene novice, but looking at the code, it looks like we could
> add another field (we already index the page author and page name) for the
> Change Note.
>
> regards,
> Harry

Re: searching

Posted by Harry Metske <ha...@gmail.com>.
Rolf,

see the source
https://github.com/apache/jspwiki/blob/jspwiki_2_8_5/src/com/ecyrd/jspwiki/search/LuceneSearchProvider.java#L328


As you can see, the file types are currently hardcoded to just 4 types.
We could make this a configurable option; patches are welcome.

You say "comments given to an Attachment"; I assume you mean the Change Notes
entered while uploading an attachment (or saving a normal Wiki Page).
That is a bit more work, I think.
I am a complete Lucene novice, but looking at the code, it looks like we could
add another field (we already index the page author and page name) for the
Change Note.

regards,
Harry
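
To make the "configurable option" a bit more concrete, here is a small Java sketch of how the list of searchable suffixes could be read from the wiki properties instead of being hardcoded. This is only an illustration: the property name jspwiki.lucene.searchableSuffixes and the class SearchableSuffixes are hypothetical, and the default value below is a placeholder, not the list currently hardcoded in LuceneSearchProvider.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper, not part of JSPWiki.
public class SearchableSuffixes {
    // Assumed property name; no such setting exists in JSPWiki 2.8.4.
    public static final String PROP = "jspwiki.lucene.searchableSuffixes";
    // Placeholder default for this sketch only.
    public static final String DEFAULT = ".txt,.xml";

    // Parses a comma-separated list into lower-case suffixes with a leading dot.
    public static Set<String> load(Properties props) {
        Set<String> result = new LinkedHashSet<>();
        for (String raw : props.getProperty(PROP, DEFAULT).split(",")) {
            String suffix = raw.trim().toLowerCase(Locale.ROOT);
            if (suffix.isEmpty()) {
                continue;
            }
            result.add(suffix.startsWith(".") ? suffix : "." + suffix);
        }
        return result;
    }

    // Case-insensitive check whether the file name ends with a configured suffix.
    public static boolean isSearchable(String fileName, Set<String> suffixes) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String suffix : suffixes) {
            if (lower.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PROP, ".txt, .xml, mm, .properties");
        Set<String> suffixes = load(props);
        System.out.println(isSearchable("Roadmap.MM", suffixes));
        System.out.println(isSearchable("report.pdf", suffixes));
    }
}
```

An administrator could then extend the list in the properties file without a code change, while the default keeps the behaviour of existing installations.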


2011/1/10 Rolf Schumacher <ma...@august.de>

> I am using JSPWiki 2.8.4
>
> Is it possible to extend the search to attachments of certain MIME types,
> e.g. PDF?
>
> Is it possible to extend the search to the comments given to an attachment?
>
> kind regards
>
> Rolf
>