Posted to dev@bloodhound.apache.org by Andrej Golcov <an...@digiverse.si> on 2013/02/14 13:18:17 UTC

Need advice: stripping of wiki syntax from Bloodhound Search results

HI,

Branko suggested stripping wiki syntax from Bloodhound Search results,
which is IMHO quite a reasonable suggestion. That feature will give us
better search scoring and better highlighting.

I have one question regarding the implementation of this feature.
As far as I can see, the existing formatters (e.g. trac.wiki.formatter.*
classes) provide wiki-to-HTML formatting but not wiki-to-stripped-text
formatting. Did I miss something?

One possibility that I see is to convert wiki to HTML and
then convert HTML to text. That does not look like the optimal
solution.

Any alternatives, ideas?

Regards, Andrej

Re: Need advice: stripping of wiki syntax from Bloodhound Search results

Posted by Andrej Golcov <an...@digiverse.si>.
Thanks everybody for the valuable feedback.

I have played with a solution based on the existing HtmlFormatter in
order to get text output. The current formatter implementation is so
tightly coupled to the HTML markup and the current user security context
that reusing it would bring us a lot of complexity and potential bugs. So I
decided to proceed with a very, very naive find-and-replace formatter that
should cover the major cases. It's definitely not perfect but can be
improved later. So far, the search output looks quite a bit better to me.
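
For readers of the archive, here is a minimal sketch of what a naive
find-and-replace stripper of this kind could look like. It is not the code
that was committed to Bloodhound: the function name strip_wiki_syntax, the
rule table and the specific patterns are all assumptions made for
illustration, and they cover only a handful of common Trac wiki constructs.

```python
import re

# Hypothetical rule table (an assumption, not Bloodhound's actual rules).
# Each entry is (compiled pattern, replacement); rules are applied in order.
_RULES = [
    # {{{ ... }}} code blocks: keep the contents, drop the braces.
    (re.compile(r"\{\{\{|\}\}\}"), ""),
    # [[Macro(args)]] calls: drop them entirely.
    (re.compile(r"\[\[[^\]]*\]\]"), ""),
    # [target label] links: keep only the label.
    (re.compile(r"\[(?:[^\s\]]+)\s+([^\]]+)\]"), r"\1"),
    # [target] links without a label: keep the target.
    (re.compile(r"\[([^\s\]]+)\]"), r"\1"),
    # == Heading == lines: keep only the heading text.
    (re.compile(r"^[ \t]*=+[ \t]*(.*?)[ \t]*=+[ \t]*$", re.M), r"\1"),
    # '''bold''', ''italic'' and '''''both''''' quote markers.
    (re.compile(r"'{2,5}"), ""),
    # `monospace` and ~~strike-through~~ markers.
    (re.compile(r"`|~~"), ""),
    # List bullets at the start of a line.
    (re.compile(r"^[ \t]*[*-][ \t]+", re.M), ""),
    # Table cell separators.
    (re.compile(r"\|\|"), " "),
]


def strip_wiki_syntax(text):
    """Return text with common wiki markup removed (best effort only)."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    # Collapse the whitespace runs left behind by removed markup.
    return " ".join(text.split())


if __name__ == "__main__":
    sample = (
        "== Release notes ==\n"
        "See '''bold''' text and [wiki:Start the start page].\n"
        " * item one [[PageOutline]]"
    )
    print(strip_wiki_syntax(sample))
```

Being purely textual, an approach like this knows nothing about nesting or
escaping, so it will mangle some input (for example, markup characters
inside code blocks). That is the trade-off against writing a real formatter.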

Please try it and comment.
Regards, Andrej

Re: Need advice: stripping of wiki syntax from Bloodhound Search results

Posted by Olemis Lang <ol...@gmail.com>.
On 2/14/13, Matevž Bradač <ma...@gmail.com> wrote:
> On 14. Feb, 2013, at 15:33, Gary Martin wrote:
>> On 14/02/13 12:18, Andrej Golcov wrote:
>
>>> HI,
>>>

:)

>>> Branko suggested stripping wiki syntax from Bloodhound Search results,
>>> which is IMHO quite a reasonable suggestion. That feature will give us
>>> better search scoring and better highlighting.
>>>
>>> I have one question regarding the implementation of this feature.
>>> As far as I can see, the existing formatters (e.g. trac.wiki.formatter.*
>>> classes) provide wiki-to-HTML formatting but not wiki-to-stripped-text
>>> formatting. Did I miss something?
>>>

Well, now I see that the Trac search handler emits formatted text (i.e.
it highlights search keywords using `.searchword#` classes) for this
purpose. I also noticed that we don't highlight those, as we removed
the Trac CSS in the theme plugin.

>>> One possibility that I see is to convert wiki to HTML and
>>> then convert HTML to text. That does not look like the optimal
>>> solution.
>>>
>>> Any alternatives, ideas?

Could you figure out how it does such a thing?

>>
>> I should find out more about how the formatters work! My first thought
>> would be to look at creating a new formatter that strips out syntax but I
>> am not sure how big a job that will be.
>>

It should be similar to the link extraction formatter, but instead of
processing links and ignoring everything else, just process the text and
ignore everything else.

[...]
>
> Since the formatters return a Markup, perhaps a quick workaround would be to
> use something like Markup's stripentities() and striptags() to get the "raw"
> text back?
>

AFAICT this should work too.

-- 
Regards,

Olemis.

Re: Need advice: stripping of wiki syntax from Bloodhound Search results

Posted by Matevž Bradač <ma...@gmail.com>.
Since the formatters return a Markup, perhaps a quick workaround would be to use
something like Markup's stripentities() and striptags() to get the "raw" text back?
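
To illustrate the second half of this double conversion (HTML back to plain
text), here is a rough sketch. It deliberately uses only the Python standard
library as a stand-in for Markup's striptags() and stripentities(), so that
it runs without Trac or Genshi installed. The class and function names
(_TextExtractor, html_to_text) are made up for this example, and the input
is assumed to be HTML already produced by a wiki formatter.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and silently discards all tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as
        # &amp; before the text is passed to handle_data().
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join the collected pieces and normalise whitespace.
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    """Return the text content of an HTML fragment (tags and entities removed)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


if __name__ == "__main__":
    rendered = (
        "<p>Fix <strong>login</strong> &amp; logout in "
        "<a href='/ticket/42'>#42</a></p>"
    )
    print(html_to_text(rendered))
```

Note that adjacent block elements are joined without a separator here, so
"<p>a</p><p>b</p>" becomes "ab"; a real implementation would probably want
to insert whitespace at block boundaries before indexing.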

--
matevz

On 14. Feb, 2013, at 15:33, Gary Martin wrote:

> On 14/02/13 12:18, Andrej Golcov wrote:
>> HI,
>> 
>> Branko suggested stripping wiki syntax from Bloodhound Search results,
>> which is IMHO quite a reasonable suggestion. That feature will give us
>> better search scoring and better highlighting.
>> 
>> I have one question regarding the implementation of this feature.
>> As far as I can see, the existing formatters (e.g. trac.wiki.formatter.*
>> classes) provide wiki-to-HTML formatting but not wiki-to-stripped-text
>> formatting. Did I miss something?
>> 
>> One possibility that I see is to convert wiki to HTML and
>> then convert HTML to text. That does not look like the optimal
>> solution.
>> 
>> Any alternatives, ideas?
>> 
>> Regards, Andrej
> 
> I should find out more about how the formatters work! My first thought would be to look at creating a new formatter that strips out syntax but I am not sure how big a job that will be.
> 
> I don't mind seeing a sub-optimal solution, particularly if it is likely to be quick enough and quick to create. I think that the double conversion would be giving us the correct results - the rendered html must be considered what the user will want to be able to search, right?
> 
> Cheers,
>    Gary


Re: Need advice: stripping of wiki syntax from Bloodhound Search results

Posted by Gary Martin <ga...@wandisco.com>.
On 14/02/13 12:18, Andrej Golcov wrote:
> HI,
>
> Branko suggested stripping wiki syntax from Bloodhound Search results,
> which is IMHO quite a reasonable suggestion. That feature will give us
> better search scoring and better highlighting.
>
> I have one question regarding the implementation of this feature.
> As far as I can see, the existing formatters (e.g. trac.wiki.formatter.*
> classes) provide wiki-to-HTML formatting but not wiki-to-stripped-text
> formatting. Did I miss something?
>
> One possibility that I see is to convert wiki to HTML and
> then convert HTML to text. That does not look like the optimal
> solution.
>
> Any alternatives, ideas?
>
> Regards, Andrej

I should find out more about how the formatters work! My first thought 
would be to look at creating a new formatter that strips out syntax but 
I am not sure how big a job that will be.

I don't mind seeing a sub-optimal solution, particularly if it is likely 
to be quick enough and quick to create. I think that the double 
conversion would be giving us the correct results - the rendered html 
must be considered what the user will want to be able to search, right?

Cheers,
     Gary