Posted to user@jspwiki.apache.org by pu...@gmail.com on 2009/04/21 17:39:47 UTC

stripping the page of wiki markup

Hi experts,

I am new to JSPWiki.
I want to index pages based on their content, but I want to strip the wiki
markup first and index only the plain text. Please let me know how this can
be done.

Thanks and regards,
Pushker Chaubey

Re: stripping the page of wiki markup

Posted by pushker chaubey <pu...@gmail.com>.
Hi Janne,

Thanks a lot!

regards,
Pushker Chaubey

On Fri, Apr 24, 2009 at 12:02 AM, Janne Jalkanen <ja...@ecyrd.com> wrote:

> Yupyup, you're quite correct.  Plugins always output XHTML, so you need to
> turn off plugin rendering with
>
> wikiContext.setVariable( RenderingManager.VAR_EXECUTE_PLUGINS, Boolean.FALSE );
>
> You can also experiment with setVariable(
> RenderingManager.WYSIWYG_EDITOR_MODE, Boolean.FALSE ), but that works only
> for a certain set of plugins.
>
> /Janne


-- 
Thanks n Regards,
Pushker

Re: stripping the page of wiki markup

Posted by Janne Jalkanen <ja...@ecyrd.com>.
Yupyup, you're quite correct.  Plugins always output XHTML, so you  
need to turn off plugin rendering with

wikiContext.setVariable( RenderingManager.VAR_EXECUTE_PLUGINS, Boolean.FALSE );

You can also experiment with  
setVariable( RenderingManager.WYSIWYG_EDITOR_MODE, Boolean.FALSE ),  
but that works only for a certain set of plugins.
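
For setups where changing the context variable is not practical, one cruder fallback is to drop plugin calls from the raw page text before it is parsed at all. This is not what the message above proposes. It is a minimal sketch in plain Java, and the class name PluginMarkupStripper, the method stripPluginCalls, and the assumption that every plugin call has the form [{ ... }] with no nested "}]" are all illustrative, not part of JSPWiki.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JSPWiki: removes "[{ ... }]" plugin calls
// from raw wiki markup so that no plugin can emit XHTML into the index text.
final class PluginMarkupStripper {

    // Non-greedy match from "[{" to the next "}]"; DOTALL lets a call span lines.
    // Assumption: plugin bodies never contain a literal "}]".
    private static final Pattern PLUGIN_CALL =
            Pattern.compile("\\[\\{.*?\\}\\]", Pattern.DOTALL);

    private PluginMarkupStripper() {
    }

    // Returns the markup with every "[{ ... }]" span removed.
    static String stripPluginCalls(String wikiMarkup) {
        return PLUGIN_CALL.matcher(wikiMarkup).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "[{TableOfContents }]\n!!! Chapter 1\nTest page without any helpful info";
        // The TableOfContents call is gone; the heading markup is left for the renderer.
        System.out.println(stripPluginCalls(page));
    }
}
```

The same pattern also removes [{SET ...}] and [{$variable}] expressions and does not recognise an escaped "[[{" sequence, so it is better suited to building index text than to anything that is displayed.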

/Janne

On 23 Apr 2009, at 15:11, pushker.chaubey@gmail.com wrote:

> Hi Janne, Experts
>
> I believe '[{TableOfContents }]' is handled by a plugin, which generates a
> table of contents and inserts it at the top. Maybe that is why
> CleanTextRenderer does not strip it.
>
> I logged the text generated by CleanTextRenderer, and this is what I found:
>
> <div class="toc">
> <div class="collapsebox">
> <h4>Table of Contents</h4>
> <ul>
> <li class="toclevel-1"><a class="wikipage"
> href="Wiki.jsp?page=Main#section-Main-Chapter1">Chapter 1</a></li>
> <li class="toclevel-2"><a class="wikipage"
> href="Wiki.jsp?page=Main#section-Main-Chapter1.1">Chapter 1.1</a></li>
> </ul>
> </div>
> </div>
>
> !!! Chapter 1
> Test page without any helpful info
> !! Chapter 1.1
> hi
>
> So apart from this additional HTML block at the top, the rest is what I
> expected.
>
>
> Thanks,
> Pushker Chaubey

Re: Re: Re: stripping the page of wiki markup

Posted by pu...@gmail.com.
Hi Janne, Experts

I believe '[{TableOfContents }]' is handled by a plugin, which generates a
table of contents and inserts it at the top. Maybe that is why
CleanTextRenderer does not strip it.

I logged the text generated by CleanTextRenderer, and this is what I found:

<div class="toc">
<div class="collapsebox">
<h4>Table of Contents</h4>
<ul>
<li class="toclevel-1"><a class="wikipage"  
href="Wiki.jsp?page=Main#section-Main-Chapter1">Chapter 1</a></li>
<li class="toclevel-2"><a class="wikipage"  
href="Wiki.jsp?page=Main#section-Main-Chapter1.1">Chapter 1.1</a></li>
</ul>
</div>
</div>

!!! Chapter 1
Test page without any helpful info
!! Chapter 1.1
hi

So apart from this additional HTML block at the top, the rest is what I
expected.
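
The manual check used in this thread (searching the index for 'div' and 'href') can also be done on the text before it reaches the indexer. The following is a minimal sketch in plain Java; HtmlLeakCheck and containsHtmlTag are hypothetical names, not JSPWiki classes, and the regular expression is only a heuristic for "looks like an HTML tag".

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JSPWiki: reports whether text that is about
// to be indexed still contains something that looks like an HTML tag.
final class HtmlLeakCheck {

    // Heuristic: "<" or "</", a tag name, optional attributes, optional "/", then ">".
    // Matches e.g. <div class="toc">, </ul>, <br/>. Plain text such as "a<b and c>d"
    // can produce a false positive.
    private static final Pattern TAG =
            Pattern.compile("</?[A-Za-z][A-Za-z0-9]*(\\s[^<>]*)?/?>");

    private HtmlLeakCheck() {
    }

    static boolean containsHtmlTag(String text) {
        return TAG.matcher(text).find();
    }

    public static void main(String[] args) {
        String leaked = "<div class=\"toc\">\n<h4>Table of Contents</h4>\n</div>\n!!! Chapter 1";
        String clean = "Chapter 1\nTest page without any helpful info";
        System.out.println(containsHtmlTag(leaked)); // prints true
        System.out.println(containsHtmlTag(clean));  // prints false
    }
}
```

A check like this could log the page name whenever it returns true, which points to the pages whose plugins still leak markup into the index.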


Thanks,
Pushker Chaubey

On Apr 23, 2009 3:42pm, pushker.chaubey@gmail.com wrote:
> Hi Janne,
>
> I added the code as you suggested, so I am now using the text generated by
> CleanTextRenderer to build the index on the page content.
>
> However, there seems to be a problem: whenever I have '[{TableOfContents }]'
> markup in my page, CleanTextRenderer apparently generates HTML code or
> something similar. For other markup like %%strike and !!!, no HTML code is
> generated and I get pure text. It's just '[{TableOfContents }]' that seems
> to be translated to HTML (or something similar).
>
> How I reproduce this issue:
>
> 1) I create a page named TestPage with the following content:
>
> [{TableOfContents }]
> !!! Chapter 1
> Test page without any helpful info
> !! Chapter 1.1
> hi
>
> 2) I save the page and then search for keywords like 'href' and 'div'.
> 3) The page that I just created comes up as a search result.
>
> However, when I edit TestPage, remove the above contents and just put the
> following code
>
> %%strike God is great!!/%?
>
> and save, and then search using the 'div' keyword, it does not show any
> result and works correctly. That confirms CleanTextRenderer did NOT return
> the HTML equivalent for %%strike and thus no HTML code was used for index
> creation. Note that '<div class="strike">' is the HTML translation for
> %%strike.
>
> So the problem seems to be with the pure text conversion of
> [{TableOfContents }].
>
> Is this a bug in CleanTextRenderer, or am I making a mistake somewhere?
>
> Please suggest how to resolve it.
>
> Thanks and regards!
> Pushker Chaubey

Re: Re: stripping the page of wiki markup

Posted by pu...@gmail.com.
Hi Janne,

I added the code as you suggested, so I am now using the text generated by
CleanTextRenderer to build the index on the page content.
However, there seems to be a problem: whenever I have '[{TableOfContents }]'
markup in my page, CleanTextRenderer apparently generates HTML code or
something similar. For other markup like %%strike and !!!, no HTML code is
generated and I get pure text. It's just '[{TableOfContents }]' that seems
to be translated to HTML (or something similar).

How I reproduce this issue:
1) I create a page named TestPage with the following content:
[{TableOfContents }]
!!! Chapter 1
Test page without any helpful info
!! Chapter 1.1
hi

2) I save the page and then search for keywords like 'href' and 'div'.
3) The page that I just created comes up as a search result.

However, when I edit TestPage, remove the above contents and just put the
following code
%%strike God is great!!/%?
and save, and then search using the 'div' keyword, it does not show any
result and works correctly. That confirms CleanTextRenderer did NOT return
the HTML equivalent for %%strike and thus no HTML code was used for index
creation. Note that '<div class="strike">' is the HTML translation for
%%strike.

So the problem seems to be with the pure text conversion of
[{TableOfContents }].

Is this a bug in CleanTextRenderer, or am I making a mistake somewhere?

Please suggest how to resolve it.

Thanks and regards!
Pushker Chaubey



On Apr 21, 2009 10:06pm, Janne Jalkanen <ja...@ecyrd.com> wrote:
> Try the CleanTextRenderer. Get a WikiDocument from the
> JSPWikiMarkupParser, then create a CleanTextRenderer instance and pass
> the document to it. Look at RenderingManager for help.
>
> /Janne

Re: stripping the page of wiki markup

Posted by Janne Jalkanen <ja...@ecyrd.com>.
Try the CleanTextRenderer.  Get a WikiDocument from the  
JSPWikiMarkupParser, then create a CleanTextRenderer instance and pass  
the document to it. Look at RenderingManager for help.

/Janne

On 21 Apr 2009, at 18:39, pushker.chaubey@gmail.com wrote:

> Hi experts,
>
> I am new to JSPWIKI.
> I want to do Indexing based on page contents. But I want to remove  
> the wiki mark-ups from the page and index only plain textual data.  
> Please let me know how this can be done.
>
> Thanks and regards,
> Pushker Chaubey