Posted to user@jspwiki.apache.org by pu...@gmail.com on 2009/04/21 17:39:47 UTC
stripping the page of wiki markup
Hi experts,
I am new to JSPWiki.
I want to index pages based on their content, but I want to strip the wiki
markup first and index only the plain text. Please let me know how this can
be done.
Thanks and regards,
Pushker Chaubey
Re: stripping the page of wiki markup
Posted by pushker chaubey <pu...@gmail.com>.
Hi Janne,
Thanks a lot!
regards,
Pushker Chaubey
--
Thanks n Regards,
Pushker
Re: stripping the page of wiki markup
Posted by Janne Jalkanen <ja...@ecyrd.com>.
Yupyup, you're quite correct. Plugins always output XHTML, so you
need to turn off plugin rendering with
wikiContext.setVariable( RenderingManager.VAR_EXECUTE_PLUGINS, Boolean.FALSE );
You can also experiment with
setVariable( RenderingManager.WYSIWYG_EDITOR_MODE, Boolean.FALSE ),
but that works only for a certain set of plugins.
/Janne
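If plugin execution cannot be switched off for some reason, one fallback is to strip leftover tags from the renderer's output before handing it to the indexer. The following is only a minimal sketch of that idea in plain Java: ResidualHtmlStripper and strip are hypothetical names, not part of JSPWiki, and the regular expression is a naive tag matcher rather than an HTML parser.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, NOT part of JSPWiki: removes leftover tags from text before indexing. */
final class ResidualHtmlStripper {

    // Naive assumption: anything between '<' and the next '>' is a tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Replaces each tag with a space, then collapses whitespace runs into single spaces. */
    static String strip(String renderedText) {
        String withoutTags = TAG.matcher(renderedText).replaceAll(" ");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Shortened version of the plugin output reported in this thread.
        String rendered = "<div class=\"toc\"><h4>Table of Contents</h4></div>\n"
                + "Chapter 1\nTest page without any helpful info";
        System.out.println(strip(rendered));
    }
}
```

This removes the tags but not the text the plugin produced (the "Table of Contents" heading and the repeated chapter titles would still be indexed), and it will also eat ordinary text that happens to sit between a literal '<' and '>'. Turning off plugin execution, as described above, avoids both problems.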
Re: Re: Re: stripping the page of wiki markup
Posted by pu...@gmail.com.
Hi Janne, Experts,
I believe '[{TableOfContents }]' is handled by a plugin, which generates a
table of contents and adds it at the top of the page. That may be why
CleanTextRenderer does not strip it.
I logged the text generated by CleanTextRenderer, and this is what I found:
<div class="toc">
<div class="collapsebox">
<h4>Table of Contents</h4>
<ul>
<li class="toclevel-1"><a class="wikipage" href="Wiki.jsp?page=Main#section-Main-Chapter1">Chapter 1</a></li>
<li class="toclevel-2"><a class="wikipage" href="Wiki.jsp?page=Main#section-Main-Chapter1.1">Chapter 1.1</a></li>
</ul>
</div>
</div>
!!! Chapter 1
Test page without any helpful info
!! Chapter 1.1
hi
So apart from the additional HTML block at the top, the rest is what I
expected.
Thanks,
Pushker Chaubey
Re: Re: stripping the page of wiki markup
Posted by pu...@gmail.com.
Hi Janne,
I added the code as you suggested, so I am now using the text generated by
CleanTextRenderer to build the index on the page content.
However, there seems to be a problem: whenever a page contains the
'[{TableOfContents }]' markup, CleanTextRenderer appears to generate HTML
(or something similar). For other markup such as %%strike and !!!, no HTML
is generated and I get pure text. Only '[{TableOfContents }]' seems to be
translated to HTML.
How I reproduce the issue:
1) I create a page named TestPage with the following content:
[{TableOfContents }]
!!! Chapter 1
Test page without any helpful info
!! Chapter 1.1
hi
2) I save the page and then search for keywords like 'href' and 'div'.
3) The page I just created comes up as a search result.
However, when I edit TestPage, remove the above content, and put in just
the following markup
%%strike God is great!!/%?
and save, a search for the 'div' keyword shows no result, which is correct.
That confirms CleanTextRenderer did NOT return the HTML equivalent of
%%strike, so no HTML was used for index creation.
Note that '<div class="strike">' is the HTML translation of %%strike.
So the problem seems to be with the plain-text conversion of
[{TableOfContents }].
Is this a bug in CleanTextRenderer, or am I making a mistake somewhere?
Please suggest how to resolve it.
Thanks and regards!
Pushker Chaubey
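The manual check described above (search for 'div' or 'href' and see whether the page turns up) can also be run directly on the extracted text before it reaches the index. What follows is a rough sketch under stated assumptions: MarkupLeakCheck, leakedTokens, and the list of suspect tokens are all hypothetical and not part of JSPWiki, and the tokenizer is a plain split on non-alphanumeric characters rather than whatever analyzer the search provider actually uses.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, NOT part of JSPWiki: flags HTML residue in text destined for a search index. */
final class MarkupLeakCheck {

    // Assumption: these words never occur as real content on the test pages in this thread.
    private static final Set<String> SUSPECT_TOKENS =
            new LinkedHashSet<>(Arrays.asList("div", "href", "class", "ul", "li"));

    /** Returns the suspect tokens found in the text, in order of first appearance. */
    static Set<String> leakedTokens(String indexedText) {
        Set<String> found = new LinkedHashSet<>();
        // Split on anything that is not a letter or digit, roughly as a simple analyzer would.
        for (String token : indexedText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (SUSPECT_TOKENS.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Shortened version of the output seen when the page contains [{TableOfContents }].
        String withToc = "<div class=\"toc\"><ul><li class=\"toclevel-1\">"
                + "<a class=\"wikipage\" href=\"Wiki.jsp?page=Main#section-Main-Chapter1\">Chapter 1</a>"
                + "</li></ul></div>\nChapter 1\nTest page without any helpful info";
        // Text as expected once no plugin output is mixed in.
        String clean = "Chapter 1\nTest page without any helpful info\nChapter 1.1\nhi";

        System.out.println("with TOC plugin output: " + leakedTokens(withToc));
        System.out.println("clean text:             " + leakedTokens(clean));
    }
}
```

A non-empty result for a page is a hint that some plugin output made it into the text being indexed. The token list would need adjusting for a wiki whose pages legitimately contain words like "class".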
Re: stripping the page of wiki markup
Posted by Janne Jalkanen <ja...@ecyrd.com>.
Try the CleanTextRenderer. Get a WikiDocument from the
JSPWikiMarkupParser, then create a CleanTextRenderer instance and pass
the document to it. Look at RenderingManager for help.
/Janne