Posted to user@tika.apache.org by Benson Margulies <bi...@gmail.com> on 2012/09/02 13:53:38 UTC

Circular link in documentation

On this page [1], the first link under 'Getting Started' points back to the
page it is on, rather than to a page that actually documents how to use Tika.

[1] http://tika.apache.org/1.2/parser_guide.html#gettingstarted.html

Re: Circular link in documentation

Posted by Michael McCandless <lu...@mikemccandless.com>.
Aha! Never mind :) I was in fact failing to save the file as UTF-8 correctly.

Now it looks like it's working, so I'll go back to unescaped Unicode characters.
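One way to catch this kind of mistake before running the site build is to decode the source strictly as UTF-8. This is a minimal Python sketch; the helper name first_invalid_utf8_offset and the script usage are hypothetical. It also assumes, as a guess, that a mis-saved file contains Latin-1 bytes such as 0xF8 for ø, which a lenient UTF-8 decoder turns into U+FFFD:

```python
import sys


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")  # strict mode (the default): raises on malformed bytes
    except UnicodeDecodeError as exc:
        return exc.start
    return None


if __name__ == "__main__":
    # Hypothetical usage: python check_utf8.py src/site/apt/1.2/index.apt ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            offset = first_invalid_utf8_offset(f.read())
        status = "OK" if offset is None else f"invalid UTF-8 at byte {offset}"
        print(f"{path}: {status}")
```

For example, "Jan Høydahl" encoded as Latin-1 reports offset 5, which is the position of the lone 0xF8 byte.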

Thanks Benson.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Sep 5, 2012 at 2:18 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> But alas forcing version=3.1 and adding the input/outputEncoding still
> remaps the characters as U+FFFD ... maybe I'm putting the
> configuration in the wrong place?:

Re: Circular link in documentation

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, Sep 5, 2012 at 11:08 AM, Benson Margulies <bi...@gmail.com> wrote:
> mvn help:effective-pom

Aha, that's useful :)

But alas, forcing version=3.1 and adding inputEncoding/outputEncoding still
remaps the characters to U+FFFD ... maybe I'm putting the
configuration in the wrong place?:

Index: src/site/apt/1.2/index.apt
===================================================================
--- src/site/apt/1.2/index.apt	(revision 1381198)
+++ src/site/apt/1.2/index.apt	(working copy)
@@ -126,7 +126,7 @@

       * Ingo Renner

-      * Jan H\u00F8ydahl
+      * Jan Høydahl

       * Jeremy Anderson

Index: pom.xml
===================================================================
--- pom.xml	(revision 1381197)
+++ pom.xml	(working copy)
@@ -71,9 +71,12 @@
     <plugins>
       <plugin>
         <artifactId>maven-site-plugin</artifactId>
+        <version>3.1</version>
         <configuration>
           <templateDirectory>src/site</templateDirectory>
           <template>site.vm</template>
+          <inputEncoding>UTF-8</inputEncoding>
+          <outputEncoding>UTF-8</outputEncoding>
         </configuration>
         <executions>
           <execution>

If you apply that, run "mvn clean site", and open
target/site/1.2/index.html, you should see that it incorrectly mapped
to "Jan H&#xfffd;ydahl".
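Checking generated pages by hand does not scale well, so the symptom can be searched for directly. This is a minimal Python sketch. It assumes the site was generated under target/site, which is the default output directory of mvn site, and that U+FFFD appears either literally or as a character reference like &#xfffd;. The names replacement_offsets and scan_site are hypothetical:

```python
import re
import sys
from pathlib import Path

# U+FFFD as a literal character, or as a hex/decimal HTML character reference
# such as "&#xfffd;" (the form seen in the generated index.html).
REPLACEMENT_CHAR = re.compile(r"\ufffd|&#x0*fffd;|&#0*65533;", re.IGNORECASE)


def replacement_offsets(html: str) -> list:
    """Return the character offsets of every U+FFFD occurrence in html."""
    return [m.start() for m in REPLACEMENT_CHAR.finditer(html)]


def scan_site(root: str) -> dict:
    """Map each *.html file under root to its U+FFFD offsets (files with hits only)."""
    hits = {}
    for path in sorted(Path(root).rglob("*.html")):
        # errors="replace" turns undecodable bytes into U+FFFD as well,
        # which is still an encoding problem worth reporting.
        text = path.read_text(encoding="utf-8", errors="replace")
        offsets = replacement_offsets(text)
        if offsets:
            hits[str(path)] = offsets
    return hits


if __name__ == "__main__":
    site_dir = sys.argv[1] if len(sys.argv) > 1 else "target/site"
    if Path(site_dir).is_dir():
        for name, offsets in scan_site(site_dir).items():
            print(f"{name}: U+FFFD at {offsets}")
    else:
        print(f"{site_dir} not found; build the site first")
```

Matching the character reference as well as the raw character matters here, because the broken page contained the escaped form rather than the literal U+FFFD.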

Mike McCandless

http://blog.mikemccandless.com

Re: Circular link in documentation

Posted by Benson Margulies <bi...@gmail.com>.
On Wed, Sep 5, 2012 at 10:32 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> I'm actually not sure which version we are using: we don't specify a
> <version> inside the <plugin> for maven-site-plugin in the pom.xml (am
> I looking at the right place?):


mvn help:effective-pom

reveals that you are using version 3.0. I recommend adding an explicit
version element for 3.1 into the site pom, rather than inheriting it
from the main parent, then setting up those params and seeing what
happens.
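To confirm which plugin version a build actually resolves, the effective POM can also be inspected programmatically. This is a minimal Python sketch. It assumes the effective POM has been written to a file, for example with help:effective-pom -Doutput=effective-pom.xml. The function name plugin_versions is hypothetical:

```python
import sys
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Drop an XML namespace prefix such as '{http://maven.apache.org/POM/4.0.0}'."""
    return tag.rsplit("}", 1)[-1]


def plugin_versions(pom_xml, artifact_id: str) -> list:
    """Return the <version> of every <plugin> (str or bytes input) matching artifact_id.

    Entries under both <plugins> and <pluginManagement> are included, so the
    same plugin may be reported more than once.
    """
    root = ET.fromstring(pom_xml)
    versions = []
    for elem in root.iter():
        if _local(elem.tag) != "plugin":
            continue
        # Look only at direct children, so nested <configuration> text is ignored.
        fields = {_local(child.tag): (child.text or "").strip() for child in elem}
        if fields.get("artifactId") == artifact_id and "version" in fields:
            versions.append(fields["version"])
    return versions


if __name__ == "__main__":
    # Hypothetical usage: python site_plugin_version.py effective-pom.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, plugin_versions(f.read(), "maven-site-plugin"))
```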


>
>     http://svn.apache.org/repos/asf/tika/site/pom.xml
>
> I had tried both inputEncoding and outputEncoding but it didn't seem to work...
>
> Also: nevermind on the karma: I had an http:// checkout not https://
> (sorry for the noise, again!, Dave).  I just committed.
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Circular link in documentation

Posted by Michael McCandless <lu...@mikemccandless.com>.
I'm actually not sure which version we are using: we don't specify a
<version> inside the <plugin> for maven-site-plugin in the pom.xml (am
I looking at the right place?):

    http://svn.apache.org/repos/asf/tika/site/pom.xml

I had tried both inputEncoding and outputEncoding, but neither seemed to work...

Also: never mind on the karma. I had an http:// checkout, not https://
(sorry for the noise again, Dave). I just committed.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Sep 5, 2012 at 10:20 AM, Benson Margulies <bi...@gmail.com> wrote:
> What version of the site plugin is in use? According to
> http://maven.apache.org/plugins/maven-site-plugin/site-mojo.html, you
> would need both input and output encoding with 3.1.

Re: Circular link in documentation

Posted by Benson Margulies <bi...@gmail.com>.
What version of the site plugin is in use? According to
http://maven.apache.org/plugins/maven-site-plugin/site-mojo.html, you
would need to set both inputEncoding and outputEncoding with 3.1.


On Wed, Sep 5, 2012 at 10:14 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> OK I think I have these issues fixed...
>
> However: I was unable to get Maven to respect UTF-8 encoding of the
> *.apt sources: no matter what I tried (the various encoding
> configuration options in pom.xml) it would always replace characters
> with U+FFFD.  So I did the workaround instead: it turns out you can
> specify \UXXXX in *.apt.
>
> However: I don't have the necessary karma to commit to
> http://svn.apache.org/repos/asf/tika/site.  Dave can you please fix
> that?  Thanks!
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Circular link in documentation

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK, I think I have these issues fixed...

However, I was unable to get Maven to respect the UTF-8 encoding of the
*.apt sources: no matter what I tried (the various encoding
configuration options in pom.xml), it always replaced characters
with U+FFFD. So I used a workaround instead: it turns out you can
specify \uXXXX escapes in *.apt files.
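The escaping workaround can be automated. This is a minimal Python sketch using a hypothetical helper called escape_for_apt. It assumes the escape form used in the index.apt diff in this thread: a backslash, a lowercase u, and four uppercase hex digits, as in \u00F8. That form only covers the Basic Multilingual Plane, so any other characters are rejected:

```python
def escape_for_apt(text: str) -> str:
    """Replace each non-ASCII character with a \\uXXXX escape (hypothetical helper).

    Only characters in the Basic Multilingual Plane fit in four hex digits;
    anything beyond U+FFFF raises ValueError instead of being silently mangled.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)  # plain ASCII passes through unchanged
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04X}")  # e.g. 'ø' (U+00F8) becomes \u00F8
        else:
            raise ValueError(f"U+{code:X} is outside the BMP; no 4-digit escape")
    return "".join(out)
```

Running a source file's contents through this function before committing should produce the same kind of line as the `-`/`+` pair in the index.apt diff, such as `* Jan H\u00F8ydahl`.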

Also, I don't have the necessary karma to commit to
http://svn.apache.org/repos/asf/tika/site. Dave, can you please fix
that? Thanks!

Mike McCandless

http://blog.mikemccandless.com

On Wed, Sep 5, 2012 at 9:10 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> Thanks Benson, I'll fix.
>
> Looks like a number of other links are also not working ... when we do
> {{{api/org/apache/tika/...}}} we apparently must make that
> {{{./api/org/apache/tika/...}}} instead (ie add the ./ prefix).  Maven
> prints a warning when it's wrong ...
>
> I'm also trying to fix the broken UTF-8 encoding, eg see Jan Høydahl
> in http://tika.apache.org/1.2/index.html: the ø is replaced with the
> unicode replacement char (U+FFFD)...
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Circular link in documentation

Posted by Michael McCandless <lu...@mikemccandless.com>.
Thanks Benson, I'll fix.

Looks like a number of other links are also not working... where we write
{{{api/org/apache/tika/...}}} we apparently must write
{{{./api/org/apache/tika/...}}} instead (i.e., add the ./ prefix). Maven
prints a warning when it's wrong...
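Applying the ./ fix across many *.apt files could be scripted. This is a minimal Python sketch using a hypothetical helper called add_dot_slash. It assumes that any link target not already starting with ./, ../, /, # or a URL scheme should get the prefix. The message here only confirms this for api/ links, so the resulting diff should be reviewed before committing:

```python
import re

# "{{{" followed by a target that does NOT already start with "./", "../",
# "/", "#", or a URL scheme such as "http:" or "mailto:".
_BARE_RELATIVE_TARGET = re.compile(r"\{\{\{(?!\.{1,2}/|/|#|[A-Za-z][A-Za-z0-9+.-]*:)")


def add_dot_slash(apt_text: str) -> str:
    """Prefix bare relative APT link targets with './' (hypothetical helper)."""
    return _BARE_RELATIVE_TARGET.sub("{{{./", apt_text)
```

The negative lookahead leaves targets that are already prefixed or absolute unchanged, so running the helper twice gives the same result as running it once.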

I'm also trying to fix the broken UTF-8 encoding; e.g., see Jan Høydahl
in http://tika.apache.org/1.2/index.html: the ø is replaced with the
Unicode replacement character (U+FFFD)...

Mike McCandless

http://blog.mikemccandless.com

On Tue, Sep 4, 2012 at 1:55 PM, Benson Margulies <bi...@gmail.com> wrote:
> http://svn.apache.org/repos/asf/tika/site

Re: Circular link in documentation

Posted by Benson Margulies <bi...@gmail.com>.
http://svn.apache.org/repos/asf/tika/site



On Tue, Sep 4, 2012 at 1:22 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> We should fix that.
>
> How can I update the web site...?  Is this documented somewhere...?
>
> I see that http://wiki.apache.org/tika/ReleaseProcess refers to
> src/site/src/documentation/content/xdocs/index.xml but that doesn't
> exist in svn (I see a bunch of .apt sources that seem to correspond to
> what's live on the site).  Can someone provide some pointers...?
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Circular link in documentation

Posted by Michael McCandless <lu...@mikemccandless.com>.
We should fix that.

How can I update the web site...?  Is this documented somewhere...?

I see that http://wiki.apache.org/tika/ReleaseProcess refers to
src/site/src/documentation/content/xdocs/index.xml but that doesn't
exist in svn (I see a bunch of .apt sources that seem to correspond to
what's live on the site).  Can someone provide some pointers...?

Mike McCandless

http://blog.mikemccandless.com


On Sun, Sep 2, 2012 at 7:53 AM, Benson Margulies <bi...@gmail.com> wrote:
> On this page [1]
>
>
> the first link under 'Getting Started' points back to the page it is
> on, rather than to a page that actually documents how to use tika.
>
>
>
> http://tika.apache.org/1.2/parser_guide.html#gettingstarted.html