Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/08/02 17:33:19 UTC

[jira] Commented: (TIKA-245) Support of CHM Format

    [ https://issues.apache.org/jira/browse/TIKA-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894574#action_12894574 ] 

Nick Burch commented on TIKA-245:
---------------------------------

JCHM seems to be under the CDDL license, so we're fine to use the Jar as a runtime dependency as per http://www.apache.org/legal/3party.html

However, before we can use it, we'll need to get the jar into Maven Central. One problem appears to be the lack of response from the authors, judging by the lack of recent commits and the trouble you've had getting your bug fixes applied.

Personally, I'd suggest you try a bit harder to get hold of the original authors: try email, the SourceForge trackers, etc. If you can get in touch, hopefully they'll make you a maintainer. That would allow you to apply patches and request the Maven Central upload.

Otherwise, I guess the only option is for you to fork the project, probably on SourceForge (the license doesn't permit it to be hosted by Apache). You can apply the patches to your fork and have it uploaded to Maven Central.

Once a patched version is in Maven Central, we can add the dependency to Tika and apply your parser patch!

> Support of CHM Format
> ---------------------
>
>                 Key: TIKA-245
>                 URL: https://issues.apache.org/jira/browse/TIKA-245
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>         Environment: All
>            Reporter: Karl Heinz Marbaise
>            Priority: Minor
>         Attachments: TIKA-245.tikhonov.20103107.patch.txt
>
>
> It might be a good idea to support the Windows CHM file format. Some information about it is available at http://en.wikipedia.org/wiki/Microsoft_Compiled_HTML_Help#Extracting_to_HTML. The CHM format contains HTML files which can be parsed by Tika, so the "only" problem is extracting the data from the CHM file.
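Before any HTML can be extracted, a parser first has to recognise that a file is a CHM container at all. The following is a minimal Java sketch of that first step only, assuming that CHM files begin with the four ASCII bytes "ITSF". The class ChmSniffer and its methods are hypothetical illustrations; they are not part of Tika's or JCHM's API. Actually pulling the (LZX-compressed) HTML pages out of the container is the work a library such as JCHM would still have to do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ChmSniffer {
    // Assumption: a CHM file starts with the 4 ASCII bytes "ITSF".
    private static final byte[] ITSF_MAGIC = "ITSF".getBytes(StandardCharsets.US_ASCII);

    /** True if the given leading bytes start with the "ITSF" signature. */
    public static boolean looksLikeChm(byte[] prefix) {
        return prefix.length >= ITSF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(prefix, ITSF_MAGIC.length), ITSF_MAGIC);
    }

    /** Reads up to 4 bytes from the stream and checks them; does not close or reset it. */
    public static boolean looksLikeChm(InputStream in) throws IOException {
        byte[] header = new byte[ITSF_MAGIC.length];
        int read = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                return false; // stream is shorter than the signature
            }
            read += n;
        }
        return looksLikeChm(header);
    }

    public static void main(String[] args) throws IOException {
        byte[] chmLike = {'I', 'T', 'S', 'F', 3, 0, 0, 0};
        byte[] htmlLike = "<html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeChm(new ByteArrayInputStream(chmLike)));  // true
        System.out.println(looksLikeChm(new ByteArrayInputStream(htmlLike))); // false
    }
}
```

The stream variant consumes the bytes it reads, so a real parser would wrap its input in something that supports mark/reset (or buffer the prefix) before handing the same stream on to the extraction code.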

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (TIKA-245) Support of CHM Format

Posted by Oleg Tikhonov <ol...@gmail.com>.
I've written to SourceForge.net support, asking them either to let me take over the
project or to help me contact the owner. Here is their response:
"Comment:

 Hello,

 We have attempted contact with the current project administrator. We will
 let you know once they approve or reject this takeover request. If they
 have given approval, or if they have not responded within three months
 time, we will continue with the takeover.

 If you have any further questions, please don't hesitate to ask.

 Regards,
 Chris Tsai, SourceForge.net Support"

-- 
Best regards, Oleg.