Posted to dev@nutch.apache.org by "Ferdy Galema (Created) (JIRA)" <ji...@apache.org> on 2012/02/16 11:43:05 UTC

[jira] [Created] (NUTCH-1280) language-identifier should have option to use detected value by Tika even when uncertain

language-identifier should have option to use detected value by Tika even when uncertain
----------------------------------------------------------------------------------------

                 Key: NUTCH-1280
                 URL: https://issues.apache.org/jira/browse/NUTCH-1280
             Project: Nutch
          Issue Type: New Feature
          Components: parser
            Reporter: Ferdy Galema
             Fix For: nutchgora


Nutch trunk has an option "lang.identification.only.certain"; Nutchgora should have it too. Note that it defaults to false, so this changes the default behaviour somewhat.

Patch will be right up.
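
As a rough illustration of the intended behaviour (not the actual patch), the decision can be sketched as below. The class and method names are hypothetical and are not part of Nutch or Tika; the sketch only assumes that the detector reports a language code plus a certainty flag, and that the boolean mirrors the value of "lang.identification.only.certain".

```java
// Hypothetical sketch: names are illustrative, not Nutch or Tika API.
final class LanguageChoice {

    private LanguageChoice() {}

    /**
     * Decides which language value to assign to a document.
     *
     * @param detected          language code reported by the detector, may be null
     * @param reasonablyCertain whether the detector considers its result certain
     * @param onlyCertain       assumed value of "lang.identification.only.certain"
     * @return the language code to assign, or null if none should be assigned
     */
    static String choose(String detected, boolean reasonablyCertain, boolean onlyCertain) {
        if (detected == null || detected.isEmpty()) {
            return null; // nothing was detected at all
        }
        if (onlyCertain && !reasonablyCertain) {
            return null; // strict mode: drop uncertain detections
        }
        return detected; // default (false): use the value even when uncertain
    }

    public static void main(String[] args) {
        System.out.println(choose("nl", false, false)); // prints nl
        System.out.println(choose("nl", false, true));  // prints null
    }
}
```

With the option left at false, an uncertain detection is still used; setting it to true keeps only detections the detector marks as certain.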

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (NUTCH-1280) language-identifier should have option to use detected value by Tika even when uncertain

Posted by "Ferdy Galema (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema updated NUTCH-1280:
--------------------------------

    Attachment: NUTCH-1280.txt
    
> language-identifier should have option to use detected value by Tika even when uncertain
> ----------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1280
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1280
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Ferdy Galema
>            Priority: Minor
>             Fix For: nutchgora
>
>         Attachments: NUTCH-1280.txt
>
>
> Nutch trunk has an option "lang.identification.only.certain"; Nutchgora should have it too. Note that it defaults to false, so this changes the default behaviour somewhat.
> Patch will be right up.


        

[jira] [Updated] (NUTCH-1280) language-identifier should have option to use detected value by Tika even when uncertain

Posted by "Ferdy Galema (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema updated NUTCH-1280:
--------------------------------

    Priority: Minor  (was: Major)
    


        

[jira] [Commented] (NUTCH-1280) language-identifier should have option to use detected value by Tika even when uncertain

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212350#comment-13212350 ] 

Hudson commented on NUTCH-1280:
-------------------------------

Integrated in Nutch-nutchgora #168 (See [https://builds.apache.org/job/Nutch-nutchgora/168/])
    NUTCH-1280 language-identifier should have option to use detected value by Tika even when uncertain (Revision 1291165)

     Result = SUCCESS
ferdy : 
Files : 
* /nutch/branches/nutchgora/CHANGES.txt
* /nutch/branches/nutchgora/conf/nutch-default.xml
* /nutch/branches/nutchgora/src/plugin/language-identifier/src/java/org/apache/nutch/analysis/lang/HTMLLanguageParser.java

                


        

[jira] [Resolved] (NUTCH-1280) language-identifier should have option to use detected value by Tika even when uncertain

Posted by "Ferdy Galema (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema resolved NUTCH-1280.
---------------------------------

    Resolution: Fixed

committed
                
