Posted to dev@nutch.apache.org by "Ferdy Galema (Created) (JIRA)" <ji...@apache.org> on 2012/02/16 11:43:05 UTC
[jira] [Created] (NUTCH-1280) language-identifier should have
option to use detected value by Tika even when uncertain
language-identifier should have option to use detected value by Tika even when uncertain
----------------------------------------------------------------------------------------
Key: NUTCH-1280
URL: https://issues.apache.org/jira/browse/NUTCH-1280
Project: Nutch
Issue Type: New Feature
Components: parser
Reporter: Ferdy Galema
Fix For: nutchgora
Nutch trunk has an option, "lang.identification.only.certain", and Nutchgora should have it too. Note that the option defaults to false, meaning the language detected by Tika is used even when the detection is uncertain, so this changes the default behaviour somewhat.
A patch will follow shortly.
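The effect of the option can be illustrated with a minimal sketch. This is not the code from the attached patch or from HTMLLanguageParser. It assumes that the detector reports a language code together with a "reasonably certain" flag. `Detection`, `chooseLanguage` and `onlyCertain` are hypothetical names, and a plain `java.util.Properties` stands in for the Nutch configuration.

```java
import java.util.Properties;

/**
 * Hypothetical sketch of the decision controlled by
 * "lang.identification.only.certain". Not Nutch or Tika API.
 */
public class LanguageChoiceSketch {

    static final String ONLY_CERTAIN_KEY = "lang.identification.only.certain";

    /** Hypothetical detection result: a language code plus a certainty flag. */
    static final class Detection {
        final String language;
        final boolean certain;

        Detection(String language, boolean certain) {
            this.language = language;
            this.certain = certain;
        }
    }

    /** Reads the option; falls back to false, the default stated in the issue. */
    static boolean onlyCertain(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(ONLY_CERTAIN_KEY, "false"));
    }

    /**
     * Returns the language to record, or null when none should be set.
     * With onlyCertain == false the detected value is kept even if uncertain.
     */
    static String chooseLanguage(Detection d, boolean onlyCertain) {
        if (d == null || d.language == null) {
            return null; // nothing was detected
        }
        if (onlyCertain && !d.certain) {
            return null; // strict mode: discard uncertain detections
        }
        return d.language;
    }

    public static void main(String[] args) {
        Properties conf = new Properties(); // option unset, so the default (false) applies
        Detection uncertain = new Detection("nl", false);
        System.out.println(chooseLanguage(uncertain, onlyCertain(conf))); // prints "nl"

        conf.setProperty(ONLY_CERTAIN_KEY, "true");
        System.out.println(chooseLanguage(uncertain, onlyCertain(conf))); // prints "null"
    }
}
```

With the default of false, an uncertain detection is still recorded. Setting the option to true restores the stricter behaviour, where only detections flagged as certain are kept.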
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (NUTCH-1280) language-identifier should have
option to use detected value by Tika even when uncertain
Posted by "Ferdy Galema (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1280:
--------------------------------
Attachment: NUTCH-1280.txt
[jira] [Updated] (NUTCH-1280) language-identifier should have
option to use detected value by Tika even when uncertain
Posted by "Ferdy Galema (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1280:
--------------------------------
Priority: Minor (was: Major)
[jira] [Commented] (NUTCH-1280) language-identifier should have
option to use detected value by Tika even when uncertain
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212350#comment-13212350 ]
Hudson commented on NUTCH-1280:
-------------------------------
Integrated in Nutch-nutchgora #168 (See [https://builds.apache.org/job/Nutch-nutchgora/168/])
NUTCH-1280 language-identifier should have option to use detected value by Tika even when uncertain (Revision 1291165)
Result = SUCCESS
ferdy :
Files :
* /nutch/branches/nutchgora/CHANGES.txt
* /nutch/branches/nutchgora/conf/nutch-default.xml
* /nutch/branches/nutchgora/src/plugin/language-identifier/src/java/org/apache/nutch/analysis/lang/HTMLLanguageParser.java
[jira] [Resolved] (NUTCH-1280) language-identifier should have
option to use detected value by Tika even when uncertain
Posted by "Ferdy Galema (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema resolved NUTCH-1280.
---------------------------------
Resolution: Fixed
committed