Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/07/14 18:08:59 UTC

[jira] [Created] (NUTCH-1051) Export WebGraph node scores for solr.ExternalFileField

Export WebGraph node scores for solr.ExternalFileField
------------------------------------------------------

                 Key: NUTCH-1051
                 URL: https://issues.apache.org/jira/browse/NUTCH-1051
             Project: Nutch
          Issue Type: Improvement
            Reporter: Markus Jelsma
             Fix For: 1.4


The current webgraph.NodeDumper dumps a flat <url>\t<float>\n file, which is almost exactly what is needed for using ExternalFileField (EFF) in Solr. This issue tracks adding an option to dump it in the proper format. Using EFF we can update scores without reindexing millions of documents. There's one caveat: Solr won't accept an equals sign in the key, but there's a small patch for this in SOLR-2545.
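
As a rough illustration of the format difference described above, the following Python sketch converts NodeDumper-style <url>\t<float> lines into the <key>=<float> lines that ExternalFileField reads. The function names are hypothetical and are not part of Nutch or Solr. It also shows why an equals sign in the key is a problem: a reader that splits on the first "=" mangles URLs with query strings, while one that splits on the last "=" does not. Whether SOLR-2545 takes exactly that approach is an assumption here.

```python
def to_eff_line(dump_line):
    """Turn one '<url>\\t<float>' dump line into '<url>=<float>'."""
    url, score = dump_line.rstrip("\n").rsplit("\t", 1)
    float(score)  # raises ValueError if the score is not a number
    return f"{url}={score}"


def parse_first_equals(eff_line):
    """Naive reader: splits on the FIRST '=', which breaks URLs containing '='."""
    key, _, value = eff_line.partition("=")
    return key, value


def parse_last_equals(eff_line):
    """Tolerant reader (assumed approach): splits on the LAST '='."""
    key, _, value = eff_line.rpartition("=")
    return key, float(value)


# Hypothetical URL with a query string, so the key itself contains '='.
line = to_eff_line("http://example.org/?id=7\t0.25\n")
print(line)
print(parse_first_equals(line))
print(parse_last_equals(line))
```

With the naive reader the key is cut off at the query string, so the score could never be matched to the right document.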

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (NUTCH-1051) Export WebGraph node scores for solr.ExternalFileField

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1051:
---------------------------------

    Attachment: NUTCH-1051-1.4-1.patch

Patch for 1.4. Use the -asEff switch to output the scores in a format suitable for ExternalFileField. A possible improvement would be for this switch to imply the -scores switch.

The patch uses mapred.textoutputformat.separator to put the equals sign between key and score in place of the default tab.
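
The separator mechanism can be sketched as follows. Hadoop's text output writes each record as key, separator, value, and reads the separator from mapred.textoutputformat.separator, defaulting to a tab. The Python below only emulates that lookup with a plain dictionary; write_records and the conf dict are illustrative stand-ins, not Hadoop or Nutch APIs, and the URLs and scores are made up.

```python
import io

SEPARATOR_KEY = "mapred.textoutputformat.separator"


def write_records(records, conf, out):
    """Write key<sep>value lines, taking the separator from conf (tab by default)."""
    sep = conf.get(SEPARATOR_KEY, "\t")
    for key, value in records:
        out.write(f"{key}{sep}{value}\n")


# Made-up node scores for illustration only.
records = [("http://example.org/", 0.5), ("http://example.org/a", 0.125)]

# No separator configured: tab-separated output, like the plain dump.
default_out = io.StringIO()
write_records(records, {}, default_out)

# Separator set to '=': output in key=value form.
eff_out = io.StringIO()
write_records(records, {SEPARATOR_KEY: "="}, eff_out)
print(eff_out.getvalue(), end="")
```

Changing only the separator keeps the rest of the dump logic untouched, which fits the later remark that the patch is not very intrusive.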



        

[jira] [Updated] (NUTCH-1051) Export WebGraph node scores for solr.ExternalFileField

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1051:
---------------------------------

    Patch Info: [Patch Available]
      Assignee: Markus Jelsma



        

[jira] [Commented] (NUTCH-1051) Export WebGraph node scores for solr.ExternalFileField

Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085800#comment-13085800 ] 

Julien Nioche commented on NUTCH-1051:
--------------------------------------

+1 Haven't tested it but it looks OK - not a very intrusive patch anyway



        

[jira] [Closed] (NUTCH-1051) Export WebGraph node scores for solr.ExternalFileField

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma closed NUTCH-1051.
--------------------------------




        

[jira] [Commented] (NUTCH-1051) Export WebGraph node scores for solr.ExternalFileField

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085785#comment-13085785 ] 

Markus Jelsma commented on NUTCH-1051:
--------------------------------------

If there are no objections, I'll commit this one shortly.



        

[jira] [Resolved] (NUTCH-1051) Export WebGraph node scores for solr.ExternalFileField

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma resolved NUTCH-1051.
----------------------------------

    Resolution: Fixed

Committed for 1.4 in rev. 1158357.
Thanks!

