Posted to dev@nutch.apache.org by "Dan Rosher (Created) (JIRA)" <ji...@apache.org> on 2012/03/08 15:22:00 UTC

[jira] [Created] (NUTCH-1306) Commit after finished writing to solr index

Commit after finished writing to solr index
-------------------------------------------

                 Key: NUTCH-1306
                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
             Project: Nutch
          Issue Type: Improvement
          Components: indexer
    Affects Versions: nutchgora
            Reporter: Dan Rosher
            Priority: Trivial
             Fix For: nutchgora
         Attachments: NUTCH-1306.patch

Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema updated NUTCH-1306:
--------------------------------

    Summary: Add option to not commit and clarify existing solr.commit.size  (was: Commit after finished writing to solr index)
    
> Add option to not commit and clarify existing solr.commit.size
> --------------------------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Updated] (NUTCH-1306) Commit after finished writing to solr index

Posted by "Dan Rosher (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Rosher updated NUTCH-1306:
------------------------------

    Attachment: NUTCH-1306.patch
    
> Commit after finished writing to solr index
> -------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: nutchgora
>
>         Attachments: NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Updated] (NUTCH-1306) Commit after finished writing to solr index

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1306:
----------------------------------------

    Fix Version/s:     (was: nutchgora)
                   2.1

Set and Classify
                
> Commit after finished writing to solr index
> -------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Commented] (NUTCH-1306) Commit after finished writing to solr index

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272599#comment-13272599 ] 

Lewis John McGibbney commented on NUTCH-1306:
---------------------------------------------

I've just stumbled across NUTCH-1025.
There seems to be a Gora import in your trunk patch... but that won't be ready for committing for a wee while anyway :0) 
                
> Commit after finished writing to solr index
> -------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Updated] (NUTCH-1306) Commit after finished writing to solr index

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema updated NUTCH-1306:
--------------------------------

    Attachment: NUTCH-1306-trunk-v2.patch

Heh, indeed that's not ready for committing yet. Weird, though, that my workspace did not get a compile error at first, only after refreshing the Ivy deps. (Somehow it fetched a Gora library.)

Anyway I've uploaded an updated patch.

I was not aware of NUTCH-1025. Is it ok if we incorporate that issue and rename this issue to "Add option to not commit and clarify existing solr.commit.size"?
                
> Commit after finished writing to solr index
> -------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Commented] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410229#comment-13410229 ] 

Hudson commented on NUTCH-1306:
-------------------------------

Integrated in Nutch-trunk #1892 (See [https://builds.apache.org/job/Nutch-trunk/1892/])
    NUTCH-1306 Add option to not commit and clarify existing solr.commit.size (Revision 1359073)

     Result = SUCCESS
ferdy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359073
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/conf/nutch-default.xml
* /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrConstants.java
* /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrIndexer.java

                
> Add option to not commit and clarify existing solr.commit.size
> --------------------------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk-v3.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Commented] (NUTCH-1306) Commit after finished writing to solr index

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262592#comment-13262592 ] 

Lewis John McGibbney commented on NUTCH-1306:
---------------------------------------------

Having reviewed similar work that has been integrated in 1.X trunk (namely the issues I highlight above), we should remain consistent with the principle that we either always commit and offer an option not to commit, or the other way around. Since NutchGora doesn't commit at all, we favour committing by default rather than a noCommit default, with the option to do a noCommit by configuration.

It's at least clear it should be configurable one way or the other. Never committing or always committing is bad.

(Thanks Markus for the input)
                
> Commit after finished writing to solr index
> -------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Updated] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema updated NUTCH-1306:
--------------------------------

    Attachment: NUTCH-1306-trunk-v3.patch

Minor bug in the previous patch. Uploaded v3 of the trunk patch.
                
> Add option to not commit and clarify existing solr.commit.size
> --------------------------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk-v3.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Updated] (NUTCH-1306) Commit after finished writing to solr index

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema updated NUTCH-1306:
--------------------------------

    Attachment: NUTCH-1306-v2.patch
                NUTCH-1306-trunk.patch

Agree with trying to make both branches match each other.

By the way there is a commit done after the whole job completes. (I previously thought there was no commit at all, but I was wrong). But, if this is the case, then the commit after closing a single indexwriter is not needed. (So the reason Dan is not seeing updates must have been a different problem).

Anyway, I've uploaded patches that make the commit after the job completes configurable (but enabled by default). Let me know if there are comments.
                
> Commit after finished writing to solr index
> -------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Resolved] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema resolved NUTCH-1306.
---------------------------------

    Resolution: Fixed
    
> Add option to not commit and clarify existing solr.commit.size
> --------------------------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk-v3.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Commented] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409366#comment-13409366 ] 

Ferdy Galema commented on NUTCH-1306:
-------------------------------------

Committed in trunk and nutchgora. Thanks, everyone, for the input.
                
> Add option to not commit and clarify existing solr.commit.size
> --------------------------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk-v3.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Commented] (NUTCH-1306) Commit after finished writing to solr index

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272341#comment-13272341 ] 

Lewis John McGibbney commented on NUTCH-1306:
---------------------------------------------

This is exactly the viewpoint I was coming from, Ferdy. I set it for 2.1 as some (maybe minor) configuration had to be done to make this a more amiable solution.

Regarding the second half of your comment, yes it is rather confusing if I'm honest. Currently in trunk, within the write() method we send an update request, which is not a commit. Generally speaking, it appears that on a number of issues we seem to be communicating with the Solr server via different means/syntax... 

It would be nice to at least make an attempt to make Nutchgora and trunk see eye to eye on this one, as trunk has some nice features w.r.t. Solr which, over time, it would be nice to have in both versions.
                
> Commit after finished writing to solr index
> -------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Commented] (NUTCH-1306) Add option to not commit and clarify existing solr.commit.size

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406363#comment-13406363 ] 

Ferdy Galema commented on NUTCH-1306:
-------------------------------------

New option added: solr.commit.index

It defaults to true (commit after indexing). I will commit to trunk and nutchgora if there are no objections.
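
As a rough illustration of how a boolean option like this could gate a single commit once indexing has finished, here is a minimal, self-contained sketch. It is not the committed patch: `CommitOption`, `shouldCommit` and `finishJob` are hypothetical names, and `java.util.Properties` stands in for the Nutch/Hadoop configuration object. Only the property name `solr.commit.index` and its default of true come from the comment above.

```java
import java.util.Properties;

/** Hypothetical helper: gates a job-level commit on a boolean option. */
class CommitOption {
  /** Property name from the comment above. */
  static final String COMMIT_INDEX = "solr.commit.index";

  /**
   * Returns true when the property is unset or set to "true" (case-insensitive);
   * any other value disables the commit.
   */
  static boolean shouldCommit(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(COMMIT_INDEX, "true"));
  }

  /**
   * Runs the commit action once, after indexing, only when the option allows it.
   * Returns whether the commit action was run.
   */
  static boolean finishJob(Properties conf, Runnable commitAction) {
    if (shouldCommit(conf)) {
      commitAction.run();
      return true;
    }
    return false;
  }
}
```

With an empty `Properties` object the commit action runs; after setting `solr.commit.index` to `false` it is skipped, which would leave the commit to whoever manages the Solr server.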
                
> Add option to not commit and clarify existing solr.commit.size
> --------------------------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Commented] (NUTCH-1306) Commit after finished writing to solr index

Posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245798#comment-13245798 ] 

Lewis John McGibbney commented on NUTCH-1306:
---------------------------------------------

Hi Dan. In trunk, we have a number of nice features which I would like to bring to your attention. Maybe you can comment on whether you would like to see some of them go into Nutchgora?

Namely, NUTCH-1185, NUTCH-1000, NUTCH-996, NUTCH-991 and NUTCH-799

wdyt?
                
> Commit after finished writing to solr index
> -------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: nutchgora
>
>         Attachments: NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr


[jira] [Commented] (NUTCH-1306) Commit after finished writing to solr index

Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272231#comment-13272231 ] 

Ferdy Galema commented on NUTCH-1306:
-------------------------------------

Lewis,

Are you suggesting that we add the commit as implemented by the fix, but make it conditional? Something like this:

// Commit only if enabled in the configuration (enabled by default).
if (getConf().getBoolean("solr.commit", true)) {
  solr.commit();
}

This makes it enabled by default. I think it is a good idea.

Secondly, you say that Nutchgora does not commit at all. It looks like trunk does not commit either. I think it's a bit confusing that the COMMIT_SIZE Nutch property does not trigger a Solr commit but rather 'flushes' data to Solr. Perhaps we could clarify this a bit more, for example by updating the property description to mention that it does NOT trigger a Solr commit. Agree?
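
The difference between flushing and committing can be shown with a small self-contained sketch. It uses neither SolrJ nor the Nutch code: `InMemoryIndex`, `BatchingWriter` and the `commitSize` parameter are hypothetical names chosen for illustration. The in-memory index only mimics the general behaviour that documents sent in an update request become searchable after a commit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Solr server: added docs stay pending until commit(). */
class InMemoryIndex {
  private final List<String> pending = new ArrayList<>();
  private final List<String> visible = new ArrayList<>();

  /** Receives a batch of documents (an "update request"); nothing is searchable yet. */
  void add(List<String> docs) {
    pending.addAll(docs);
  }

  /** Makes all pending documents searchable. */
  void commit() {
    visible.addAll(pending);
    pending.clear();
  }

  int pendingCount() { return pending.size(); }
  int visibleCount() { return visible.size(); }
}

/** Hypothetical writer that buffers documents and sends them in batches of commitSize. */
class BatchingWriter {
  private final InMemoryIndex index;
  private final int commitSize;
  private final List<String> buffer = new ArrayList<>();

  BatchingWriter(InMemoryIndex index, int commitSize) {
    this.index = index;
    this.commitSize = commitSize;
  }

  /** Buffers one document; a full buffer is flushed to the index but NOT committed. */
  void write(String doc) {
    buffer.add(doc);
    if (buffer.size() >= commitSize) {
      flush();
    }
  }

  /** Sends any remaining buffered documents; still no commit. */
  void close() {
    flush();
  }

  private void flush() {
    if (!buffer.isEmpty()) {
      index.add(new ArrayList<>(buffer));
      buffer.clear();
    }
  }
}

class FlushVersusCommitDemo {
  public static void main(String[] args) {
    InMemoryIndex index = new InMemoryIndex();
    BatchingWriter writer = new BatchingWriter(index, 2);
    for (String doc : new String[] {"a", "b", "c"}) {
      writer.write(doc);
    }
    writer.close();
    System.out.println("visible before commit: " + index.visibleCount());
    index.commit();
    System.out.println("visible after commit: " + index.visibleCount());
  }
}
```

Running `FlushVersusCommitDemo` prints `visible before commit: 0` followed by `visible after commit: 3`. In this model, a batch-size setting alone never makes documents visible, which is the confusion described above.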
                
> Commit after finished writing to solr index
> -------------------------------------------
>
>                 Key: NUTCH-1306
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1306
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Dan Rosher
>            Priority: Trivial
>             Fix For: 2.1
>
>         Attachments: NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
