Posted to dev@nutch.apache.org by "Dan Rosher (Created) (JIRA)" <ji...@apache.org> on 2012/03/08 15:22:00 UTC
[jira] [Created] (NUTCH-1306) Commit after finished writing to solr
index
Commit after finished writing to solr index
-------------------------------------------
Key: NUTCH-1306
URL: https://issues.apache.org/jira/browse/NUTCH-1306
Project: Nutch
Issue Type: Improvement
Components: indexer
Affects Versions: nutchgora
Reporter: Dan Rosher
Priority: Trivial
Fix For: nutchgora
Attachments: NUTCH-1306.patch
Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (NUTCH-1306) Add option to not commit and clarify
existing solr.commit.size
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1306:
--------------------------------
Summary: Add option to not commit and clarify existing solr.commit.size (was: Commit after finished writing to solr index)
> Add option to not commit and clarify existing solr.commit.size
> --------------------------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Updated] (NUTCH-1306) Commit after finished writing to solr
index
Posted by "Dan Rosher (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Rosher updated NUTCH-1306:
------------------------------
Attachment: NUTCH-1306.patch
> Commit after finished writing to solr index
> -------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: nutchgora
>
> Attachments: NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Updated] (NUTCH-1306) Commit after finished writing to solr
index
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1306:
----------------------------------------
Fix Version/s: (was: nutchgora)
2.1
Set and Classify
> Commit after finished writing to solr index
> -------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Commented] (NUTCH-1306) Commit after finished writing to
solr index
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272599#comment-13272599 ]
Lewis John McGibbney commented on NUTCH-1306:
---------------------------------------------
I've just stumbled across NUTCH-1025
There seems to be a Gora import in your trunk patch... but that won't be ready for committing for a wee while anyway :0)
> Commit after finished writing to solr index
> -------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Updated] (NUTCH-1306) Commit after finished writing to solr
index
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1306:
--------------------------------
Attachment: NUTCH-1306-trunk-v2.patch
Heh indeed that's not ready for committing yet. Weird though that my workspace did not get a compile error at first, only after refreshing the ivy deps. (Somehow it fetched a Gora library).
Anyway I've uploaded an updated patch.
I was not aware of NUTCH-1025. Is it ok if we incorporate that issue and rename this issue to "Add option to not commit and clarify existing solr.commit.size"?
> Commit after finished writing to solr index
> -------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Commented] (NUTCH-1306) Add option to not commit and
clarify existing solr.commit.size
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410229#comment-13410229 ]
Hudson commented on NUTCH-1306:
-------------------------------
Integrated in Nutch-trunk #1892 (See [https://builds.apache.org/job/Nutch-trunk/1892/])
NUTCH-1306 Add option to not commit and clarify existing solr.commit.size (Revision 1359073)
Result = SUCCESS
ferdy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359073
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/conf/nutch-default.xml
* /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrConstants.java
* /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrIndexer.java
> Add option to not commit and clarify existing solr.commit.size
> --------------------------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk-v3.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Commented] (NUTCH-1306) Commit after finished writing to
solr index
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262592#comment-13262592 ]
Lewis John McGibbney commented on NUTCH-1306:
---------------------------------------------
Having reviewed similar work that has been integrated in 1.X trunk (namely the issues I highlight above), we should remain consistent with the principle that we either always commit and have an option not to commit, or the other way around. Since NutchGora doesn't commit at all, we favour the option to commit instead of a noCommit, with the option to do a noCommit by configuration.
It's at least clear it should be configurable one way or the other. Never committing or always committing is bad.
(Thanks Markus for the input)
> Commit after finished writing to solr index
> -------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Updated] (NUTCH-1306) Add option to not commit and clarify
existing solr.commit.size
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1306:
--------------------------------
Attachment: NUTCH-1306-trunk-v3.patch
There was a minor bug in the previous patch. Uploaded v3 of the trunk patch.
> Add option to not commit and clarify existing solr.commit.size
> --------------------------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk-v3.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Updated] (NUTCH-1306) Commit after finished writing to solr
index
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-1306:
--------------------------------
Attachment: NUTCH-1306-v2.patch
NUTCH-1306-trunk.patch
Agree with trying to make both branches match each other.
By the way there is a commit done after the whole job completes. (I previously thought there was no commit at all, but I was wrong). But, if this is the case, then the commit after closing a single indexwriter is not needed. (So the reason Dan is not seeing updates must have been a different problem).
Anyway, I've uploaded patches for making this committing after the job completes configurable. (But enabled by default). Let me know if there are comments.
> Commit after finished writing to solr index
> -------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Resolved] (NUTCH-1306) Add option to not commit and clarify
existing solr.commit.size
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema resolved NUTCH-1306.
---------------------------------
Resolution: Fixed
> Add option to not commit and clarify existing solr.commit.size
> --------------------------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk-v3.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Commented] (NUTCH-1306) Add option to not commit and
clarify existing solr.commit.size
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409366#comment-13409366 ]
Ferdy Galema commented on NUTCH-1306:
-------------------------------------
Committed in trunk and nutchgora. Thanks everyone for the input.
> Add option to not commit and clarify existing solr.commit.size
> --------------------------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk-v3.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Commented] (NUTCH-1306) Commit after finished writing to
solr index
Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272341#comment-13272341 ]
Lewis John McGibbney commented on NUTCH-1306:
---------------------------------------------
This is exactly the viewpoint I was coming from, Ferdy. I set it for 2.1 as some (maybe minor) configuration had to be done to make this a more amiable solution.
Regarding the second half of your comment, yes it is rather confusing if I'm honest. Currently in trunk, within the write() method we send an update request, which is not a commit. Generally speaking, it appears that on a number of issues we seem to be communicating with the Solr server via different means/syntax...
It would be nice to at least make an attempt to make Nutchgora and trunk see eye to eye on this one, as trunk has some nice features w.r.t. Solr which, over time, it would be nice to have in both versions.
> Commit after finished writing to solr index
> -------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Commented] (NUTCH-1306) Add option to not commit and
clarify existing solr.commit.size
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406363#comment-13406363 ]
Ferdy Galema commented on NUTCH-1306:
-------------------------------------
New option added: solr.commit.index
It defaults to true (commit after indexing). I will commit to trunk and nutchgora if there are no objections.
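A minimal sketch of how an option like solr.commit.index might gate the commit at the end of an indexing job. This is not the code from the patch: java.util.Properties stands in for the Nutch/Hadoop configuration, and the class CommitOption, the constant COMMIT_INDEX and the method names are hypothetical names chosen for illustration only.

```java
import java.util.Properties;

/** Hypothetical helper; illustrates a commit that is enabled by default. */
class CommitOption {
    // Property name taken from the comment above; the constant name is an assumption.
    static final String COMMIT_INDEX = "solr.commit.index";

    /**
     * Returns true when the property is absent (default "true") or set to
     * "true" in any letter case; any other value yields false.
     */
    static boolean shouldCommit(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(COMMIT_INDEX, "true"));
    }

    /**
     * Runs the supplied commit action once the job has finished, but only
     * if the option allows it. Returns whether the commit was executed.
     */
    static boolean finishJob(Properties conf, Runnable commit) {
        if (shouldCommit(conf)) {
            commit.run();
            return true;
        }
        return false;
    }
}
```

With an empty configuration the commit action runs; setting solr.commit.index to false skips it, which leaves committing to whoever operates the Solr server.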
> Add option to not commit and clarify existing solr.commit.size
> --------------------------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306-trunk-v2.patch, NUTCH-1306-trunk.patch, NUTCH-1306-v2.patch, NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Commented] (NUTCH-1306) Commit after finished writing to
solr index
Posted by "Lewis John McGibbney (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245798#comment-13245798 ]
Lewis John McGibbney commented on NUTCH-1306:
---------------------------------------------
Hi Dan. In trunk, we have a number of nice features which I would like to bring to your attention. Maybe you can comment on whether you would like to see some of them go into Nutchgora?
Namely, NUTCH-1185, NUTCH-1000, NUTCH-996, NUTCH-991 and NUTCH-799
wdyt?
> Commit after finished writing to solr index
> -------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: nutchgora
>
> Attachments: NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr
[jira] [Commented] (NUTCH-1306) Commit after finished writing to
solr index
Posted by "Ferdy Galema (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272231#comment-13272231 ]
Ferdy Galema commented on NUTCH-1306:
-------------------------------------
Lewis,
Do you suggest to add the commit as implemented by the fix but make it conditional? Something like this:
if (getConf().getBoolean("solr.commit", true)) {
  solr.commit();
}
This makes it enabled by default. I think it is a good idea.
Secondly, you say that Nutchgora does not commit at all. It looks like trunk does not commit either. I think it's a bit confusing that the COMMIT_SIZE Nutch property does not trigger a Solr commit but rather 'flushes' data to Solr. Perhaps we could clarify this a bit more. (Update the property description by mentioning the fact that it does NOT trigger a Solr commit.) Agree?
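To make the flush-versus-commit distinction concrete, here is a small self-contained sketch. It does not use SolrJ or the real SolrIndexer code: IndexServer, BatchingWriter and RecordingServer are hypothetical stand-ins, and commitSize only mirrors the role that the COMMIT_SIZE property is described as playing above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Solr client; this is not the SolrJ API. */
interface IndexServer {
    void add(List<String> docs); // send documents to the server (an update request)
    void commit();               // make previously added documents visible to searches
}

/** Buffers documents and sends them in batches; it never commits on its own. */
class BatchingWriter {
    private final IndexServer server;
    private final int commitSize; // batch size, despite the name
    private final List<String> buffer = new ArrayList<>();

    BatchingWriter(IndexServer server, int commitSize) {
        this.server = server;
        this.commitSize = commitSize;
    }

    /** Buffers one document and flushes once the buffer reaches commitSize. */
    void write(String doc) {
        buffer.add(doc);
        if (buffer.size() >= commitSize) {
            flush();
        }
    }

    /** Flushes whatever is left in the buffer; still no commit. */
    void close() {
        flush();
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            server.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}

/** Test double that counts the calls it receives. */
class RecordingServer implements IndexServer {
    int addCalls = 0;
    int commitCalls = 0;
    int docs = 0;

    @Override
    public void add(List<String> batch) {
        addCalls++;
        docs += batch.size();
    }

    @Override
    public void commit() {
        commitCalls++;
    }
}
```

Writing three documents with a commitSize of 2 and then closing the writer results in two add calls and no commit call in this sketch, which is the behaviour a clarified property description would need to spell out.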
> Commit after finished writing to solr index
> -------------------------------------------
>
> Key: NUTCH-1306
> URL: https://issues.apache.org/jira/browse/NUTCH-1306
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: nutchgora
> Reporter: Dan Rosher
> Priority: Trivial
> Fix For: 2.1
>
> Attachments: NUTCH-1306.patch
>
>
> Commit after finished writing to solr index - otherwise a bit confusing not seeing the number of docs we expect in solr