You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by "Ishan Chattopadhyaya (JIRA)" <ji...@apache.org> on 2016/06/06 18:55:21 UTC

[jira] [Commented] (SOLR-5944) Support updates of numeric DocValues

[ https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316984#comment-15316984 ]

Ishan Chattopadhyaya commented on SOLR-5944:
--------------------------------------------

Fixed two bugs:
1. Added a check while writing a document to exclude anything to do with the id field.
2. Added an exception when a "set" or "inc" operation is attempted at a non-existent document.

Review comments:

bq. * in general, all these tests seem to depend on autoCommit being disabled, and use a config that is setup that way, but don't actaully assert that it's true in case someone changes the configs in the future
TODO

bq. * TestInPlaceUpdate
Renamed this test to TestInPlaceUpdatesStandalone.

bq. ** SuppressCodecs should be removed
Removed all non 3x, 4x codec suppression. They need to be suppressed as per a comment from Mikhail. https://issues.apache.org/jira/browse/LUCENE-5189?focusedCommentId=13958205&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13958205

bq. ** should at least have class level javadocs explaining what's being tested
TODO

** testUpdatingDocValues
bq. *** for addAndGetVersion calls where we don't care about the returned version, don't bother assigning it to a variable (distracting)
Fixed
bq. *** for addAndGetVersion calls where we do care about the returned version, we need check it for every update to that doc...
TODO
bq. **** currently version1 is compared to newVersion1 to assert that an update incrememnted the version, but in between those 2 updates are 4 other places where that document was updated -- we have to assert it has the expected value (either the same as before, or new - and if new record it) after all of those addAndGetVersion calls, or we can't be sure where/why/how a bug exists if that existing comparison fails.
TODO
bq. **** ideally we should be asserting the version of every doc when we query it right along side the assertion for it's updated "ratings" value
TODO
bq. *** most of the use of "field(ratings)" can probbaly just be replaced with "ratings" now that DV are returnable -- although it's nice to have both included in the test at least once to demo that both work, but when doing that there should be a comment making it clear why
Fixed
** testOnlyPartialUpdatesBetweenCommits
*** ditto comment about checking return value from addAndGetVersion
bq. *** this also seems like a good place to to test if doing a redundent atomic update (either via set to the same value or via inc=0) returns a new version or not -- should it?
As of now, both will generate a new version. I think "inc" 0 should be dropped, and "set" same value should be versioned. I'll check if behaviour in this patch is at par with regular atomic updates; and if so, will open a separate issue for this later.
bq. ** DocInfo should be a private static class and have some javadocs
Fixed
bq. ** based on how testing has gone so far, and the discover of LUCENE-7301 it seems clear that adding even single thread, single node, randomized testing of lots of diff types of add/update calls would be good to add
I think we can do the same in TestStressInPlaceUpdates, by randomly setting number of writer threads to 1 sometimes.
bq. *** we could refactor/improve the "checkReplay" function I added in the last patch to do more testing of a randomly generated Iterable of "commands" (commit, doc, doc+atomic mutation, etc...)
TODO
bq. *** and of course: improve checkReplay to verify RTG against hte uncommited model as well
I couldn't find a way to do this for the TestInPlaceUpdate (now called TestInPlaceUpdatesStandalone in this patch). This is based on SolrTestCaseJ4.

bq. *** testReplayFromFile and getSDdoc should probably go away once we have more structured tests for doing this
Fixed

bq. ** createMap can be elimianted -- callers can just use SolrTestCaseJ4.map(...)
Fixed

bq. ** In general the tests in this class should include more queries / sorting against the updated docvalues field after commits to ensure that the updated value is searchable & sortable
TODO

bq. ** Likewise the test methods in this class should probably have a lot more RTG checks -- with filter queries that constrain against the updated docvalues field, and checks of the expected version field -- to ensure that is all working properly.
Couldn't figure out how to do RTGs with this test, but will check RTGs + filter queries in the TestInPlaceUpdatesDistrib test (which was formerly InPlaceUpdateDistribTest)

bq. * InPlaceUpdateDistribTest
Renamed to TestInPlaceUpdatesDistrib now

bq. ** SuppressCodecs should be removed
3x and 4x codec suppressions cannot be removed.
bq. ** should at least have class level javadocs explaining what's being tested
TODO
bq. ** Once LUCENE-7301 is fixed and we can demonstate that this passes reliably all of the time, we should ideally refactor this to subclass SolrCloudTestCase
TODO
bq. ** in general, the "pick a random client" logic should be refactored so that sometimes it randomly picks a CloudSolrClient
TODO
bq. ** there should almost certianly be some "delete all docs and optimize" cleanup in between all of these tests
Added this to the beginning of every test. I couldn't see an easy way to refactor to SolrCloudTestCase immediately, but I can look more later.

** docValuesUpdateTest
bq. *** should randomize numdocs
Fixed
bq. *** we need to find away to eliminate the hardcoded "Thread.sleep(500);" calls...
Fixed (I think it is the correct fix, but I'll review it again, testing with a fast machine and beasting it).
bq. **** if initially no docs have a rating value, then make the (first) test query be for {{rating:\[\* TO \*\]}} and execute it in a rety loop until the numFound matches numDocs.
TODO
bq. **** likewise if we ensure all ratings have a value such that abs(ratings) < X, then the second update can use an increment such that abs(inc) > X\*3 and we can use {{-ratings:\[-X TO X\]}} as the query in a retry loop
TODO
** ensureRtgWorksWithPartialUpdatesTest
bq. *** even if we're only going to test one doc, we should ensure there are a random num docs in the index (some before the doc we're editing, and some after)
TODO
bq. *** if we're testing RTG, then we should be testing the version returned from every /get call against the last version returned from every update
TODO
** outOfOrderUpdatesIndividualReplicaTest
bq. *** ditto comments about only one doc
TODO
bq. *** ditto comments about testing the expected version in RTG requests
TODO
bq. *** if we're sending updates direct to replicas to test how they handle out of order updates, then something better assert exactly where the leader is hosted and ensure we don't send to it by mistake
TODO
bq. *** what's the point of using a threadpool and SendUpdateToReplicaTask here? why not just send the updates in a (randdomly assigned) determinisitc order?
TODO
bq. **** if we are going to use an ExecutorService, then the result of awaitTermination has to be checked
TODO
*** since this tests puts replicas out of sync, a "delete all docs and optimize" followed up "wait for recovers" should happen at the end of this test (or just in between every test) .. especially if we refactor it (or to protect someone in the future who might refactor it)
TODO
** delayedReorderingFetchesMissingUpdateFromLeaderTest
bq. *** ditto previous comments about using a threadpool and SendUpdateToReplicaTask
TODO
bq. *** Is there no way we can programatically tell if LIR has kicked in? ... pehaps by setting a ZK watch? ... this "Thread.sleep(500);" is no garuntee and seens arbitrary.
TODO.
** getReplicaValue
bq. *** using SolrTestCaseJ4.params(...) would make this method a lot shorter
Fixed
bq. *** based on where/how this method is used, i don't understand why it returns String instead of just Object
Fixed
bq. ** assertReplicaValue
bq. *** should take in some sort of assertion message and/or build/append an assertion message using the clientId
Fixed
bq. *** if getReplicaValue returns an Object, this can take an "Object expected" param and eliminate abonch of toString & string concating throughout the test
Fixed
** simulatedUpdateRequest
bq. *** if this method is going to assume that the only shard you ever want to similulate an update to is SHARD1 then the method name should be "simulatedUpdateRequestToShard1Replica"
TODO
bq. *** better still - why not ask the DocRouter which shard this doc belongs in, and fetch the leader URL tha way?
TODO
bq. ** most usage of "addFields" can just be replaced with a call to "sdoc(...)" to simplify code
Fixed
bq. ** replace createMap usage with SolrTestCaseJ4.map
Fixed
bq. ** why override tearDown if we're just calling super?
Fixed
bq. ** in general, i think this test would be a lot easier to read if there were well named variables for HttpSolrClient instances pointed at specific replicas (ie HttpSolrClient SHARD1_LEADER = ...; HttpSolrClient SHARD1_REPLICA1 = ...; etc...) and passed those around to the various methods instead of magic ints (ie: "1", "2") to refer to which index in the static clients list should be used for a given update.
Fixed: LEADER and NON_LEADERS now

* TestStressInPlaceUpdates
TODO

* SolrDocument
bq. ** applyOlderUpdate
Fixed, moved it to UpdateLog. I think Noble had moved this from UpdateLog to SolrDocument; and if so, I'll check with him once (after I'm done with other changes here marked as TODO).
bq. *** as mentioned before, i don't think this method is correct when it comes to multivalued fields, and should have more unit tests of various permutations to be sure
TODO
bq. *** no matter where it lives, it should have some javadocs
Fixed

> Support updates of numeric DocValues
> ------------------------------------
>
> Key: SOLR-5944
> URL: https://issues.apache.org/jira/browse/SOLR-5944
> Project: Solr
> Issue Type: New Feature
> Reporter: Ishan Chattopadhyaya
> Assignee: Shalin Shekhar Mangar
> Attachments: DUP.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt, TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt, TestStressInPlaceUpdates.eb044ac71.failures.tar.gz, hoss.62D328FA1DEA57FD.fail.txt, hoss.62D328FA1DEA57FD.fail2.txt, hoss.62D328FA1DEA57FD.fail3.txt, hoss.D768DD9443A98DC.fail.txt, hoss.D768DD9443A98DC.pass.txt
>
>
> LUCENE-5189 introduced support for updates to numeric docvalues. It would be really nice to have Solr support this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org