You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2020/01/16 13:32:35 UTC

[GitHub] [couchdb] garrensmith opened a new pull request #2460: Change map indexes to be stored in one row

garrensmith opened a new pull request #2460: Change map indexes to be stored in one row
URL: https://github.com/apache/couchdb/pull/2460
 
 
   ## Overview
   
   Changes map indexes to store the original key and value in a single
   FDB row.
   
   ## Testing recommendations
   
   All tests should pass
   
   ## Related Issues or Pull Requests
   
   <!-- If your changes affects multiple components in different
        repositories please put links to those issues or pull requests here.  -->
   
   ## Checklist
   
   - [x] Code is written and works correctly
   - [x] Changes are covered by tests
   - [ ] Any new configurable parameters are documented in `rel/overlay/etc/default.ini`
   - [ ] A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [couchdb] nickva commented on a change in pull request #2460: Change map indexes to be stored in one row

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #2460: Change map indexes to be stored in one row
URL: https://github.com/apache/couchdb/pull/2460#discussion_r368811056
 
 

 ##########
 File path: src/couch_views/src/couch_views_fdb.erl
 ##########
 @@ -147,132 +128,80 @@ write_doc(TxDb, Sig, ViewIds, Doc) ->
     ExistingViewKeys = get_view_keys(TxDb, Sig, DocId),
 
     clear_id_idx(TxDb, Sig, DocId),
-
-    lists:foreach(fun({ViewId, NewRows}) ->
-        update_id_idx(TxDb, Sig, ViewId, DocId, NewRows),
-
-        ExistingKeys = case lists:keyfind(ViewId, 1, ExistingViewKeys) of
-            {ViewId, TotalRows, TotalSize, EKeys} ->
-                RowChange = length(NewRows) - TotalRows,
-                SizeChange = calculate_row_size(NewRows) - TotalSize,
-                update_row_count(TxDb, Sig, ViewId, RowChange),
-                update_kv_size(TxDb, Sig, ViewId, SizeChange),
-                EKeys;
-            false ->
-                RowChange = length(NewRows),
-                SizeChange = calculate_row_size(NewRows),
-                update_row_count(TxDb, Sig, ViewId, RowChange),
-                update_kv_size(TxDb, Sig, ViewId, SizeChange),
-                []
-        end,
-        update_map_idx(TxDb, Sig, ViewId, DocId, ExistingKeys, NewRows)
-    end, lists:zip(ViewIds, Results)).
-
-
-% For each row in a map view there are two rows stored in
-% FoundationDB:
+    lists:foreach(fun({View, NewRows}) ->
+        #mrview{
+            map_names = MNames,
+            id_num = ViewId
+        } = View,
+
+        try
+            NewRowSize = calculate_row_size(NewRows),
+            update_id_idx(TxDb, Sig, ViewId, DocId, NewRows),
+
+            ExistingKeys = case lists:keyfind(ViewId, 1, ExistingViewKeys) of
+                {ViewId, TotalRows, TotalSize, EKeys} ->
+                    RowChange = length(NewRows) - TotalRows,
+                    SizeChange = NewRowSize - TotalSize,
+                    update_row_count(TxDb, Sig, ViewId, RowChange),
+                    update_kv_size(TxDb, Sig, ViewId, SizeChange),
+                    EKeys;
+                false ->
+                    RowChange = length(NewRows),
+                    SizeChange = NewRowSize,
+                    update_row_count(TxDb, Sig, ViewId, RowChange),
+                    update_kv_size(TxDb, Sig, ViewId, SizeChange),
+                    []
+            end,
+            update_map_idx(TxDb, Sig, ViewId, DocId, ExistingKeys, NewRows)
+        catch
+            throw:{size_exceeded, Type}  ->
+                case lists:keyfind(ViewId, 1, ExistingViewKeys) of
+                    false ->
+                        ok;
+                    ExistingViewKey ->
+                        remove_doc_from_idx(TxDb, Sig, DocId, ExistingViewKey)
+                end,
+                couch_log:error("Doc `~s` exceeded the ~s size for view `~s`"
+                    " and was not indexed.", [DocId, Type, MNames])
 
 Review comment:
   Just curious, what would `MNames` looks like in a log line?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [couchdb] nickva commented on a change in pull request #2460: Change map indexes to be stored in one row

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #2460: Change map indexes to be stored in one row
URL: https://github.com/apache/couchdb/pull/2460#discussion_r368809542
 
 

 ##########
 File path: rel/overlay/etc/default.ini
 ##########
 @@ -249,6 +249,10 @@ iterations = 10 ; iterations for password hashing
 ; Settings for view indexing
 [couch_views]
 ; max_workers = 100
+; The maximum allowed key size emitted from a view for a document (in Kb)
+; key_size_limit = 8
+; The maximum allowed value size emitted from a view for a document (in Kb)
+; key_size_limit = 64
 
 Review comment:
   There will also be hard maximum for these value? So just setting them to say 12MB won't work, there is some limit above which FDB operation would fail. We should mention maybe something like "values larger than xyz will be truncated to ..."?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [couchdb] garrensmith commented on issue #2460: Change map indexes to be stored in one row

Posted by GitBox <gi...@apache.org>.
garrensmith commented on issue #2460: Change map indexes to be stored in one row
URL: https://github.com/apache/couchdb/pull/2460#issuecomment-576201555
 
 
   I've added the map index key/value limits. If a document exceeds the key/value limit for a view in a design doc. It will be removed from that view. But if there are multiple views in a design doc, it will still be indexed in the other views in that design doc.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [couchdb] nickva commented on a change in pull request #2460: Change map indexes to be stored in one row

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #2460: Change map indexes to be stored in one row
URL: https://github.com/apache/couchdb/pull/2460#discussion_r368810667
 
 

 ##########
 File path: src/couch_views/src/couch_views_fdb.erl
 ##########
 @@ -132,13 +115,11 @@ write_doc(TxDb, Sig, _ViewIds, #{deleted := true} = Doc) ->
     ExistingViewKeys = get_view_keys(TxDb, Sig, DocId),
 
     clear_id_idx(TxDb, Sig, DocId),
-    lists:foreach(fun({ViewId, TotalKeys, TotalSize, UniqueKeys}) ->
-        clear_map_idx(TxDb, Sig, ViewId, DocId, UniqueKeys),
-        update_row_count(TxDb, Sig, ViewId, -TotalKeys),
-        update_kv_size(TxDb, Sig, ViewId, -TotalSize)
+    lists:foreach(fun(ExistingViewKey) ->
+        remove_doc_from_idx(TxDb, Sig, DocId, ExistingViewKey)
     end, ExistingViewKeys);
 
-write_doc(TxDb, Sig, ViewIds, Doc) ->
+write_doc(TxDb, Sig, Views, Doc) ->
 
 Review comment:
   I think we need to update the _ViewIds in the clause above as well

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [couchdb] nickva commented on a change in pull request #2460: Change map indexes to be stored in one row

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #2460: Change map indexes to be stored in one row
URL: https://github.com/apache/couchdb/pull/2460#discussion_r368808874
 
 

 ##########
 File path: rel/overlay/etc/default.ini
 ##########
 @@ -249,6 +249,10 @@ iterations = 10 ; iterations for password hashing
 ; Settings for view indexing
 [couch_views]
 ; max_workers = 100
+; The maximum allowed key size emitted from a view for a document (in Kb)
 
 Review comment:
   Is the is per a single emit or across multiple emits from the same map function?
   
   For consistency, maybe we could use bytes, if we already have used bytes before for buffers, document size limits etc.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [couchdb] nickva commented on a change in pull request #2460: Change map indexes to be stored in one row

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #2460: Change map indexes to be stored in one row
URL: https://github.com/apache/couchdb/pull/2460#discussion_r368810850
 
 

 ##########
 File path: src/couch_views/src/couch_views_fdb.erl
 ##########
 @@ -147,132 +128,80 @@ write_doc(TxDb, Sig, ViewIds, Doc) ->
     ExistingViewKeys = get_view_keys(TxDb, Sig, DocId),
 
     clear_id_idx(TxDb, Sig, DocId),
-
-    lists:foreach(fun({ViewId, NewRows}) ->
-        update_id_idx(TxDb, Sig, ViewId, DocId, NewRows),
-
-        ExistingKeys = case lists:keyfind(ViewId, 1, ExistingViewKeys) of
-            {ViewId, TotalRows, TotalSize, EKeys} ->
-                RowChange = length(NewRows) - TotalRows,
-                SizeChange = calculate_row_size(NewRows) - TotalSize,
-                update_row_count(TxDb, Sig, ViewId, RowChange),
-                update_kv_size(TxDb, Sig, ViewId, SizeChange),
-                EKeys;
-            false ->
-                RowChange = length(NewRows),
-                SizeChange = calculate_row_size(NewRows),
-                update_row_count(TxDb, Sig, ViewId, RowChange),
-                update_kv_size(TxDb, Sig, ViewId, SizeChange),
-                []
-        end,
-        update_map_idx(TxDb, Sig, ViewId, DocId, ExistingKeys, NewRows)
-    end, lists:zip(ViewIds, Results)).
-
-
-% For each row in a map view there are two rows stored in
-% FoundationDB:
+    lists:foreach(fun({View, NewRows}) ->
+        #mrview{
+            map_names = MNames,
+            id_num = ViewId
+        } = View,
+
+        try
+            NewRowSize = calculate_row_size(NewRows),
+            update_id_idx(TxDb, Sig, ViewId, DocId, NewRows),
+
+            ExistingKeys = case lists:keyfind(ViewId, 1, ExistingViewKeys) of
+                {ViewId, TotalRows, TotalSize, EKeys} ->
+                    RowChange = length(NewRows) - TotalRows,
+                    SizeChange = NewRowSize - TotalSize,
+                    update_row_count(TxDb, Sig, ViewId, RowChange),
+                    update_kv_size(TxDb, Sig, ViewId, SizeChange),
+                    EKeys;
+                false ->
+                    RowChange = length(NewRows),
+                    SizeChange = NewRowSize,
+                    update_row_count(TxDb, Sig, ViewId, RowChange),
+                    update_kv_size(TxDb, Sig, ViewId, SizeChange),
+                    []
+            end,
+            update_map_idx(TxDb, Sig, ViewId, DocId, ExistingKeys, NewRows)
+        catch
+            throw:{size_exceeded, Type}  ->
+                case lists:keyfind(ViewId, 1, ExistingViewKeys) of
+                    false ->
+                        ok;
+                    ExistingViewKey ->
+                        remove_doc_from_idx(TxDb, Sig, DocId, ExistingViewKey)
+                end,
+                couch_log:error("Doc `~s` exceeded the ~s size for view `~s`"
 
 Review comment:
   Any chance we could the db name as well? That might help if someone is parsing the logs

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [couchdb] garrensmith merged pull request #2460: Change map indexes to be stored in one row

Posted by GitBox <gi...@apache.org>.
garrensmith merged pull request #2460: Change map indexes to be stored in one row
URL: https://github.com/apache/couchdb/pull/2460
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [couchdb] nickva commented on a change in pull request #2460: Change map indexes to be stored in one row

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #2460: Change map indexes to be stored in one row
URL: https://github.com/apache/couchdb/pull/2460#discussion_r368810850
 
 

 ##########
 File path: src/couch_views/src/couch_views_fdb.erl
 ##########
 @@ -147,132 +128,80 @@ write_doc(TxDb, Sig, ViewIds, Doc) ->
     ExistingViewKeys = get_view_keys(TxDb, Sig, DocId),
 
     clear_id_idx(TxDb, Sig, DocId),
-
-    lists:foreach(fun({ViewId, NewRows}) ->
-        update_id_idx(TxDb, Sig, ViewId, DocId, NewRows),
-
-        ExistingKeys = case lists:keyfind(ViewId, 1, ExistingViewKeys) of
-            {ViewId, TotalRows, TotalSize, EKeys} ->
-                RowChange = length(NewRows) - TotalRows,
-                SizeChange = calculate_row_size(NewRows) - TotalSize,
-                update_row_count(TxDb, Sig, ViewId, RowChange),
-                update_kv_size(TxDb, Sig, ViewId, SizeChange),
-                EKeys;
-            false ->
-                RowChange = length(NewRows),
-                SizeChange = calculate_row_size(NewRows),
-                update_row_count(TxDb, Sig, ViewId, RowChange),
-                update_kv_size(TxDb, Sig, ViewId, SizeChange),
-                []
-        end,
-        update_map_idx(TxDb, Sig, ViewId, DocId, ExistingKeys, NewRows)
-    end, lists:zip(ViewIds, Results)).
-
-
-% For each row in a map view there are two rows stored in
-% FoundationDB:
+    lists:foreach(fun({View, NewRows}) ->
+        #mrview{
+            map_names = MNames,
+            id_num = ViewId
+        } = View,
+
+        try
+            NewRowSize = calculate_row_size(NewRows),
+            update_id_idx(TxDb, Sig, ViewId, DocId, NewRows),
+
+            ExistingKeys = case lists:keyfind(ViewId, 1, ExistingViewKeys) of
+                {ViewId, TotalRows, TotalSize, EKeys} ->
+                    RowChange = length(NewRows) - TotalRows,
+                    SizeChange = NewRowSize - TotalSize,
+                    update_row_count(TxDb, Sig, ViewId, RowChange),
+                    update_kv_size(TxDb, Sig, ViewId, SizeChange),
+                    EKeys;
+                false ->
+                    RowChange = length(NewRows),
+                    SizeChange = NewRowSize,
+                    update_row_count(TxDb, Sig, ViewId, RowChange),
+                    update_kv_size(TxDb, Sig, ViewId, SizeChange),
+                    []
+            end,
+            update_map_idx(TxDb, Sig, ViewId, DocId, ExistingKeys, NewRows)
+        catch
+            throw:{size_exceeded, Type}  ->
+                case lists:keyfind(ViewId, 1, ExistingViewKeys) of
+                    false ->
+                        ok;
+                    ExistingViewKey ->
+                        remove_doc_from_idx(TxDb, Sig, DocId, ExistingViewKey)
+                end,
+                couch_log:error("Doc `~s` exceeded the ~s size for view `~s`"
 
 Review comment:
   Any chance we could log the db name as well? That might help if someone is parsing the logs

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [couchdb] garrensmith commented on issue #2460: Change map indexes to be stored in one row

Posted by GitBox <gi...@apache.org>.
garrensmith commented on issue #2460: Change map indexes to be stored in one row
URL: https://github.com/apache/couchdb/pull/2460#issuecomment-577140384
 
 
   I've removed the key limiting code and just left the switch to 1 fdb row for 1 map row. Since this has already received a +1. I'm merging.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services