Posted to dev@quickstep.apache.org by cramja <gi...@git.apache.org> on 2016/09/20 01:38:21 UTC

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

GitHub user cramja opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/100

    Refactor bulk insert for SplitRowStore

    This change moves repeated catalog calls out of the tight insert loops.
    
    We see a 2x improvement on large inserts.
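
To illustrate the idea in the description above (doing the per-attribute lookups once instead of once per tuple), here is a minimal sketch. It is not the PR's `BasicInsertInfo`: `AttrDesc`, `InsertLayout`, and `WriteRow` are hypothetical names, and the schema is reduced to fixed-length attributes only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a catalog attribute (not Quickstep's API).
struct AttrDesc {
  std::size_t byte_length;  // Size of the fixed-length value.
  bool nullable;
};

// Layout facts computed once per relation, before the insert loop starts.
struct InsertLayout {
  std::vector<std::size_t> offsets;  // Byte offset of each attribute in a row.
  std::vector<std::size_t> sizes;    // Byte size of each attribute.
  std::size_t num_nullable = 0;
  std::size_t row_bytes = 0;

  explicit InsertLayout(const std::vector<AttrDesc> &schema) {
    for (const AttrDesc &attr : schema) {
      offsets.push_back(row_bytes);
      sizes.push_back(attr.byte_length);
      row_bytes += attr.byte_length;
      if (attr.nullable) {
        ++num_nullable;
      }
    }
  }
};

// Copies one row into dest using only the cached layout; the schema is not
// consulted inside the loop.
inline void WriteRow(const InsertLayout &layout,
                     const std::vector<const void *> &values,
                     char *dest) {
  for (std::size_t i = 0; i < layout.sizes.size(); ++i) {
    std::memcpy(dest + layout.offsets[i], values[i], layout.sizes[i]);
  }
}
```

A caller would build one `InsertLayout` per bulk insert and pass it to `WriteRow` for every tuple, so the cost of walking the schema is paid once rather than per row.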

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cramja/incubator-quickstep refactor_bulk_ins

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/100.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #100
    
----
commit 29ebdae0e6218e9b8a3e3df2b056c6f62d598529
Author: cramja <ma...@gmail.com>
Date:   2016-09-16T21:35:19Z

    BulkInsert optimization for SplitRowStore
    
    This change adds a struct that holds precomputed insert information
    for tuples coming from a value accessor and being inserted into a
    SplitRowStore tuple block. This greatly speeds up highly unselective
    queries.

commit 57bd3e893b564327d7204039b479d04fa385738e
Author: cramja <ma...@gmail.com>
Date:   2016-09-16T23:19:16Z

    Adds insert optimization to bulkInsertWithRemappedAttributes
    
    Similar (copy+paste with one addition) to the last change to the
    SplitRowStore.

commit 47a1a4b62a12a3e74f6f687d75180f935e2b965c
Author: cramja <ma...@gmail.com>
Date:   2016-09-20T01:22:41Z

    Removes duplicate code in bulkInsert
    
    This refactor reduces code complexity by removing duplicate code.
    It prefers cleaner, more maintainable code over a slightly faster algorithm.

----
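
The last commit removes duplication by having the plain bulk insert delegate to the remapped variant with an identity attribute map. The sketch below shows that pattern in isolation, under stated assumptions: `RemapRow` and `CopyRow` are hypothetical names, rows are plain `std::vector<int>`, and `attribute_id` is assumed to be an integer alias. It uses `std::iota` to build the identity map, which is a stylistic alternative to a `push_back` loop.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using attribute_id = int;  // Assumed alias for this sketch.

// General path: output column i is read from input column attribute_map[i].
inline std::vector<int> RemapRow(const std::vector<attribute_id> &attribute_map,
                                 const std::vector<int> &input) {
  std::vector<int> out;
  out.reserve(attribute_map.size());
  for (attribute_id src : attribute_map) {
    out.push_back(input[static_cast<std::size_t>(src)]);
  }
  return out;
}

// Simple path: build the identity map 0, 1, 2, ... and reuse the general path.
inline std::vector<int> CopyRow(const std::vector<int> &input) {
  std::vector<attribute_id> identity(input.size());
  std::iota(identity.begin(), identity.end(), 0);
  return RemapRow(identity, input);
}
```

The trade-off is the one the commit message names: the simple path pays for one extra indirection per attribute, in exchange for a single code path to maintain.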


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by cramja <gi...@git.apache.org>.
GitHub user cramja commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79724504
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t >(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    +              tuple_null_bitmap.setBit(current_null_idx, true);
                 } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(*attr_map_it);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    +              std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                          attr_value.getDataPtr(),
    +                          insertInfo.fixed_len_sizes_[accessor_attr_id]);
                 }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), false>(
    -                  relation_, *accessor, attribute_map);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    +            current_null_idx++;
    +          } else if (!nullable && variable) {
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +            DCHECK_EQ(insertInfo.var_len_offsets_[accessor_attr_id],
    +                      relation_.getVariableLengthAttributeIndex(accessor_attr_id));
    +            DCHECK(!attr_value.isNull());
    +
    +            const std::size_t attr_size = attr_value.getDataSize();
    +            current_variable_position -= attr_size;
    +            const int var_len_info_idx = insertInfo.var_len_offsets_[accessor_attr_id] * 2;
    +            variable_length_info_array[var_len_info_idx] = current_variable_position;
    +            variable_length_info_array[var_len_info_idx + 1] = attr_size;
    +            attr_value.copyInto(static_cast<char *>(tuple_storage_) + current_variable_position);
    +
    +            header_->variable_length_bytes_allocated += attr_size;
    +          } else {  // nullable, variable length
    +            DCHECK_EQ(static_cast<int>(current_null_idx), relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    +              tuple_null_bitmap.setBit(current_null_idx, true);
                 } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    +              DCHECK_EQ(relation_.getVariableLengthAttributeIndex(accessor_attr_id),
    +                        insertInfo.var_len_offsets_[accessor_attr_id]);
    +
    +              const std::size_t attr_size = attr_value.getDataSize();
    +              current_variable_position -= attr_size;
    +              const int var_len_info_idx = insertInfo.var_len_offsets_[accessor_attr_id] * 2;
    +              variable_length_info_array[var_len_info_idx] = current_variable_position;
    +              variable_length_info_array[var_len_info_idx + 1] = attr_size;
    +              attr_value.copyInto(static_cast<char *>(tuple_storage_) + current_variable_position);
    +              header_->variable_length_bytes_allocated += attr_size;
                 }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    +            current_null_idx++;
               }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(*attr_map_it);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    +        occupancy_bitmap_->setBit(pos, true);
    --- End diff --
    
    Agreed that it would be smaller in the common case (empty!). And since the
    ```c++
      std::unique_ptr<BitVector<false>> occupancy_bitmap_;
    ```
    is on the heap anyway, we don't need to worry about how to serialize a variable-length list when a block is evicted.
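
The quoted diff also checks for space in two stages ("Only calculate space used if needed"): it first tests against a precomputed worst-case size and computes the exact per-tuple size only when that cheap test fails. A minimal sketch of that pattern follows. `Buffer`, `ExactBytes`, and `InsertRows` are hypothetical names, not Quickstep code, and the sketch assumes no row is larger than `max_row_bytes`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fixed-capacity buffer (not Quickstep's block layout).
struct Buffer {
  std::size_t capacity;
  std::size_t used;
  explicit Buffer(std::size_t cap) : capacity(cap), used(0) {}
  bool hasSpace(std::size_t bytes) const { return used + bytes <= capacity; }
};

// Exact size of a row: walks every value, so it is the expensive check.
inline std::size_t ExactBytes(const std::vector<std::string> &row) {
  std::size_t total = 0;
  for (const std::string &value : row) {
    total += value.size();
  }
  return total;
}

// Inserts rows until one does not fit and returns how many were inserted.
// max_row_bytes is an upper bound on the size of any row, computed up front.
// exact_checks counts how often the expensive path ran.
inline std::size_t InsertRows(const std::vector<std::vector<std::string>> &rows,
                              std::size_t max_row_bytes,
                              Buffer *buf,
                              std::size_t *exact_checks) {
  std::size_t inserted = 0;
  for (const std::vector<std::string> &row : rows) {
    // Cheap test first; fall back to the exact size only near the end.
    if (!buf->hasSpace(max_row_bytes)) {
      ++(*exact_checks);
      if (!buf->hasSpace(ExactBytes(row))) {
        break;
      }
    }
    // Space is accounted for as each value is "copied".
    for (const std::string &value : row) {
      buf->used += value.size();
    }
    ++inserted;
  }
  return inserted;
}
```

While the buffer has plenty of room, only the cheap comparison runs; the exact computation is limited to the last few rows before the buffer fills.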



[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
GitHub user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79701889
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t >(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    --- End diff --
    
    You don't need to use a TypedValue here. Just check whether getUntypedValue returned a null pointer instead of using isNull.
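
The suggestion above can be sketched in isolation. This is a minimal illustration, not Quickstep code: `MockAccessor` and `CopyNullableFixedAttr` are hypothetical stand-ins, and the only assumption carried over from the diff is that the untyped getter yields a null pointer for a NULL attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a value accessor. A nullptr entry models a
// NULL attribute, mirroring what the nullable untyped getter is assumed
// to return in the diff above.
struct MockAccessor {
  std::vector<const std::int32_t*> row;
  const void* getUntypedValue(std::size_t attr_id) const { return row[attr_id]; }
};

// Copies one nullable fixed-length attribute into dest without building a
// typed wrapper. Returns true if the attribute was NULL, in which case the
// caller is expected to set the corresponding null bit and dest is untouched.
inline bool CopyNullableFixedAttr(const MockAccessor &accessor,
                                  std::size_t attr_id,
                                  std::size_t attr_size,
                                  char *dest) {
  const void *value = accessor.getUntypedValue(attr_id);
  if (value == nullptr) {
    return true;
  }
  std::memcpy(dest, value, attr_size);
  return false;
}
```

The null test and the copy both work on the raw pointer, so no per-attribute value object has to be constructed in the inner loop.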



[GitHub] incubator-quickstep issue #100: Refactor bulk insert for SplitRowStore

Posted by navsan <gi...@git.apache.org>.
Github user navsan commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/100
  
    Sounds good! Thanks.
    
    On Tue, Sep 20, 2016, 17:22 Marc S <no...@github.com> wrote:
    
    > @navsan <https://github.com/navsan> love the comments, very thorough.
    >
    > I updated the header of this PR to include benchmarks/testing info and
    > will keep this in mind in the future.
    >
    > *Changes queued for this PR*
    >
    >    - getting rid of TypedValue
    >    - moving some of the computations out of the inner loop and keeping a
    >    leaner set of running state variables
    >    - moving to a per-column info struct rather than a number of
    >    bitvectors and vectors
    >
    > I'm going to address these in an addendum commit. I think your suggestions
    > are great.
    >
    > *Different PR*
    >
    >    - getting rid of occupancy_bitmap_
    >    - moving the null_bitmap into the header as a single bitmap for the
    >    entire tuple storage subblock.
    >
    > I think these are valid issues (comments above for specifics) but that
    > they can be the subject of another commit/PR in the pursuit of more concise
    > PRs.
    >
    > *Another PR*
    >
    > Can you coalesce writes for contiguous columns in input/output? This helps
    > speed up the common case of materialized join results where (almost)
    > everything gets copied out anyway.
    >
    > Can you also add support for partialBulkInsert functions? See the commits
    > in my branch for reference. It'll help us improve the join result
    > materialization cost, as well as a few other operations.
    >
    > Both these points can be addressed but in a separate PR.
    >
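
As an illustration of the "per-column info struct" item in the quoted list, here is one possible shape for it. Every name below (`ColumnSpec`, `ColumnInsertInfo`, `BuildInsertInfo`) is hypothetical and the layout rules are simplified assumptions; the point is only that offsets and bitmap indices are computed once, before the insert loop, rather than looked up in the catalog per tuple.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical schema description (not Quickstep's CatalogRelationSchema).
struct ColumnSpec {
  std::size_t fixed_size;  // Bytes in the fixed-length region; ignored if variable.
  bool nullable;
  bool variable_length;
};

// Everything the inner loop needs to know about one column.
struct ColumnInsertInfo {
  std::size_t fixed_offset;  // Byte offset within the fixed-length region.
  std::size_t fixed_size;    // 0 for variable-length columns.
  int nullable_idx;          // Index in the per-tuple null bitmap, or -1.
  int variable_idx;          // Index in the variable-length info array, or -1.
};

// Computes the per-column info once, ahead of the insert loop.
inline std::vector<ColumnInsertInfo> BuildInsertInfo(
    const std::vector<ColumnSpec> &schema) {
  std::vector<ColumnInsertInfo> info;
  info.reserve(schema.size());
  std::size_t offset = 0;
  int next_null = 0;
  int next_var = 0;
  for (const ColumnSpec &col : schema) {
    ColumnInsertInfo ci;
    ci.fixed_offset = offset;
    ci.fixed_size = col.variable_length ? 0 : col.fixed_size;
    ci.nullable_idx = col.nullable ? next_null++ : -1;
    ci.variable_idx = col.variable_length ? next_var++ : -1;
    offset += ci.fixed_size;
    info.push_back(ci);
  }
  return info;
}
```

Keeping one struct per column puts the fields read together in the loop next to each other, instead of spreading them across several parallel bit vectors and vectors.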




[GitHub] incubator-quickstep issue #100: Refactor bulk insert for SplitRowStore

Posted by cramja <gi...@git.apache.org>.
Github user cramja commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/100
  
    I'm closing this PR in favor of another which uses a technique to merge attribute copies when possible.
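
The follow-up PR is not shown in this thread, so the following is only a guess at what "merging attribute copies" could look like, under the assumption that each attribute copy is described by a source offset, a destination offset, and a byte count. `CopyRun` and `CoalesceCopies` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One memcpy to perform per tuple: bytes from src_offset to dst_offset.
struct CopyRun {
  std::size_t src_offset;
  std::size_t dst_offset;
  std::size_t bytes;
};

// Merges consecutive copies that are adjacent in both the source and the
// destination layout, so that several per-attribute copies collapse into
// one larger copy. Input is expected in destination order.
inline std::vector<CopyRun> CoalesceCopies(const std::vector<CopyRun> &copies) {
  std::vector<CopyRun> merged;
  for (const CopyRun &c : copies) {
    if (!merged.empty()) {
      CopyRun &last = merged.back();
      if (last.src_offset + last.bytes == c.src_offset &&
          last.dst_offset + last.bytes == c.dst_offset) {
        last.bytes += c.bytes;
        continue;
      }
    }
    merged.push_back(c);
  }
  return merged;
}
```

The merged list would be built once per bulk insert and reused for every tuple, which is where the saving for wide, mostly pass-through rows would come from.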



[GitHub] incubator-quickstep issue #100: Refactor bulk insert for SplitRowStore

Posted by navsan <gi...@git.apache.org>.
Github user navsan commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/100
  
    Finally, can you also add support for partialBulkInsert functions? See the commits in my branch for reference. It'll help us improve the join result materialization cost, as well as a few other operations. 



[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
Github user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79693696
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    --- End diff --
    
    This inner-loop branch is why I didn't want to support the occupancy_bitmap_ at all.
    
    Any time there are "holes" in the tuple storage subblock, we unset a number of bits in the occupancy_bitmap_. Then, during every bulk insert, we have to search through this large bitmap to find the first zero. Is all this expense really worth it? Why not just disallow deletes (as PackedRowStore does), or allow holes to stay?
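
To make the cost in question concrete, here is a minimal sketch of the two insert paths. It is a toy model, not Quickstep code: `ToySlots`, its `probes` counter, and the prefix-based definition of "packed" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a slot array with an occupancy bitmap. All names here are
// hypothetical; this is not Quickstep's BitVector or sub-block header.
struct ToySlots {
  std::vector<bool> occupied;
  std::size_t num_tuples = 0;
  long max_tid = -1;       // highest occupied slot, or -1 when empty
  std::size_t probes = 0;  // occupied bits skipped while searching for a hole

  explicit ToySlots(std::size_t capacity) : occupied(capacity, false) {}

  // Assumed meaning of "packed": the live tuples fill a prefix of the slots.
  bool isPacked() const { return static_cast<long>(num_tuples) == max_tid + 1; }

  // Linear scan for the first free slot at or after `from`.
  std::size_t firstZero(std::size_t from) {
    std::size_t pos = from;
    while (pos < occupied.size() && occupied[pos]) {
      ++pos;
      ++probes;
    }
    return pos;
  }

  // Inserts one tuple and returns its slot, or the capacity if full.
  std::size_t insert(std::size_t hint) {
    // Packed: append at the end. Otherwise: search the bitmap for a hole.
    const std::size_t pos = isPacked() ? num_tuples : firstZero(hint);
    if (pos >= occupied.size()) return occupied.size();
    occupied[pos] = true;
    ++num_tuples;
    if (static_cast<long>(pos) > max_tid) max_tid = static_cast<long>(pos);
    return pos;
  }

  // Deletes the tuple in slot `pos`, possibly leaving a hole.
  void erase(std::size_t pos) {
    assert(pos < occupied.size() && occupied[pos]);
    occupied[pos] = false;
    --num_tuples;
    while (max_tid >= 0 && !occupied[static_cast<std::size_t>(max_tid)]) {
      --max_tid;
    }
  }
};
```

In this model a packed block never touches the bitmap on insert, while a block with holes pays for a scan whose length depends on where the first hole sits and on the hint passed in.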


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by cramja <gi...@git.apache.org>.
Github user cramja commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79720633
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    --- End diff --
    
    > That if/else statement certainly doesn't need to be done for every tuple insertion
    
    Which if/else? In my eyes this is not a branch at all.
    
    > In general, I think we should just move to having a single null bitmap for the entire subblock (i.e., across all tuples).
    
    What's the logic here? For predicate evaluation, I think it makes sense to keep the null data 'close' to the attribute data, since the two will likely be accessed at nearly the same time (e.g., when creating a TypedValue for predicate evaluation). For copying, a single bitmap might be faster, but I'm not sure why it necessarily would be; maybe you could explain.
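
The locality argument can be sketched with simple offset arithmetic. The model below is hypothetical: `ToyLayout` ignores variable-length data, rounds bitmaps up to whole bytes, and assumes a shared bitmap would sit ahead of the slots, none of which is taken from the Quickstep source.

```cpp
#include <cassert>
#include <cstddef>

// Toy byte-offset model for the two null-bitmap placements under discussion.
// Hypothetical names and sizes; bitmaps are rounded up to whole bytes here.
struct ToyLayout {
  std::size_t num_nullable;  // nullable attributes per tuple
  std::size_t fixed_bytes;   // fixed-length payload bytes per tuple
  std::size_t capacity;      // tuple slots in the sub-block

  std::size_t perTupleBitmapBytes() const { return (num_nullable + 7) / 8; }
  std::size_t slotBytes() const { return perTupleBitmapBytes() + fixed_bytes; }
  std::size_t sharedBitmapBytes() const {
    return (capacity * num_nullable + 7) / 8;
  }

  // Layout A: every slot starts with its own null bitmap.
  std::size_t nullByteA(std::size_t tid, std::size_t nullable_idx) const {
    assert(tid < capacity && nullable_idx < num_nullable);
    return tid * slotBytes() + nullable_idx / 8;
  }
  std::size_t attrByteA(std::size_t tid, std::size_t attr_offset) const {
    return tid * slotBytes() + perTupleBitmapBytes() + attr_offset;
  }

  // Layout B: one bitmap for the whole sub-block, stored ahead of the slots.
  std::size_t nullByteB(std::size_t tid, std::size_t nullable_idx) const {
    assert(tid < capacity && nullable_idx < num_nullable);
    return (tid * num_nullable + nullable_idx) / 8;
  }
  std::size_t attrByteB(std::size_t tid, std::size_t attr_offset) const {
    return sharedBitmapBytes() + tid * fixed_bytes + attr_offset;
  }
};
```

With 3 nullable attributes, 16 fixed bytes per tuple, and 1000 slots, the null bit and the attribute at offset 4 of tuple 10 are 5 bytes apart in layout A and 535 bytes apart in layout B. Whether that gap matters in practice depends on the access pattern, which this sketch does not measure.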



---

[GitHub] incubator-quickstep issue #100: Refactor bulk insert for SplitRowStore

Posted by pateljm <gi...@git.apache.org>.
Github user pateljm commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/100
  
    @cramja This is a nice feature. Thanks @navsan for the detailed comments. 
    
    Both, can we close this PR out over the next few days? Gotta keep moving™ :)


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
Github user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79695934
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -88,6 +88,67 @@ inline std::size_t CalculateVariableSizeWithRemappedAttributes(
       return total_size;
     }
     
    +
    +/**
    + * A struct which holds the offset information for a non-remapping insert
    + * operation
    + */
    +struct BasicInsertInfo {
    +  BasicInsertInfo(
    +    const CatalogRelationSchema &relation)
    +    : num_attrs_(relation.size()),
    +      num_nullable_attrs_(relation.numNullableAttributes()),
    +      max_var_length_(relation.getMaximumVariableByteLength()),
    +      fixed_len_offset_(BitVector<true>::BytesNeeded(num_nullable_attrs_)),
    +      var_len_offset_(fixed_len_offset_ + relation.getFixedByteLength()),
    +      is_variable_(num_attrs_),
    +      is_nullable_(num_attrs_),
    +      fixed_len_offsets_(num_attrs_),
    +      fixed_len_sizes_(num_attrs_),
    +      var_len_offsets_(num_attrs_) {
    +    attribute_id accessor_attr_id = 0;
    +    for (CatalogRelationSchema::const_iterator attr_it = relation.begin();
    +         attr_it != relation.end();
    +         ++attr_it, ++accessor_attr_id) {
    +      DCHECK_EQ(accessor_attr_id, attr_it->getID());
    +
    +      const int nullable_idx = relation.getNullableAttributeIndex(accessor_attr_id);
    +      const int variable_idx = relation.getVariableLengthAttributeIndex(accessor_attr_id);
    +      is_nullable_.setBit(accessor_attr_id, nullable_idx != -1);
    +
    +      if (variable_idx == -1) {
    +        is_variable_.setBit(accessor_attr_id, false);
    +        fixed_len_offsets_[accessor_attr_id] = relation.getFixedLengthAttributeOffset(accessor_attr_id);
    +        fixed_len_sizes_[accessor_attr_id] = relation.getAttributeById(
    +          accessor_attr_id)->getType().maximumByteLength();
    +        var_len_offsets_[accessor_attr_id] = -1;
    +      } else {
    +        is_variable_.setBit(accessor_attr_id, true);
    +        fixed_len_offsets_[accessor_attr_id] = 0;
    +        fixed_len_sizes_[accessor_attr_id] = 0;
    +        var_len_offsets_[accessor_attr_id] = relation.getVariableLengthAttributeIndex(accessor_attr_id);
    +      }
    +    }
    +  }
    +
    +  std::size_t num_attrs_;
    +  std::size_t num_nullable_attrs_;
    +  std::size_t max_var_length_;
    +
    +  // byte offset from the beginning of a tuple to the first fixed length attribute
    +  std::uint32_t fixed_len_offset_;
    +  // byte offset from the beginning of a tuple to the first variable length offset/length pair
    +  std::uint32_t var_len_offset_;
    +
    +  BitVector<true> is_variable_;
    --- End diff --
    
    Actually, I think it's even better to just create one struct for each column, holding all the info we care about for that column: whether it's nullable, whether it's variable-length, and what its size is. That way, InsertInfo will have a single vector<struct>, which is better than the bunch of vector<int>s you have now, because a struct packed with all the info you need can be unpacked into registers directly, whereas the separate vectors force you to do memory loads for each column on each tuple insertion.
    
    See my bulkInsert function in PackedRowStore.
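
A minimal sketch of the per-column struct idea follows. It is an illustration only: `ToyColumnInfo`, `ToyAttr`, `BuildColumnInfo`, and `CopyFixed` are hypothetical names, and the code is not the PackedRowStore implementation referred to above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical per-column descriptor: everything the insert loop needs for
// one column sits together, so one load brings in all of it.
struct ToyColumnInfo {
  std::uint32_t fixed_offset;  // byte offset within the fixed-length region
  std::uint32_t fixed_size;    // bytes to copy; 0 for variable-length columns
  std::int32_t var_index;      // index among variable-length columns, or -1
  bool nullable;
};

// Hypothetical stand-in for the catalog's view of one attribute.
struct ToyAttr {
  std::uint32_t size;
  bool variable;
  bool nullable;
};

// Computed once per bulk insert, outside the per-tuple loop.
std::vector<ToyColumnInfo> BuildColumnInfo(const std::vector<ToyAttr> &schema) {
  std::vector<ToyColumnInfo> cols;
  cols.reserve(schema.size());
  std::uint32_t offset = 0;
  std::int32_t next_var = 0;
  for (const ToyAttr &attr : schema) {
    if (attr.variable) {
      cols.push_back({0, 0, next_var++, attr.nullable});
    } else {
      cols.push_back({offset, attr.size, -1, attr.nullable});
      offset += attr.size;
    }
  }
  return cols;
}

// Copies the non-null fixed-length values of one row into `slot`.
// A null value is represented by nullptr in `values`.
void CopyFixed(const std::vector<ToyColumnInfo> &cols,
               const std::vector<const void*> &values,
               char *slot) {
  assert(values.size() == cols.size());
  for (std::size_t i = 0; i < cols.size(); ++i) {
    const ToyColumnInfo &col = cols[i];
    if (col.var_index == -1 && values[i] != nullptr) {
      std::memcpy(slot + col.fixed_offset, values[i], col.fixed_size);
    }
  }
}
```

The per-tuple loop then walks a single contiguous array of descriptors instead of indexing several parallel vectors. Whether this is measurably faster would need to be confirmed by benchmarking.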


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by cramja <gi...@git.apache.org>.
Github user cramja commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79723161
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t >(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    +              tuple_null_bitmap.setBit(current_null_idx, true);
                 } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(*attr_map_it);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    +              std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                          attr_value.getDataPtr(),
    +                          insertInfo.fixed_len_sizes_[accessor_attr_id]);
                 }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), false>(
    -                  relation_, *accessor, attribute_map);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    +            current_null_idx++;
    +          } else if (!nullable && variable) {
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +            DCHECK_EQ(insertInfo.var_len_offsets_[accessor_attr_id],
    +                      relation_.getVariableLengthAttributeIndex(accessor_attr_id));
    +            DCHECK(!attr_value.isNull());
    +
    +            const std::size_t attr_size = attr_value.getDataSize();
    +            current_variable_position -= attr_size;
    +            const int var_len_info_idx = insertInfo.var_len_offsets_[accessor_attr_id] * 2;
    +            variable_length_info_array[var_len_info_idx] = current_variable_position;
    +            variable_length_info_array[var_len_info_idx + 1] = attr_size;
    +            attr_value.copyInto(static_cast<char *>(tuple_storage_) + current_variable_position);
    --- End diff --
    
    Okay, sounds good.



[GitHub] incubator-quickstep issue #100: Refactor bulk insert for SplitRowStore

Posted by cramja <gi...@git.apache.org>.
Github user cramja commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/100
  
    @navsan please take a look



[GitHub] incubator-quickstep issue #100: Refactor bulk insert for SplitRowStore

Posted by cramja <gi...@git.apache.org>.
Github user cramja commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/100
  
    @navsan love the comments, very thorough.
    
    I updated the header of this PR to include benchmarks/testing info and will keep this in mind in the future.
    
    **Changes queued for this PR**
    * getting rid of TypedValue
    * moving some of the computations out of the inner loop and keeping a leaner set of running state variables
    * moving to a per-column info struct rather than a number of bitvectors and vector<int>s
    
    I'm going to address these in an addendum commit. I think your suggestions are great.
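
The per-column info struct from the list above could look roughly like the following. This is only a sketch under stated assumptions, not the code in this PR: `ToyAttribute`, `ColumnInsertInfo`, and `BuildColumnInfo` are hypothetical names, the toy attribute description stands in for what `CatalogRelationSchema` provides, and fixed-length columns are assumed to be packed back to back. The point is that everything the inner loop needs for one column sits in one small record that is computed once, before the loop starts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for what the catalog knows about one attribute.
struct ToyAttribute {
  std::size_t max_byte_length;  // Exact size if fixed-length, upper bound otherwise.
  bool nullable;
  bool variable_length;
};

// Hypothetical per-column record, computed once per bulk insert.
struct ColumnInsertInfo {
  std::uint32_t fixed_offset;  // Offset in the fixed-length region (fixed columns only).
  std::uint32_t fixed_size;    // Bytes to copy (fixed columns only).
  std::int32_t null_idx;       // Index in the per-tuple null bitmap, or -1.
  std::int32_t var_idx;        // Index in the variable-length info array, or -1.
};

// Walks the schema once so the insert loop never has to ask the catalog.
std::vector<ColumnInsertInfo> BuildColumnInfo(const std::vector<ToyAttribute> &attrs) {
  assert(!attrs.empty());
  std::vector<ColumnInsertInfo> info;
  info.reserve(attrs.size());
  std::uint32_t next_fixed_offset = 0;
  std::int32_t next_null_idx = 0;
  std::int32_t next_var_idx = 0;
  for (const ToyAttribute &attr : attrs) {
    ColumnInsertInfo col{0, 0, -1, -1};
    if (attr.nullable) {
      col.null_idx = next_null_idx++;
    }
    if (attr.variable_length) {
      col.var_idx = next_var_idx++;
    } else {
      col.fixed_offset = next_fixed_offset;
      col.fixed_size = static_cast<std::uint32_t>(attr.max_byte_length);
      next_fixed_offset += col.fixed_size;
    }
    info.push_back(col);
  }
  return info;
}
```

With a layout like this, the loop body reads one `ColumnInsertInfo` per column instead of probing several bit vectors and integer vectors that are indexed by the same attribute id.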
    
    **Different PR**
    * getting rid of occupancy_bitmap_
    * moving the null_bitmap into the header as a single bitmap for the entire tuple storage subblock.
    
    I think these are valid issues (see the comments above for specifics), but they can be the subject of another commit/PR, in the interest of keeping PRs concise.
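
For readers unfamiliar with the second idea, a block-wide null bitmap could be indexed as in the sketch below. This is an illustration only: `BlockNullBitmap` and its methods are hypothetical names, and the bit layout (one run of bits per tuple slot) is an assumption, not something decided in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block-wide null bitmap: one bit per (tuple slot, nullable
// attribute), stored once for the whole sub-block rather than in each slot.
class BlockNullBitmap {
 public:
  BlockNullBitmap(std::size_t num_slots, std::size_t num_nullable_attrs)
      : num_nullable_attrs_(num_nullable_attrs),
        words_((num_slots * num_nullable_attrs + 63) / 64, 0) {}

  void setNull(std::size_t slot, std::size_t null_idx) {
    const std::size_t bit = bitIndex(slot, null_idx);
    words_[bit / 64] |= (std::uint64_t{1} << (bit % 64));
  }

  bool isNull(std::size_t slot, std::size_t null_idx) const {
    const std::size_t bit = bitIndex(slot, null_idx);
    return ((words_[bit / 64] >> (bit % 64)) & 1u) != 0;
  }

 private:
  // Assumed layout: the bits for one slot are adjacent.
  std::size_t bitIndex(std::size_t slot, std::size_t null_idx) const {
    assert(null_idx < num_nullable_attrs_);
    return slot * num_nullable_attrs_ + null_idx;
  }

  std::size_t num_nullable_attrs_;
  std::vector<std::uint64_t> words_;
};
```

One consequence of such a layout is that the per-tuple `tuple_null_bitmap.clear()` call in the insert loop would no longer be needed, provided the block-wide bitmap is zeroed when the block is created.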
    
    **Another PR**
    > Can you coalesce writes for contiguous columns in input/output? This helps speed up the common case of materialized join results where (almost) everything gets copied out anyway.
    
    > Can you also add support for partialBulkInsert functions? See the commits in my branch for reference. It'll help us improve the join result materialization cost, as well as a few other operations.
    
    Both of these points can be addressed, but in a separate PR.
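
To make the first quoted suggestion concrete, here is a minimal sketch of write coalescing. It is not taken from the referenced branch: `CopyRun`, `CoalesceRuns`, and `CopyTuple` are hypothetical names, and each column copy is modeled simply as a (source offset, destination offset, size) triple within one tuple. Adjacent copies that are contiguous in both source and destination are merged, so one `std::memcpy` replaces several.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of one contiguous byte copy within a tuple.
struct CopyRun {
  std::size_t src_offset;
  std::size_t dst_offset;
  std::size_t size;
};

// Merges neighbouring copies that are back to back in BOTH source and
// destination. Computed once per bulk insert, then reused for every tuple.
std::vector<CopyRun> CoalesceRuns(const std::vector<CopyRun> &copies) {
  std::vector<CopyRun> runs;
  for (const CopyRun &c : copies) {
    if (!runs.empty()
        && runs.back().src_offset + runs.back().size == c.src_offset
        && runs.back().dst_offset + runs.back().size == c.dst_offset) {
      runs.back().size += c.size;  // Extend the previous run.
    } else {
      runs.push_back(c);           // Start a new run.
    }
  }
  return runs;
}

// Applies the (already coalesced) runs to one tuple.
void CopyTuple(const std::vector<CopyRun> &runs, const char *src, char *dst) {
  assert(src != nullptr && dst != nullptr);
  for (const CopyRun &r : runs) {
    std::memcpy(dst + r.dst_offset, src + r.src_offset, r.size);
  }
}
```

In the materialized-join case mentioned in the quote, where nearly every column is copied in order, most columns would presumably collapse into a handful of runs. Nullable and variable-length columns would still need separate handling, which this sketch leaves out.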



[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
Github user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79694477
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    --- End diff --
    
    Maintaining SplitRowStore's null bitmap on a per-tuple basis is needlessly expensive. For instance, the constructor used here checks num_nullable_attrs to figure out how big the bitmap must be. That if/else certainly doesn't need to run on every tuple insertion.
    
    In general, I think we should just move to having a single null bitmap for the entire subblock (i.e., across all tuples). 
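
A minimal sketch of what a single subblock-wide null bitmap could look like. This is an illustration only: `SubBlockNullBitmap`, its methods, and the bit layout (tuple position times number of nullable attributes, plus the nullable index) are assumptions made for the example and are not taken from Quickstep.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration: one null bitmap shared by every tuple slot in a
// sub-block, instead of one small bitmap embedded at the start of each slot.
class SubBlockNullBitmap {
 public:
  SubBlockNullBitmap(std::size_t max_tuples, std::size_t num_nullable_attrs)
      : num_nullable_attrs_(num_nullable_attrs),
        words_((max_tuples * num_nullable_attrs + 63) / 64, 0) {}

  // The bit index is plain arithmetic: no per-tuple bitmap object is
  // constructed and no size-dependent branch is taken per insertion.
  void setNull(std::size_t tuple_pos, std::size_t nullable_idx, bool is_null) {
    assert(nullable_idx < num_nullable_attrs_);
    const std::size_t bit = tuple_pos * num_nullable_attrs_ + nullable_idx;
    const std::uint64_t mask = std::uint64_t{1} << (bit % 64);
    if (is_null) {
      words_[bit / 64] |= mask;
    } else {
      words_[bit / 64] &= ~mask;
    }
  }

  bool isNull(std::size_t tuple_pos, std::size_t nullable_idx) const {
    assert(nullable_idx < num_nullable_attrs_);
    const std::size_t bit = tuple_pos * num_nullable_attrs_ + nullable_idx;
    return ((words_[bit / 64] >> (bit % 64)) & 1u) != 0;
  }

  // Clears all null bits of one slot, e.g. before reusing a hole.
  void clearTuple(std::size_t tuple_pos) {
    for (std::size_t i = 0; i < num_nullable_attrs_; ++i) {
      setNull(tuple_pos, i, false);
    }
  }

 private:
  std::size_t num_nullable_attrs_;
  std::vector<std::uint64_t> words_;
};
```

The trade-off in this sketch is locality: a tuple's null bits no longer sit next to its fixed-length data, so whether it is a net win would have to be measured.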


---

[GitHub] incubator-quickstep issue #100: Refactor bulk insert for SplitRowStore

Posted by navsan <gi...@git.apache.org>.
GitHub user navsan commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/100
  
    Also, always explain in the PR what tests/experiments you ran. 
    
    - Did you confirm the correctness of the code? How? With just the current test suite, or with TPC-H queries as well?
    - How did you determine the 2x performance improvement? What were the test parameters? What machine did you run this on?
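
One way to make a speedup claim reproducible is to report the median of several timed runs along with the parameters used. The helper below is a sketch under that assumption; `MedianMillis` is a hypothetical name and is not part of Quickstep or of this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` `repetitions` times and returns the median
// wall-clock time in milliseconds. The median is less sensitive to a single
// slow run (cold cache, scheduler noise) than the mean.
template <typename Work>
double MedianMillis(std::size_t repetitions, Work &&work) {
  assert(repetitions > 0);
  std::vector<double> samples;
  samples.reserve(repetitions);
  for (std::size_t i = 0; i < repetitions; ++i) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::milli>(stop - start).count());
  }
  // Partially sort so that the middle element is in its final position.
  const std::size_t mid = samples.size() / 2;
  std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
  return samples[mid];
}
```

Running the same workload against the old and the new insert path with this kind of helper, and stating the tuple count, schema, and machine, would answer the questions above.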


---

[GitHub] incubator-quickstep issue #100: Refactor bulk insert for SplitRowStore

Posted by cramja <gi...@git.apache.org>.
GitHub user cramja commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/100
  
    Agreed. I've been working on adding @navsan's idea of merging contiguous attributes. I have another [branch with those changes](https://github.com/cramja/incubator-quickstep/tree/splitrow_partial_ins). It's almost ready; I'm in the process of benchmarking with TPC-H to see if it can replace PackedRow.
    
    That said, this PR speeds up SplitRowStore and has tests, so it can be merged.
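
The idea of merging contiguous attributes can be sketched as follows. This is not the code from the linked branch: `AttrLayout`, `CopyRun`, `BuildCopyRuns`, and `CopyTuple` are hypothetical names, and the sketch assumes fixed-length attributes whose source and destination byte offsets are known before the insert loop starts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical per-attribute layout; stands in for what the catalog reports.
struct AttrLayout {
  std::size_t src_offset;
  std::size_t dst_offset;
  std::size_t bytes;
};

// One memcpy worth of work.
struct CopyRun {
  std::size_t src_offset;
  std::size_t dst_offset;
  std::size_t bytes;
};

// Merges attributes that are adjacent in both source and destination.
// Computed once per bulk insert, outside the per-tuple loop.
std::vector<CopyRun> BuildCopyRuns(const std::vector<AttrLayout> &attrs) {
  std::vector<CopyRun> runs;
  for (const AttrLayout &a : attrs) {
    if (!runs.empty()
        && runs.back().src_offset + runs.back().bytes == a.src_offset
        && runs.back().dst_offset + runs.back().bytes == a.dst_offset) {
      runs.back().bytes += a.bytes;  // Extend the current run.
    } else {
      runs.push_back(CopyRun{a.src_offset, a.dst_offset, a.bytes});
    }
  }
  return runs;
}

// Per-tuple work is then one memcpy per run rather than one per attribute.
void CopyTuple(const std::vector<CopyRun> &runs, const char *src, char *dst) {
  assert(src != nullptr && dst != nullptr);
  for (const CopyRun &r : runs) {
    std::memcpy(dst + r.dst_offset, src + r.src_offset, r.bytes);
  }
}
```

When the attribute map is the identity and the source is itself a row store, every attribute is adjacent to the previous one and the whole tuple collapses into a single run.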


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
GitHub user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79702096
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t >(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    --- End diff --
    
    It's safe to do this memcpy even if the value is null. It's not important now, but it will be useful when you want to coalesce memcpy calls for contiguous attributes.
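
A sketch of why an unconditional copy helps coalescing. It rests on an assumption that is not established by the diff: the source keeps readable bytes for every fixed-length attribute even when the attribute is null, as a row-oriented source would. `SourceRow`, `DestSlot`, and `CopyRowIgnoringNulls` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-width source row: null attributes still occupy their
// bytes, so reading them is well-defined even though the content is garbage.
struct SourceRow {
  const char *bytes;        // Packed fixed-length attribute values.
  std::uint32_t null_mask;  // Bit i set => attribute i is null.
};

// Hypothetical destination slot with the same layout.
struct DestSlot {
  char *bytes;
  std::uint32_t null_mask;
};

// Copies all fixed-length bytes with one memcpy and carries nullness in the
// mask. There is no per-attribute null branch; readers must consult the mask
// before interpreting an attribute's bytes.
void CopyRowIgnoringNulls(const SourceRow &src,
                          std::size_t row_bytes,
                          DestSlot *dst) {
  assert(src.bytes != nullptr && dst != nullptr && dst->bytes != nullptr);
  std::memcpy(dst->bytes, src.bytes, row_bytes);
  dst->null_mask = src.null_mask;
}
```

If the accessor instead reports a null as a `nullptr` value pointer, the copy for that attribute has to be skipped and this shortcut does not apply.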


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by cramja <gi...@git.apache.org>.
GitHub user cramja commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79721122
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
    --- End diff --
    
    I like it, we can do that.
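
    For readers following the quoted hunk: the check it ends on first tests the block against a precomputed worst-case variable-length size (`insertInfo.max_var_length_`) and only computes the tuple's exact size when that cheap test fails. Below is a minimal, self-contained sketch of that two-phase pattern. `ToyBlock`, `exactVariableBytes`, and `bulkInsert` are hypothetical stand-ins invented for illustration; they are not the Quickstep API, and the byte accounting is deliberately simplified.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a storage sub-block (not the Quickstep class).
struct ToyBlock {
  std::size_t free_bytes;         // bytes still available in the block
  std::size_t slot_bytes;         // fixed-size slot consumed by every tuple
  std::size_t size_computations;  // how often the exact size was computed

  // True if a slot plus `variable_bytes` of variable-length data still fits.
  bool spaceToInsert(std::size_t variable_bytes) const {
    return slot_bytes + variable_bytes <= free_bytes;
  }
};

// Exact variable-length footprint of one tuple. This models the expensive
// per-tuple calculation that the fast path tries to avoid.
std::size_t exactVariableBytes(const std::vector<std::size_t> &attr_sizes,
                               ToyBlock *block) {
  ++block->size_computations;
  std::size_t total = 0;
  for (std::size_t s : attr_sizes) {
    total += s;
  }
  return total;
}

// Inserts tuples until the block is full; returns how many were inserted.
// Each tuple is modelled as the list of its variable-length attribute sizes.
std::size_t bulkInsert(const std::vector<std::vector<std::size_t>> &tuples,
                       std::size_t max_var_length,
                       ToyBlock *block) {
  std::size_t inserted = 0;
  for (const auto &tuple : tuples) {
    // Phase 1: if even the worst case fits, skip the exact computation.
    if (!block->spaceToInsert(max_var_length)) {
      // Phase 2: the block is nearly full, so measure this tuple exactly.
      const std::size_t exact = exactVariableBytes(tuple, block);
      if (!block->spaceToInsert(exact)) {
        break;  // The tuple genuinely does not fit; stop inserting.
      }
    }
    // "Copy" the tuple: consume the slot plus the bytes actually written.
    std::size_t written = 0;
    for (std::size_t s : tuple) {
      written += s;
    }
    block->free_bytes -= block->slot_bytes + written;
    ++inserted;
  }
  return inserted;
}
```

    In this sketch the exact size is computed only once the block is close to full, so a large insert into a mostly empty block pays for the cheap comparison alone on almost every tuple. The trade-off is that the worst-case bound has to be a true upper bound on every tuple's variable-length size, otherwise the fast path could admit a tuple that does not fit.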


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
Github user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79703426
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t >(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    +              tuple_null_bitmap.setBit(current_null_idx, true);
                 } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(*attr_map_it);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    +              std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                          attr_value.getDataPtr(),
    +                          insertInfo.fixed_len_sizes_[accessor_attr_id]);
                 }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), false>(
    -                  relation_, *accessor, attribute_map);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    +            current_null_idx++;
    +          } else if (!nullable && variable) {
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +            DCHECK_EQ(insertInfo.var_len_offsets_[accessor_attr_id],
    +                      relation_.getVariableLengthAttributeIndex(accessor_attr_id));
    +            DCHECK(!attr_value.isNull());
    +
    +            const std::size_t attr_size = attr_value.getDataSize();
    +            current_variable_position -= attr_size;
    +            const int var_len_info_idx = insertInfo.var_len_offsets_[accessor_attr_id] * 2;
    +            variable_length_info_array[var_len_info_idx] = current_variable_position;
    +            variable_length_info_array[var_len_info_idx + 1] = attr_size;
    +            attr_value.copyInto(static_cast<char *>(tuple_storage_) + current_variable_position);
    +
    +            header_->variable_length_bytes_allocated += attr_size;
    +          } else {  // nullable, variable length
    +            DCHECK_EQ(static_cast<int>(current_null_idx), relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    +              tuple_null_bitmap.setBit(current_null_idx, true);
                 } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    +              DCHECK_EQ(relation_.getVariableLengthAttributeIndex(accessor_attr_id),
    +                        insertInfo.var_len_offsets_[accessor_attr_id]);
    +
    +              const std::size_t attr_size = attr_value.getDataSize();
    +              current_variable_position -= attr_size;
    +              const int var_len_info_idx = insertInfo.var_len_offsets_[accessor_attr_id] * 2;
    +              variable_length_info_array[var_len_info_idx] = current_variable_position;
    +              variable_length_info_array[var_len_info_idx + 1] = attr_size;
    +              attr_value.copyInto(static_cast<char *>(tuple_storage_) + current_variable_position);
    +              header_->variable_length_bytes_allocated += attr_size;
                 }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    +            current_null_idx++;
               }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(*attr_map_it);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    +        occupancy_bitmap_->setBit(pos, true);
    --- End diff --
    
    All of this would go away if we drop the occupancy_bitmap_. 
    
    On the other hand, if we absolutely want to allow deletes, it'd be faster (and more space-efficient) to keep a sorted list of tuple IDs rather than an entire bitmap. 
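
The sorted-list idea above could look roughly like the sketch below. It is only an illustration: `DeletedSlotList` and its methods are hypothetical names, not part of Quickstep, and it assumes deletes are rare enough that the list stays short. Holes are kept in descending order so the lowest free slot can be taken from the back in constant time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical replacement for a per-slot occupancy bitmap: only the IDs of
// deleted tuples are stored, so a block with no deletes costs no extra space.
class DeletedSlotList {
 public:
  // Record that the tuple at 'tid' was deleted, keeping the list sorted
  // in descending order.
  void markDeleted(std::int32_t tid) {
    auto it = std::lower_bound(holes_.begin(), holes_.end(), tid,
                               std::greater<std::int32_t>());
    assert(it == holes_.end() || *it != tid);  // A slot is freed at most once.
    holes_.insert(it, tid);
  }

  // Return the slot for the next insert: the lowest-numbered hole if one
  // exists, otherwise 'next_unused_tid' (the end of the slot array).
  std::int32_t takeInsertPosition(std::int32_t next_unused_tid) {
    if (holes_.empty()) {
      return next_unused_tid;
    }
    const std::int32_t tid = holes_.back();
    holes_.pop_back();
    return tid;
  }

  // A block with no holes is packed.
  bool isPacked() const { return holes_.empty(); }

  std::size_t numHoles() const { return holes_.size(); }

 private:
  std::vector<std::int32_t> holes_;  // Deleted tuple IDs, largest first.
};
```

With this layout the packed check is just an emptiness test, and the insert loop would not need to scan a bitmap for the first zero bit. Whether that is actually a win depends on how often deletes happen, which this sketch does not measure.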


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
Github user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79703207
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0;
    +       attr_id < static_cast<attribute_id>(relation_.size());
    +       ++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t >(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    +              tuple_null_bitmap.setBit(current_null_idx, true);
                 } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(*attr_map_it);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    +              std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                          attr_value.getDataPtr(),
    +                          insertInfo.fixed_len_sizes_[accessor_attr_id]);
                 }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), false>(
    -                  relation_, *accessor, attribute_map);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    +            current_null_idx++;
    +          } else if (!nullable && variable) {
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +            DCHECK_EQ(insertInfo.var_len_offsets_[accessor_attr_id],
    +                      relation_.getVariableLengthAttributeIndex(accessor_attr_id));
    +            DCHECK(!attr_value.isNull());
    +
    +            const std::size_t attr_size = attr_value.getDataSize();
    +            current_variable_position -= attr_size;
    +            const int var_len_info_idx = insertInfo.var_len_offsets_[accessor_attr_id] * 2;
    +            variable_length_info_array[var_len_info_idx] = current_variable_position;
    +            variable_length_info_array[var_len_info_idx + 1] = attr_size;
    +            attr_value.copyInto(static_cast<char *>(tuple_storage_) + current_variable_position);
    +
    +            header_->variable_length_bytes_allocated += attr_size;
    --- End diff --
    
    I think it's cleaner (and slightly more efficient) to update the header once at the end, and just use current_variable_position as the running position index in the loop.
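
A minimal sketch of that suggestion, outside of Quickstep: `Header` and `writeVariableLengthValues` are hypothetical stand-ins, and `std::string` values replace `TypedValue`. The loop only moves a local `current_variable_position`, and the header counter is written once after the loop from the distance the position travelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the sub-block header.
struct Header {
  std::size_t variable_length_bytes_allocated = 0;
};

// Copies each value into the tail of 'storage', growing toward the front.
// Returns the start offset of every value. The header is updated a single
// time, after the loop, instead of once per attribute.
std::vector<std::size_t> writeVariableLengthValues(
    const std::vector<std::string> &values,
    std::vector<char> *storage,
    Header *header) {
  assert(storage != nullptr && header != nullptr);
  const std::size_t start_position =
      storage->size() - header->variable_length_bytes_allocated;
  std::size_t current_variable_position = start_position;

  std::vector<std::size_t> offsets;
  offsets.reserve(values.size());
  for (const std::string &value : values) {
    assert(value.size() <= current_variable_position);  // Enough room left.
    current_variable_position -= value.size();
    std::copy(value.begin(), value.end(),
              storage->data() + current_variable_position);
    offsets.push_back(current_variable_position);
  }

  // Single header update: bytes consumed is the distance the position moved.
  header->variable_length_bytes_allocated +=
      start_position - current_variable_position;
  return offsets;
}
```

The result is the same as incrementing the header inside each branch, but the bookkeeping lives in one place, so the two variable-length branches cannot drift apart.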


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by cramja <gi...@git.apache.org>.
Github user cramja commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79722671
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t >(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    --- End diff --
    
    Good catch. Initially I didn't do this because I thought the Compressed/ValueAccessors wouldn't handle it. However, after scanning their code, it seems this will work, so I will make this change.
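
The review comment being answered is not quoted here, but the diff context ends at the `TypedValue` construction in the nullable fixed-length branch. One plausible reading is that the suggestion was to read the value through an untyped pointer and check it against `nullptr`, as the removed code did with `getUntypedValue<true>`. A minimal sketch of that pattern, combined with a layout that is computed once outside the insert loop, might look like the following. `InsertLayout`, `ToyAccessor`, and `InsertRow` are hypothetical stand-ins for illustration, not Quickstep classes, and the 32-bit null bitmap is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical per-relation layout, computed once before the insert loop
// (in the spirit of the PR's BasicInsertInfo; not the real struct).
struct InsertLayout {
  std::vector<std::size_t> offsets;  // Byte offset of each attribute in the slot.
  std::vector<std::size_t> sizes;    // Byte size of each fixed-length attribute.
  std::vector<int> null_idx;         // Null-bitmap index, or -1 if not nullable.
};

// Hypothetical accessor: returns a raw pointer to the value, or nullptr for NULL.
struct ToyAccessor {
  std::vector<const void*> values;
  const void* getUntypedValue(std::size_t attr) const { return values[attr]; }
};

// Copies one row into 'slot' using only the precomputed layout. Sets a bit in
// '*null_bits' for each NULL nullable attribute and returns the NULL count.
inline int InsertRow(const InsertLayout &layout,
                     const ToyAccessor &accessor,
                     const std::vector<std::size_t> &attribute_map,
                     char *slot,
                     std::uint32_t *null_bits) {
  int nulls = 0;
  *null_bits = 0;
  for (std::size_t i = 0; i < layout.offsets.size(); ++i) {
    const void *value = accessor.getUntypedValue(attribute_map[i]);
    if (value == nullptr) {
      // Only nullable attributes get a null bit; this toy version silently
      // skips a NULL in a non-nullable attribute.
      if (layout.null_idx[i] >= 0) {
        *null_bits |= (1u << layout.null_idx[i]);
        ++nulls;
      }
      continue;
    }
    // No catalog lookups and no TypedValue construction in the hot loop.
    std::memcpy(slot + layout.offsets[i], value, layout.sizes[i]);
  }
  return nulls;
}
```

The point of the sketch is only that the null check and the copy can both work from a raw pointer plus precomputed offsets and sizes. Whether the real value accessors return `nullptr` for NULL in every case is what the reply above says was checked by reading their code.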



[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
Github user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79702332
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t >(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    +              tuple_null_bitmap.setBit(current_null_idx, true);
                 } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(*attr_map_it);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    +              std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                          attr_value.getDataPtr(),
    +                          insertInfo.fixed_len_sizes_[accessor_attr_id]);
                 }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), false>(
    -                  relation_, *accessor, attribute_map);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    +            current_null_idx++;
    +          } else if (!nullable && variable) {
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    --- End diff --
    
    Here, there's no way around using TypedValue right now. But please leave a note that we should change this to use a new (as yet unimplemented) getVariableLengthUntypedValue() function that returns both a void* pointer and a size.
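
For illustration, a minimal sketch of what such an accessor call might look like. The function is described above as not yet implemented, so the signature, the `ToyAccessor` type, and the `copyCurrentValue` helper are all assumptions made for this example and are not Quickstep code. The point is only that returning a pointer together with a size lets the caller `memcpy` the bytes directly, without building an intermediate TypedValue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a value accessor over one variable-length column.
struct ToyAccessor {
  std::vector<std::string> values;
  std::size_t position;

  // Assumed shape of the proposed call: a pointer to the raw bytes of the
  // current value plus their length, with no intermediate value object.
  std::pair<const void *, std::size_t> getVariableLengthUntypedValue() const {
    assert(position < values.size());
    const std::string &value = values[position];
    return std::make_pair(static_cast<const void *>(value.data()), value.size());
  }
};

// Copies the current value into 'dest' and returns the number of bytes
// written. The caller must make sure 'dest' is large enough.
std::size_t copyCurrentValue(const ToyAccessor &accessor, char *dest) {
  const std::pair<const void *, std::size_t> raw =
      accessor.getVariableLengthUntypedValue();
  std::memcpy(dest, raw.first, raw.second);
  return raw.second;
}
```

In an insert loop, the returned size could be written straight into the variable-length info array and the pointer passed to `memcpy`.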


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
    GitHub user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79692864
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0;
    +       attr_id < static_cast<attribute_id>(relation_.size());
    +       ++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
    --- End diff --
    
    As discussed, let's move this function out of the loop by pre-calculating a conservative estimate of the available space. See my PackedRowStore bulkInsert implementation for example.
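
A rough sketch of the pre-calculation idea, under simplifying assumptions: every tuple is charged its full slot size plus the maximum possible variable-length size, and the space taken by the occupancy bitmap is ignored. The function names and parameters are hypothetical and do not come from the PackedRowStore implementation mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: a lower bound on how many tuples are guaranteed to fit
// in 'free_bytes' when each tuple is charged its worst-case footprint.
std::size_t conservativeTupleCapacity(std::size_t free_bytes,
                                      std::size_t slot_bytes,
                                      std::size_t max_var_bytes) {
  const std::size_t worst_case_bytes = slot_bytes + max_var_bytes;
  if (worst_case_bytes == 0) {
    return 0;  // Degenerate schema; report no capacity rather than divide by zero.
  }
  const std::size_t capacity = free_bytes / worst_case_bytes;
  // The estimate never promises more space than is actually free.
  assert(capacity * worst_case_bytes <= free_bytes);
  return capacity;
}

// Number of tuples one batch may insert without any per-tuple space check.
std::size_t tuplesInBatch(std::size_t tuples_available, std::size_t capacity) {
  return std::min(tuples_available, capacity);
}
```

Because the estimate is conservative, a block may be left with some unused space after a batch. The trade-off is that the space check runs once per batch instead of once per tuple.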


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by cramja <gi...@git.apache.org>.
    GitHub user cramja commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79721974
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0;
    +       attr_id < static_cast<attribute_id>(relation_.size());
    +       ++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    --- End diff --
    
    Hmm, I agree this check is excessive. However, I'm not going to remove the delete functionality in this PR. That change would be large and not directly related to this one, so it can wait for another PR.
    
    What I can do is remove the check and keep incrementing `pos`, as you suggest. Gaps can still be filled by a non-bulk `insertTuple` or by calling rebuild, so no functionality will be lost.
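
One way to picture the append-only approach is the toy model below. `ToyBlock` and `bulkAppend` are hypothetical names invented for this sketch, and the real sub-block tracks much more state. The sketch assumes bulk inserts start just past the highest occupied slot and never search for holes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a sub-block: one occupancy flag per slot plus the highest
// occupied slot index (-1 when the block is empty).
struct ToyBlock {
  std::vector<bool> occupied;
  std::ptrdiff_t max_tid;
};

// Appends up to 'num_tuples' tuples after the highest occupied slot by simply
// incrementing 'pos'. Holes left by deletes are skipped on purpose; in this
// model they would be reclaimed by a single-tuple insert or a rebuild.
std::size_t bulkAppend(ToyBlock *block, std::size_t num_tuples) {
  assert(block != nullptr);
  std::size_t pos = static_cast<std::size_t>(block->max_tid + 1);
  std::size_t inserted = 0;
  while (inserted < num_tuples && pos < block->occupied.size()) {
    block->occupied[pos] = true;
    block->max_tid = static_cast<std::ptrdiff_t>(pos);
    ++pos;
    ++inserted;
  }
  return inserted;
}
```

The loop body contains no search over the occupancy bitmap, which is the saving being discussed. The cost is that holes stay empty until something else fills them.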


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by cramja <gi...@git.apache.org>.
    GitHub user cramja commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79722961
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0;
    +       attr_id < static_cast<attribute_id>(relation_.size());
    +       ++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t>(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    --- End diff --
    
    Yes, but we'll still need to set the null bit anyway, so I don't see a change here.


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
Github user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79695417
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -88,6 +88,67 @@ inline std::size_t CalculateVariableSizeWithRemappedAttributes(
       return total_size;
     }
     
    +
    +/**
    + * A struct which holds the offset information for a non-remapping insert
    + * operation
    + */
    +struct BasicInsertInfo {
    +  BasicInsertInfo(
    +    const CatalogRelationSchema &relation)
    +    : num_attrs_(relation.size()),
    +      num_nullable_attrs_(relation.numNullableAttributes()),
    +      max_var_length_(relation.getMaximumVariableByteLength()),
    +      fixed_len_offset_(BitVector<true>::BytesNeeded(num_nullable_attrs_)),
    +      var_len_offset_(fixed_len_offset_ + relation.getFixedByteLength()),
    +      is_variable_(num_attrs_),
    +      is_nullable_(num_attrs_),
    +      fixed_len_offsets_(num_attrs_),
    +      fixed_len_sizes_(num_attrs_),
    +      var_len_offsets_(num_attrs_) {
    +    attribute_id accessor_attr_id = 0;
    +    for (CatalogRelationSchema::const_iterator attr_it = relation.begin();
    +         attr_it != relation.end();
    +         ++attr_it, ++accessor_attr_id) {
    +      DCHECK_EQ(accessor_attr_id, attr_it->getID());
    +
    +      const int nullable_idx = relation.getNullableAttributeIndex(accessor_attr_id);
    +      const int variable_idx = relation.getVariableLengthAttributeIndex(accessor_attr_id);
    +      is_nullable_.setBit(accessor_attr_id, nullable_idx != -1);
    +
    +      if (variable_idx == -1) {
    +        is_variable_.setBit(accessor_attr_id, false);
    +        fixed_len_offsets_[accessor_attr_id] = relation.getFixedLengthAttributeOffset(accessor_attr_id);
    +        fixed_len_sizes_[accessor_attr_id] = relation.getAttributeById(
    +          accessor_attr_id)->getType().maximumByteLength();
    +        var_len_offsets_[accessor_attr_id] = -1;
    +      } else {
    +        is_variable_.setBit(accessor_attr_id, true);
    +        fixed_len_offsets_[accessor_attr_id] = 0;
    +        fixed_len_sizes_[accessor_attr_id] = 0;
    +        var_len_offsets_[accessor_attr_id] = relation.getVariableLengthAttributeIndex(accessor_attr_id);
    +      }
    +    }
    +  }
    +
    +  std::size_t num_attrs_;
    +  std::size_t num_nullable_attrs_;
    +  std::size_t max_var_length_;
    +
    +  // byte offset from the beginning of a tuple to the first fixed length attribute
    +  std::uint32_t fixed_len_offset_;
    +  // byte offset from the beginning of a tuple to the first variable length offset/length pair
    +  std::uint32_t var_len_offset_;
    +
    +  BitVector<true> is_variable_;
    --- End diff --
    
    Packing these booleans into a bit vector is needlessly expensive. Just use plain 1-byte bools instead. We only have a few columns, so the space overhead is negligible, and doing so avoids the complexity and cost of bit arithmetic for lookups during insertion.
    [This applies to both bitmaps.]
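    
    A minimal sketch of the difference, under stated assumptions: `packedGet` is a generic packed-bit lookup written for illustration (it does not claim to mirror how `BitVector` is implemented), and `AttrFlags` is a hypothetical replacement for the two bitmaps.
    
    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>
    
    // Generic packed lookup: one bit per attribute. Every read needs an
    // index shift, a bit shift, and a mask.
    inline bool packedGet(const std::vector<std::uint8_t> &bits, std::size_t i) {
      return ((bits[i >> 3] >> (i & 7)) & 1u) != 0;
    }
    
    // Hypothetical byte-per-flag layout: every read is a single load.
    // std::uint8_t is used instead of std::vector<bool>, because the
    // standard library's vector<bool> is itself bit-packed.
    struct AttrFlags {
      std::vector<std::uint8_t> is_nullable;
      std::vector<std::uint8_t> is_variable;
    
      explicit AttrFlags(std::size_t num_attrs)
          : is_nullable(num_attrs, 0), is_variable(num_attrs, 0) {}
    };
    ```
    
    With only a handful of attributes per relation, the byte-per-flag layout costs a few extra bytes per insert call, while the inner loop reads `flags.is_nullable[attr_id]` directly.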


---

[GitHub] incubator-quickstep issue #100: Refactor bulk insert for SplitRowStore

Posted by navsan <gi...@git.apache.org>.
Github user navsan commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/100
  
    Please address the comments, which mostly concern performance improvements. Note that the objective here is not just to make SplitRowStore faster than it is today, but to get it to the point where we can drop PackedRowStore. We're not close to that right now, because we're doing far too much work in the inner loop.
    
    Also:
    - Consider getting rid of `occupancy_bitmap_`.
    - Consider moving the null bitmap into the header as a single bitmap for the entire tuple storage sub-block.
    - Consider getting rid of `TypedValue`.
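    
    One possible reading of the second suggestion, as a sketch only: `BlockNullBitmap` is a hypothetical name, and the row-major bit layout is an assumption rather than anything proposed in the thread.
    
    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>
    
    // Hypothetical block-level null bitmap: one bit per
    // (tuple slot, nullable attribute) pair, stored once for the whole
    // sub-block instead of at the front of every tuple slot.
    class BlockNullBitmap {
     public:
      BlockNullBitmap(std::size_t max_tuples, std::size_t num_nullable_attrs)
          : num_nullable_attrs_(num_nullable_attrs),
            bits_(max_tuples * num_nullable_attrs, false) {}
    
      void setNull(std::size_t tuple, std::size_t nullable_idx) {
        bits_[index(tuple, nullable_idx)] = true;
      }
    
      bool isNull(std::size_t tuple, std::size_t nullable_idx) const {
        return bits_[index(tuple, nullable_idx)];
      }
    
     private:
      // Row-major layout (assumption): all bits of one tuple are adjacent.
      std::size_t index(std::size_t tuple, std::size_t nullable_idx) const {
        return tuple * num_nullable_attrs_ + nullable_idx;
      }
    
      std::size_t num_nullable_attrs_;
      std::vector<bool> bits_;  // bit-packed by the standard library
    };
    ```
    
    Under this layout the per-tuple `tuple_null_bitmap.clear()` in the insert loop would no longer be needed, as long as the block-level bitmap is zeroed when the block is created.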


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
Github user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79693915
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    --- End diff --
    
    Can't we move this calculation out of the loop and keep just a tuple_slot increment here?
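
    A minimal sketch of what this suggestion could look like, assuming the
    packed case, where pos advances by exactly one slot per inserted tuple.
    PackedSlots, InsertWithMultiply, and InsertWithIncrement are hypothetical
    stand-ins written for illustration; they are not Quickstep classes or
    functions. When the block is not packed, pos comes from
    occupancy_bitmap_->firstZero(pos) and may skip slots, so a plain increment
    would not be enough there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a packed slot array (not Quickstep's real class).
struct PackedSlots {
  std::vector<char> storage;
  std::size_t slot_bytes;
  std::size_t num_tuples = 0;

  PackedSlots(std::size_t slots, std::size_t bytes_per_slot)
      : storage(slots * bytes_per_slot), slot_bytes(bytes_per_slot) {}

  std::size_t capacity() const { return storage.size() / slot_bytes; }
};

// Baseline: recompute the slot address from the position on every iteration,
// as the loop in the diff does.
std::size_t InsertWithMultiply(PackedSlots *block, const std::vector<int> &values) {
  assert(block->slot_bytes >= sizeof(int));
  std::size_t inserted = 0;
  for (const int value : values) {
    const std::size_t pos = block->num_tuples;
    if (pos >= block->capacity()) break;  // Block is full.
    char *tuple_slot = block->storage.data() + pos * block->slot_bytes;
    std::memcpy(tuple_slot, &value, sizeof(value));
    ++block->num_tuples;
    ++inserted;
  }
  return inserted;
}

// Suggested variant: compute the first slot address once, then advance the
// pointer by one slot per inserted tuple.
std::size_t InsertWithIncrement(PackedSlots *block, const std::vector<int> &values) {
  assert(block->slot_bytes >= sizeof(int));
  std::size_t inserted = 0;
  char *tuple_slot = block->storage.data() + block->num_tuples * block->slot_bytes;
  for (const int value : values) {
    if (block->num_tuples >= block->capacity()) break;  // Block is full.
    std::memcpy(tuple_slot, &value, sizeof(value));
    tuple_slot += block->slot_bytes;
    ++block->num_tuples;
    ++inserted;
  }
  return inserted;
}
```

    Both functions fill the same slots with the same bytes. An optimizing
    compiler may already turn the multiplication into an increment on its own,
    so any gain from the manual version would need to be measured.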



[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
GitHub user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79702587
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t >(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    +              tuple_null_bitmap.setBit(current_null_idx, true);
                 } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(*attr_map_it);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    +              std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                          attr_value.getDataPtr(),
    +                          insertInfo.fixed_len_sizes_[accessor_attr_id]);
                 }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), false>(
    -                  relation_, *accessor, attribute_map);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    +            current_null_idx++;
    +          } else if (!nullable && variable) {
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +            DCHECK_EQ(insertInfo.var_len_offsets_[accessor_attr_id],
    +                      relation_.getVariableLengthAttributeIndex(accessor_attr_id));
    +            DCHECK(!attr_value.isNull());
    +
    +            const std::size_t attr_size = attr_value.getDataSize();
    +            current_variable_position -= attr_size;
    +            const int var_len_info_idx = insertInfo.var_len_offsets_[accessor_attr_id] * 2;
    +            variable_length_info_array[var_len_info_idx] = current_variable_position;
    +            variable_length_info_array[var_len_info_idx + 1] = attr_size;
    +            attr_value.copyInto(static_cast<char *>(tuple_storage_) + current_variable_position);
    --- End diff --
    
    `copyInto` uses bit shifts and takes a branch that is needless on this code path. Since the data pointer and `attr_size` are already known at this point, let's just use a `memcpy`.
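
To illustrate the suggestion, here is a minimal standalone sketch of the variable-length write done with a plain `std::memcpy`. It mirrors the shape of the new code in the diff (move the write position down by `attr_size`, record the offset/size pair, copy the bytes). The helper `WriteVariableLengthValue` and its parameters are hypothetical names made up for this example and are not part of Quickstep.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not Quickstep code): writes one variable-length value
// into a storage region whose variable-length area grows downward.
//   storage  - start of the tuple storage region
//   var_pos  - current start of the allocated variable-length area
//   value    - pointer to the attribute's bytes (assumed non-null)
//   size     - number of bytes to copy
//   info     - this attribute's (offset, size) pair in the tuple slot
// Returns the new start of the variable-length area.
inline std::uint32_t WriteVariableLengthValue(char *storage,
                                              std::uint32_t var_pos,
                                              const void *value,
                                              std::size_t size,
                                              std::uint32_t *info) {
  // Reserve space just below the previously written value.
  var_pos -= static_cast<std::uint32_t>(size);
  // Record where the value lives and how long it is.
  info[0] = var_pos;
  info[1] = static_cast<std::uint32_t>(size);
  // Pointer and size are both known, so a direct copy is enough.
  std::memcpy(storage + var_pos, value, size);
  return var_pos;
}
```

In the diff, the pointer and size would presumably come from `attr_value.getDataPtr()` and `attr_value.getDataSize()`, both of which already appear in the new code; whether that is safe for every `TypedValue` representation is something the patch author would need to confirm.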



[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by cramja <gi...@git.apache.org>.
GitHub user cramja commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79723030
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0;
    +       attr_id < static_cast<attribute_id>(relation_.size());
    +       ++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t>(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    +              tuple_null_bitmap.setBit(current_null_idx, true);
                 } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(*attr_map_it);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    +              std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                          attr_value.getDataPtr(),
    +                          insertInfo.fixed_len_sizes_[accessor_attr_id]);
                 }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), false>(
    -                  relation_, *accessor, attribute_map);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    +            current_null_idx++;
    +          } else if (!nullable && variable) {
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    --- End diff --
    
    Okay, I will add `TODOs`



[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by cramja <gi...@git.apache.org>.
GitHub user cramja commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79721010
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    --- End diff --
    
    That's right. I will correct this.
    
    (Should a compiler be able to infer this?)


---

[GitHub] incubator-quickstep issue #100: Refactor bulk insert for SplitRowStore

Posted by cramja <gi...@git.apache.org>.
Github user cramja commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/100
  
    @navsan I made the following changes:
    * Made the insert info per-column
    * Moved some variables outside the loop
    * Sped up the way the block estimates the number of tuples it can insert
    
    After these changes, the experiment mentioned in the header of this PR now runs in `7267.457 ms` with the same setup.
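
The first two changes in the list above (per-column insert info, hoisting work out of the loop) can be illustrated with a minimal sketch. It is not the code in this PR: `ColumnSpec`, `InsertInfoSketch`, and `CopyFixedColumns` are hypothetical stand-ins for the catalog and `BasicInsertInfo`, and null bitmaps and variable-length storage are left out. It only shows the idea of computing offsets and sizes once per bulk insert so that the per-tuple loop does array lookups instead of catalog calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified description of one column. This is an assumption
// made for the sketch, not the Quickstep catalog API.
struct ColumnSpec {
  std::size_t byte_length;  // Fixed size in bytes; ignored for variable-length columns.
  bool nullable;
  bool variable_length;
};

// Per-column layout facts, computed once per bulk insert instead of once per tuple.
struct InsertInfoSketch {
  std::size_t num_attrs = 0;
  std::size_t num_nullable_attrs = 0;
  std::size_t fixed_bytes = 0;  // Total bytes used by fixed-length columns.
  std::vector<bool> is_nullable;
  std::vector<bool> is_variable;
  std::vector<std::size_t> fixed_len_offsets;  // Meaningful for fixed-length columns only.
  std::vector<std::size_t> fixed_len_sizes;

  explicit InsertInfoSketch(const std::vector<ColumnSpec> &schema)
      : num_attrs(schema.size()) {
    for (const ColumnSpec &col : schema) {
      is_nullable.push_back(col.nullable);
      is_variable.push_back(col.variable_length);
      if (col.nullable) {
        ++num_nullable_attrs;
      }
      if (col.variable_length) {
        // Placeholder entries keep the vectors indexable by attribute id.
        fixed_len_offsets.push_back(0);
        fixed_len_sizes.push_back(0);
      } else {
        fixed_len_offsets.push_back(fixed_bytes);
        fixed_len_sizes.push_back(col.byte_length);
        fixed_bytes += col.byte_length;
      }
    }
  }
};

// Copies the fixed-length columns of one row into a slot using only the
// precomputed info. A nullptr entry stands in for a NULL or skipped value.
inline void CopyFixedColumns(const InsertInfoSketch &info,
                             const std::vector<const void*> &row,
                             char *slot) {
  for (std::size_t i = 0; i < info.num_attrs; ++i) {
    if (info.is_variable[i] || row[i] == nullptr) {
      continue;
    }
    std::memcpy(slot + info.fixed_len_offsets[i], row[i], info.fixed_len_sizes[i]);
  }
}
```

In this sketch the `InsertInfoSketch` object would be built once before the insert loop and `CopyFixedColumns` called once per tuple, so the loop body never recomputes offsets.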


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by cramja <gi...@git.apache.org>.
Github user cramja commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79723597
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0; 
    +			attr_id < static_cast<attribute_id>(relation_.size());
    +			++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t >(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    +              tuple_null_bitmap.setBit(current_null_idx, true);
                 } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(*attr_map_it);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    +              std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                          attr_value.getDataPtr(),
    +                          insertInfo.fixed_len_sizes_[accessor_attr_id]);
                 }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), false>(
    -                  relation_, *accessor, attribute_map);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    +            current_null_idx++;
    +          } else if (!nullable && variable) {
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +            DCHECK_EQ(insertInfo.var_len_offsets_[accessor_attr_id],
    +                      relation_.getVariableLengthAttributeIndex(accessor_attr_id));
    +            DCHECK(!attr_value.isNull());
    +
    +            const std::size_t attr_size = attr_value.getDataSize();
    +            current_variable_position -= attr_size;
    +            const int var_len_info_idx = insertInfo.var_len_offsets_[accessor_attr_id] * 2;
    +            variable_length_info_array[var_len_info_idx] = current_variable_position;
    +            variable_length_info_array[var_len_info_idx + 1] = attr_size;
    +            attr_value.copyInto(static_cast<char *>(tuple_storage_) + current_variable_position);
    +
    +            header_->variable_length_bytes_allocated += attr_size;
    --- End diff --
    
    One less addition! Gotta catch 'em all™ :)
    
    Part of the reason the existing code updates the header incrementally is that `spaceToInsert` depends on the header's information.
    
    I will see whether this change makes sense after refactoring the related code; if it does not, I will leave a comment explaining why.
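
A minimal sketch of the dependence described above, under stated assumptions: `Header`, `SpaceToInsert`, and `InsertAll` are hypothetical, simplified stand-ins and not the actual Quickstep classes or signatures. It assumes the space check reads a running byte counter from the header, so the counter has to be current before the next tuple is checked. Under that assumption the per-attribute additions can be folded into one addition per tuple, but the update cannot be postponed until the whole bulk insert has finished.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the sub-block header.
struct Header {
  std::size_t variable_length_bytes_allocated = 0;
};

// Hypothetical space check: tuple slots grow from the front of the storage,
// variable-length data grows from the back, and the two must not overlap.
bool SpaceToInsert(const Header &header,
                   std::size_t slots_used,
                   std::size_t slot_bytes,
                   std::size_t storage_bytes,
                   std::size_t new_var_bytes) {
  return (slots_used + 1) * slot_bytes
             + header.variable_length_bytes_allocated
             + new_var_bytes
         <= storage_bytes;
}

// Each tuple is modeled as the list of its variable-length attribute sizes.
// Returns the number of tuples that fit.
std::size_t InsertAll(const std::vector<std::vector<std::size_t>> &tuples,
                      std::size_t slot_bytes,
                      std::size_t storage_bytes,
                      Header *header) {
  std::size_t inserted = 0;
  for (const std::vector<std::size_t> &attr_sizes : tuples) {
    // Accumulate in a local instead of touching the header per attribute.
    std::size_t tuple_var_bytes = 0;
    for (std::size_t size : attr_sizes) {
      tuple_var_bytes += size;
    }
    if (!SpaceToInsert(*header, inserted, slot_bytes, storage_bytes, tuple_var_bytes)) {
      break;
    }
    // Single header update per tuple. It must happen before the next
    // iteration, because the next space check reads this counter.
    header->variable_length_bytes_allocated += tuple_var_bytes;
    ++inserted;
  }
  return inserted;
}
```

With 100 bytes of storage and 10-byte slots, tuples whose variable-length sizes are {5, 5}, {20}, and {30, 30} would see the first two accepted and the third rejected, because the third check already accounts for the 30 bytes recorded in the header.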


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by cramja <gi...@git.apache.org>.
Github user cramja closed the pull request at:

    https://github.com/apache/incubator-quickstep/pull/100


---

[GitHub] incubator-quickstep pull request #100: Refactor bulk insert for SplitRowStor...

Posted by navsan <gi...@git.apache.org>.
Github user navsan commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/100#discussion_r79702952
  
    --- Diff: storage/SplitRowStoreTupleStorageSubBlock.cpp ---
    @@ -194,379 +257,125 @@ TupleStorageSubBlock::InsertResult SplitRowStoreTupleStorageSubBlock::insertTupl
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuples(ValueAccessor *accessor) {
    -  const tuple_id original_num_tuples = header_->num_tuples;
    -  tuple_id pos = 0;
    -
    -  InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          // If packed, insert at the end of the slot array, otherwise find the
    -          // first hole.
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), true>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          // Allocate variable-length storage.
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          // Find the slot and locate its sub-structures.
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          // Start writing variable-length data at the beginning of the newly
    -          // allocated range.
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              // Set null bit and move on.
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              // Write offset and size into the slot, then copy the actual
    -              // value into the variable-length storage region.
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              // Copy fixed-length value directly into the slot.
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          // Update occupancy bitmap and header.
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Same as above, but skip variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(accessor_attr_id);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    -            } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        // Same as most general case above, but skip null checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSize<decltype(*accessor), false>(relation_, *accessor);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(accessor_attr_id));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      } else {
    -        // Simplest case: skip both null and variable-length checks.
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          attribute_id accessor_attr_id = 0;
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++accessor_attr_id) {
    -            const void *attr_value = accessor->template getUntypedValue<false>(accessor_attr_id);
    -            std::memcpy(fixed_length_attr_storage
    -                            + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                        attr_value,
    -                        attr_it->getType().maximumByteLength());
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    }
    -  });
    -
    -  return header_->num_tuples - original_num_tuples;
    +  std::vector<attribute_id> simple_remap;
    +  for (attribute_id attr_id = 0;
    +       attr_id < static_cast<attribute_id>(relation_.size());
    +       ++attr_id) {
    +    simple_remap.push_back(attr_id);
    +  }
    +  return bulkInsertTuplesWithRemappedAttributes(simple_remap, accessor);
     }
     
     tuple_id SplitRowStoreTupleStorageSubBlock::bulkInsertTuplesWithRemappedAttributes(
         const std::vector<attribute_id> &attribute_map,
         ValueAccessor *accessor) {
    -  DEBUG_ASSERT(attribute_map.size() == relation_.size());
    +  DCHECK_EQ(relation_.size(), attribute_map.size());
       const tuple_id original_num_tuples = header_->num_tuples;
       tuple_id pos = 0;
     
    +  BasicInsertInfo insertInfo(relation_);
    +
       InvokeOnAnyValueAccessor(
    -      accessor,
    -      [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    -    if (relation_.hasNullableAttributes()) {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    +    accessor,
    +    [&](auto *accessor) -> void {  // NOLINT(build/c++11)
    +      while (accessor->next()) {
    +        // If packed, insert at the end of the slot array, otherwise find the
    +        // first hole.
    +        pos = this->isPacked() ? header_->num_tuples
    +                               : occupancy_bitmap_->firstZero(pos);
    +
    +        // Only calculate space used if needed.
    +        if (!this->spaceToInsert(pos, insertInfo.max_var_length_)) {
               const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(
    -                  relation_, *accessor, attribute_map);
    +            = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), true>(relation_, *accessor,
    +                                                                                     attribute_map);
               if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
                 accessor->previous();
                 break;
               }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if ((nullable_idx != -1) && (attr_value.isNull())) {
    -              tuple_null_bitmap.setBit(nullable_idx, true);
    -              continue;
    -            }
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    -            } else {
    -              attr_value.copyInto(fixed_length_attr_storage
    -                                  + relation_.getFixedLengthAttributeOffset(attr_it->getID()));
    -            }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
             }
    -      } else {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          if (!this->spaceToInsert(pos, 0)) {
    -            accessor->previous();
    -            break;
    -          }
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          BitVector<true> tuple_null_bitmap(tuple_slot,
    -                                            relation_.numNullableAttributes());
    -          tuple_null_bitmap.clear();
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int nullable_idx = relation_.getNullableAttributeIndex(attr_it->getID());
    -            if (nullable_idx != -1) {
    -              const void *attr_value = accessor->template getUntypedValue<true>(*attr_map_it);
    -              if (attr_value == nullptr) {
    -                tuple_null_bitmap.setBit(nullable_idx, true);
    -              } else {
    -                std::memcpy(fixed_length_attr_storage
    -                                + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                            attr_value,
    -                            attr_it->getType().maximumByteLength());
    -              }
    +
    +        // Find the slot and locate its sub-structures.
    +        void *tuple_slot = static_cast<char *>(tuple_storage_) + pos * tuple_slot_bytes_;
    +
    +        BitVector<true> tuple_null_bitmap(tuple_slot, insertInfo.num_nullable_attrs_);
    +        tuple_null_bitmap.clear();
    +        char *fixed_length_attr_storage = static_cast<char *>(tuple_slot) + insertInfo.fixed_len_offset_;
    +        std::uint32_t *variable_length_info_array =
    +          reinterpret_cast<std::uint32_t *>(static_cast<char *>(tuple_slot) + insertInfo.var_len_offset_);
    +
    +        // Start writing variable-length data at the beginning of the
    +        // newly allocated range.
    +        std::size_t current_variable_position = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    +        std::uint32_t current_null_idx = 0;
    +        for (attribute_id accessor_attr_id = 0;
    +             static_cast<std::size_t >(accessor_attr_id) < insertInfo.num_attrs_; ++accessor_attr_id) {
    +          bool nullable = insertInfo.is_nullable_.getBit(accessor_attr_id);
    +          bool variable = insertInfo.is_variable_.getBit(accessor_attr_id);
    +
    +          if (!nullable && !variable) {
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +
    +            const void *attr_value = accessor->template getUntypedValue<false>(attribute_map[accessor_attr_id]);
    +            std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                        attr_value,
    +                        insertInfo.fixed_len_sizes_[accessor_attr_id]);
    +          } else if (nullable && !variable) {
    +            DCHECK_EQ(relation_.getNullableAttributeIndex(accessor_attr_id), static_cast<int>(current_null_idx));
    +
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +            if (attr_value.isNull()) {
    +              tuple_null_bitmap.setBit(current_null_idx, true);
                 } else {
    -              const void *attr_value = accessor->template getUntypedValue<false>(*attr_map_it);
    -              std::memcpy(fixed_length_attr_storage
    -                              + relation_.getFixedLengthAttributeOffset(attr_it->getID()),
    -                          attr_value,
    -                          attr_it->getType().maximumByteLength());
    +              std::memcpy(fixed_length_attr_storage + insertInfo.fixed_len_offsets_[accessor_attr_id],
    +                          attr_value.getDataPtr(),
    +                          insertInfo.fixed_len_sizes_[accessor_attr_id]);
                 }
    -          }
    -          occupancy_bitmap_->setBit(pos, true);
    -          ++(header_->num_tuples);
    -          if (pos > header_->max_tid) {
    -            header_->max_tid = pos;
    -          }
    -        }
    -      }
    -    } else {
    -      if (relation_.isVariableLength()) {
    -        while (accessor->next()) {
    -          pos = this->isPacked() ? header_->num_tuples
    -                                 : occupancy_bitmap_->firstZero(pos);
    -          const std::size_t tuple_variable_bytes
    -              = CalculateVariableSizeWithRemappedAttributes<decltype(*accessor), false>(
    -                  relation_, *accessor, attribute_map);
    -          if (!this->spaceToInsert(pos, tuple_variable_bytes)) {
    -            accessor->previous();
    -            break;
    -          }
    -          header_->variable_length_bytes_allocated += tuple_variable_bytes;
    -
    -          void *tuple_slot = static_cast<char*>(tuple_storage_) + pos * tuple_slot_bytes_;
    -          char *fixed_length_attr_storage = static_cast<char*>(tuple_slot) + per_tuple_null_bitmap_bytes_;
    -          std::uint32_t *variable_length_info_array = reinterpret_cast<std::uint32_t*>(
    -              fixed_length_attr_storage + relation_.getFixedByteLength());
    -          std::uint32_t current_variable_position
    -              = tuple_storage_bytes_ - header_->variable_length_bytes_allocated;
    -
    -          std::vector<attribute_id>::const_iterator attr_map_it = attribute_map.begin();
    -          for (CatalogRelationSchema::const_iterator attr_it = relation_.begin();
    -               attr_it != relation_.end();
    -               ++attr_it, ++attr_map_it) {
    -            const int variable_idx = relation_.getVariableLengthAttributeIndex(attr_it->getID());
    -            TypedValue attr_value(accessor->getTypedValue(*attr_map_it));
    -            if (variable_idx != -1) {
    -              const std::size_t attr_size = attr_value.getDataSize();
    -              variable_length_info_array[variable_idx << 1] = current_variable_position;
    -              variable_length_info_array[(variable_idx << 1) + 1] = attr_size;
    -              attr_value.copyInto(static_cast<char*>(tuple_storage_) + current_variable_position);
    -              current_variable_position += attr_size;
    +            current_null_idx++;
    +          } else if (!nullable && variable) {
    +            TypedValue attr_value(accessor->getTypedValue(attribute_map[accessor_attr_id]));
    +
    +            DCHECK_EQ(-1, relation_.getNullableAttributeIndex(accessor_attr_id));
    +            DCHECK_EQ(insertInfo.var_len_offsets_[accessor_attr_id],
    +                      relation_.getVariableLengthAttributeIndex(accessor_attr_id));
    +            DCHECK(!attr_value.isNull());
    +
    +            const std::size_t attr_size = attr_value.getDataSize();
    +            current_variable_position -= attr_size;
    +            const int var_len_info_idx = insertInfo.var_len_offsets_[accessor_attr_id] * 2;
    +            variable_length_info_array[var_len_info_idx] = current_variable_position;
    +            variable_length_info_array[var_len_info_idx + 1] = attr_size;
    +            attr_value.copyInto(static_cast<char *>(tuple_storage_) + current_variable_position);
    +
    +            header_->variable_length_bytes_allocated += attr_size;
    +          } else {  // nullable, variable length
    +            DCHECK_EQ(static_cast<int>(current_null_idx), relation_.getNullableAttributeIndex(accessor_attr_id));
    --- End diff --
    
    Much of the code in this case duplicates the case above. Instead, handle all variable-length columns the same way as above (with the memcpy and no null check up front), and then do a null check at the end for the nullable ones. That leaves less duplicated code and does almost exactly as much work (except in the pathological case of a large number of nulls).
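
One way the suggestion above could look, as a rough sketch rather than the actual patch: `VarLenSlot`, `TupleSketch`, and `WriteVariableAttr` are hypothetical names, a null value is modeled as a null pointer, and a null is assumed to occupy zero bytes. Both the nullable and non-nullable variable-length cases share one write path, and the null bit is set afterwards only for nullable attributes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical (offset, size) pair kept in the tuple slot for each
// variable-length attribute.
struct VarLenSlot {
  std::uint32_t offset;
  std::uint32_t size;
};

// Hypothetical per-tuple bookkeeping.
struct TupleSketch {
  std::vector<bool> null_bits;       // one bit per nullable attribute
  std::vector<VarLenSlot> var_info;  // one entry per variable-length attribute
};

// Writes one variable-length attribute. The variable-length region is filled
// from the back of `storage`; `position` is the current start of that region.
// The caller is assumed to have checked that enough space is available.
void WriteVariableAttr(const std::string *value,  // nullptr models a null
                       bool nullable,
                       std::size_t null_idx,
                       std::size_t var_idx,
                       std::vector<char> *storage,
                       std::size_t *position,
                       TupleSketch *tuple) {
  // Shared path for nullable and non-nullable attributes: a null simply
  // contributes zero bytes.
  const std::size_t size = (value != nullptr) ? value->size() : 0;
  *position -= size;
  tuple->var_info[var_idx].offset = static_cast<std::uint32_t>(*position);
  tuple->var_info[var_idx].size = static_cast<std::uint32_t>(size);
  if (size != 0) {
    std::memcpy(storage->data() + *position, value->data(), size);
  }
  // Null check at the end, needed only for nullable attributes.
  if (nullable && value == nullptr) {
    tuple->null_bits[null_idx] = true;
  }
}
```

The cost of this layout is that a null attribute still gets an (offset, size) entry written, which is the extra work the comment attributes to inputs with many nulls.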


---