You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2016/11/24 02:26:58 UTC

[jira] [Commented] (HBASE-15902) Scan Object

    [ https://issues.apache.org/jira/browse/HBASE-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691972#comment-15691972 ] 

Enis Soztutar commented on HBASE-15902:
---------------------------------------

Thanks Sudeep for the patch. This is a good start. 

- The patch file seems it is only for the second part of two commits: 
{code}
Subject: [PATCH 2/2] Adding sources for Scan object
{code}
Did you mean to attach the first part as well. I don't see changes to BUCK files, so maybe they are there? 

- This should be private: 
{code}
+  bool IsStartRowAndEqualsStopRow() const;
{code}

- Seems that we did not do the Query interface in the get patch. In the Java-land that is how we do carry the filter, as well as some extra methods. I would suggest that we should leave those out for now, and do another issue for the Query interface + the Filter methods. Query has bunch of other methods that we do not need, so we can discuss what to include there. 

- We are trying to simplify the Scan API elsewhere in the java client. Some of these will be deprecated and removed in java as well. There are a few methods that are in the Scan API to deal with how to do paging for large rows containing thousands / millions of cells. In our first cut, we would not deal with the complexity of passing partial results within rows at all. Once the client / scanner is in better shape, we can add that as a feature with better APIs (hopefully by that time Scan's API in Java will also be simplified). See https://issues.apache.org/jira/browse/HBASE-15484?focusedCommentId=15325647&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15325647. 

For now, let's remove these methods, and keep the API simple (again we can revisit in a later issue). 
{code}
+Scan& Scan::SetRowOffsetPerColumnFamily(int store_offset) {
+int Scan::RowOffsetPerColumnFamily() const {
+  int MaxResultsPerColumnFamily() const;
+  Scan &SetBatch(int batch);
+  int Batch() const;
{code}
So basically, we should remove everything related to rowOffsetPerColumnFamily, MaxResultsPerColumnFamily,and Batch. We can keep AllowPartialResults for now since that is unlikely to change. The first implementation probably will not deal with partial results though. Sorry that we should have talked about this before. 

- We can remove this method as well, since it is relevant only on the server side (gets are internally processed as a Scan with start row = stop row, that is why there this method is used in some server side code paths). 
{code}
+  bool IsGetScan() const;
{code}

- In this patch, we are allocating TimeRange objects in heap via a pointer, versus in the Get object, we are embedding those objects: 
{code}
+Scan &Scan::SetTimeRange(long min_stamp, long max_stamp) {
+  if (nullptr != tr_) {
+    delete tr_;
+    tr_ = nullptr;
+  }
+  tr_ = new TimeRange(min_stamp, max_stamp);
+  return *this;
+}
Get& Get::SetTimeRange(long min_timestamp, long max_timestamp) {
  this->tr_ = TimeRange(min_timestamp, max_timestamp);
  return *this;
}
{code}
We should stick with a single approach. Is the Get::SetTimeRange() ends up pointing to stack-allocated object (sorry my C++ memory model is rusty)? We should fix it if so.  


> Scan Object
> -----------
>
>                 Key: HBASE-15902
>                 URL: https://issues.apache.org/jira/browse/HBASE-15902
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Sudeep Sunthankar
>            Assignee: Sudeep Sunthankar
>         Attachments: HBASE-15902.HBASE-14850.patch, HBASE-15902.HBASE-14850.v2.patch
>
>
> Patch for creating Scan objects. Scan objects thus created can be used by Table implementation to fetch results for a given row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)