Posted to commits@phoenix.apache.org by ja...@apache.org on 2016/03/11 19:17:52 UTC
svn commit: r1734609 - in /phoenix/site: publish/faq.html source/src/site/markdown/faq.md

Author: jamestaylor
Date: Fri Mar 11 18:17:52 2016
New Revision: 1734609

URL: http://svn.apache.org/viewvc?rev=1734609&view=rev
Log:
Add new Why empty KeyValue FAQ

Modified:
    phoenix/site/publish/faq.html
    phoenix/site/source/src/site/markdown/faq.md

Modified: phoenix/site/publish/faq.html
URL: http://svn.apache.org/viewvc/phoenix/site/publish/faq.html?rev=1734609&r1=1734608&r2=1734609&view=diff
==============================================================================
--- phoenix/site/publish/faq.html (original)
+++ phoenix/site/publish/faq.html Fri Mar 11 18:17:52 2016
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2016-03-10
+ Generated by Apache Maven Doxia at 2016-03-11
  Rendered using Reflow Maven Skin 1.1.0 (http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -158,6 +158,7 @@
  <li><a href="#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API">Can phoenix work on tables with arbitrary timestamp as flexible as HBase API?</a></li> 
  <li><a href="#Why_isnt_my_query_doing_a_RANGE_SCAN">Why isn’t my query doing a RANGE SCAN?</a></li> 
  <li><a href="#Should_I_pool_Phoenix_JDBC_Connections">Should I pool Phoenix JDBC Connections?</a></li> 
+ <li><a href="#Why_empty_key_value">Why does Phoenix add an empty or dummy KeyValue when doing an upsert?</a></li> 
 </ul> 
 <div class="section"> 
  <div class="section"> 
@@ -363,6 +364,13 @@ conn.commit();
   <p>Phoenix’s Connection objects are different from most other JDBC Connections due to the underlying HBase connection. The Phoenix Connection object is designed to be a thin object that is inexpensive to create. If Phoenix Connections are reused, it is possible that the underlying HBase connection is not always left in a healthy state by the previous user. It is better to create new Phoenix Connections to ensure that you avoid any potential issues.</p> 
   <p>Implementing pooling for Phoenix could be done simply by creating a delegate Connection that instantiates a new Phoenix connection when retrieved from the pool and then closes the connection when returning it to the pool (see <a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-2388">PHOENIX-2388</a>).</p> 
  </div> 
+ <div class="section"> 
+  <h3 id="Why_empty_key_value">Why does Phoenix add an empty/dummy KeyValue when doing an upsert?<a name="Why_does_Phoenix_add_an_emptydummy_KeyValue_when_doing_an_upsert"></a></h3> 
+  <p>The empty or dummy KeyValue (with a column qualifier of _0) is needed to ensure that a given column is available for all rows.</p> 
+  <p>As you may know, data is stored in HBase as KeyValues, meaning that the full row key is stored for each column value. This also implies that the row key is not stored at all unless there is at least one column stored.</p> 
+  <p>Now consider a JDBC row which has an integer primary key, and several columns which are all null. In order to be able to store the primary key, a KeyValue needs to be stored to show that the row is present at all. This column is represented by the empty column that you’ve noticed. This allows doing a “SELECT * FROM TABLE” and receiving records for all rows, even those whose non-pk columns are null.</p> 
+  <p>The same issue comes up even if only one column is null for some (or all) records. A scan over Phoenix will include the empty column to ensure that rows that only consist of the primary key (and have null for all non-key columns) will be included in a scan result.</p> 
+ </div> 
 </div>
 			</div>
 		</div>

Modified: phoenix/site/source/src/site/markdown/faq.md
URL: http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/faq.md?rev=1734609&r1=1734608&r2=1734609&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/faq.md (original)
+++ phoenix/site/source/src/site/markdown/faq.md Fri Mar 11 18:17:52 2016
@@ -12,7 +12,7 @@
 * [Can phoenix work on tables with arbitrary timestamp as flexible as HBase API?](#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API)
 * [Why isn't my query doing a RANGE SCAN?](#Why_isnt_my_query_doing_a_RANGE_SCAN)
 * [Should I pool Phoenix JDBC Connections?](#Should_I_pool_Phoenix_JDBC_Connections)
-
+* [Why does Phoenix add an empty or dummy KeyValue when doing an upsert?](#Why_empty_key_value)
 
 ### I want to get started. Is there a Phoenix _Hello World_?
 
@@ -285,3 +285,26 @@ No, it is not necessary to pool Phoenix
 Phoenix's Connection objects are different from most other JDBC Connections due to the underlying HBase connection. The Phoenix Connection object is designed to be a thin object that is inexpensive to create. If Phoenix Connections are reused, it is possible that the underlying HBase connection is not always left in a healthy state by the previous user. It is better to create new Phoenix Connections to ensure that you avoid any potential issues.
 
 Implementing pooling for Phoenix could be done simply by creating a delegate Connection that instantiates a new Phoenix connection when retrieved from the pool and then closes the connection when returning it to the pool (see [PHOENIX-2388](https://issues.apache.org/jira/browse/PHOENIX-2388)).
+
+
+### <a id="Why_empty_key_value"/>Why does Phoenix add an empty/dummy KeyValue when doing an upsert?
+The empty or dummy KeyValue (with a column qualifier of _0) is needed to ensure that a given column is available
+for all rows.
+
+As you may know, data is stored in HBase as KeyValues, meaning that
+the full row key is stored for each column value. This also implies
+that the row key is not stored at all unless there is at least one
+column stored.
+
+Now consider a JDBC row which has an integer primary key, and several
+columns which are all null. In order to be able to store the primary
+key, a KeyValue needs to be stored to show that the row is present at
+all. This column is represented by the empty column that you've
+noticed. This allows doing a "SELECT * FROM TABLE" and receiving
+records for all rows, even those whose non-pk columns are null.
+
+The same issue comes up even if only one column is null for some (or
+all) records. A scan over Phoenix will include the empty column to
+ensure that rows that only consist of the primary key (and have null
+for all non-key columns) will be included in a scan result.
+
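
Note on the new FAQ entry: the reasoning in the added text can be illustrated with a small, self-contained Java sketch. It is a simplified in-memory model, not Phoenix or HBase code. The class `EmptyKeyValueSketch`, its nested `KeyValue` type, and the methods `upsert` and `scanRowKeys` are hypothetical names introduced only for illustration; the only detail taken from the FAQ is the `_0` qualifier. The sketch assumes that a null column value stores no cell at all, which is what the FAQ text implies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of the behavior described in the FAQ entry.
// None of these names come from Phoenix or HBase.
public class EmptyKeyValueSketch {

    // Qualifier of the empty KeyValue, as stated in the FAQ text.
    static final String EMPTY_QUALIFIER = "_0";

    // Stand-in for an HBase KeyValue: every stored cell carries the full row key.
    static final class KeyValue {
        final String rowKey;
        final String qualifier;
        final String value;

        KeyValue(String rowKey, String qualifier, String value) {
            this.rowKey = rowKey;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Stores one KeyValue per non-null column value. Null values store nothing.
    // When addEmptyKeyValue is true, one extra cell with qualifier _0 is stored.
    static void upsert(List<KeyValue> store, String rowKey, String[] qualifiers,
                       String[] values, boolean addEmptyKeyValue) {
        if (addEmptyKeyValue) {
            store.add(new KeyValue(rowKey, EMPTY_QUALIFIER, ""));
        }
        for (int i = 0; i < qualifiers.length; i++) {
            if (values[i] != null) {
                store.add(new KeyValue(rowKey, qualifiers[i], values[i]));
            }
        }
    }

    // A scan can only report row keys that appear on at least one stored cell.
    static Set<String> scanRowKeys(List<KeyValue> store) {
        Set<String> rowKeys = new TreeSet<>();
        for (KeyValue kv : store) {
            rowKeys.add(kv.rowKey);
        }
        return rowKeys;
    }

    public static void main(String[] args) {
        String[] qualifiers = {"A", "B"};
        String[] allNull = {null, null};

        // Without the empty KeyValue, a row whose columns are all null leaves no trace.
        List<KeyValue> without = new ArrayList<>();
        upsert(without, "1", qualifiers, allNull, false);
        System.out.println("without empty KeyValue: " + scanRowKeys(without));

        // With the empty KeyValue, the row key is stored and the scan returns it.
        List<KeyValue> with = new ArrayList<>();
        upsert(with, "1", qualifiers, allNull, true);
        System.out.println("with empty KeyValue: " + scanRowKeys(with));
    }
}
```

In this model the first scan returns no rows and the second returns the row with key 1, which mirrors the "SELECT * FROM TABLE" case the FAQ describes.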