Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2016/01/15 00:48:39 UTC

[jira] [Commented] (PHOENIX-2599) PhoenixRecordReader does not handle StaleRegionBoundaryCacheException

    [ https://issues.apache.org/jira/browse/PHOENIX-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099150#comment-15099150 ] 

James Taylor commented on PHOENIX-2599:
---------------------------------------

PhoenixRecordReader (which is used for MR, Spark, and Pig integration) does not deal with a StaleRegionBoundaryCacheException. This exception is thrown if a region splits while the job is running. In the standard JDBC code path, when a query is executed in Phoenix, we detect this exception and retry the batches that have failed. This is necessary for cases in which a merge sort is being done among the parallel scans we run. For example, if an ORDER BY is performed, the rows we get back from each scan must be sorted so that the merge sort correctly produces sorted rows. If a scan spans a region boundary (which is what this exception means), then the rows we get back from that scan come from two separate region scanners and will not be completely ordered.
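
To make the ordering point concrete, here is a generic k-way merge sketch (illustration only, not Phoenix code): the merge is only correct if every input iterator is itself fully sorted, which is exactly the property that breaks when one scan's results are two sorted runs from two region scanners concatenated together.
{code:java}
import java.util.*;

// Illustration only: a generic k-way merge over iterators that are assumed to
// be individually sorted. If any single input is two sorted runs back to back
// (a scan that crossed a region boundary), the merged output is not sorted.
public class KWayMergeSketch {
    public static <T extends Comparable<T>> List<T> merge(List<Iterator<T>> inputs) {
        // Min-heap of (current value, index of the iterator it came from).
        PriorityQueue<Map.Entry<T, Integer>> heap =
            new PriorityQueue<>((a, b) -> a.getKey().compareTo(b.getKey()));
        for (int i = 0; i < inputs.size(); i++) {
            if (inputs.get(i).hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(inputs.get(i).next(), i));
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Map.Entry<T, Integer> top = heap.poll();
            out.add(top.getKey());
            Iterator<T> src = inputs.get(top.getValue());
            if (src.hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(src.next(), top.getValue()));
            }
        }
        return out;
    }
}
{code}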

Since our MR, Spark, and Pig integration does not support any constructs that require merge sorting, the simplest fix would be to skip this check on the region server. This would require the following changes (see the sketch after the list):
- In PhoenixRecordReader.initialize(), within the loop over the Scan objects, set a new attribute on the Scan (add a new constant to BaseScannerRegionObserver such as SKIP_REGION_BOUNDARY_CHECK).
- In BaseScannerRegionObserver.preScannerOpen(), conditionally skip the call to throwIfScanOutOfRegion() when this new attribute is set.
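
Here's roughly what I have in mind (just a sketch to make the two changes concrete; the constant's value and the helper methods are placeholders, not the exact Phoenix code):
{code:java}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SkipBoundaryCheckSketch {
    // Placeholder for the proposed new constant on BaseScannerRegionObserver.
    public static final String SKIP_REGION_BOUNDARY_CHECK = "_SkipRegionBoundaryCheck";

    // Client side, i.e. what PhoenixRecordReader.initialize() would do inside
    // its loop over the Scan objects: tag the scan so the server skips the check.
    public static void markScan(Scan scan) {
        scan.setAttribute(SKIP_REGION_BOUNDARY_CHECK, Bytes.toBytes(true));
    }

    // Server side, i.e. what BaseScannerRegionObserver.preScannerOpen() would
    // consult before calling throwIfScanOutOfRegion().
    public static boolean shouldCheckRegionBoundary(Scan scan) {
        byte[] skip = scan.getAttribute(SKIP_REGION_BOUNDARY_CHECK);
        return skip == null || !Bytes.toBoolean(skip);
    }
}
{code}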

For our MR integration, I'm pretty sure the above approach is fine. For our Spark integration, I'm not positive. What's the lifecycle of the HConnection that's established to HBase with our integration, [~jmahonin]? Is it held open with multiple query invocations coming through it? If that's the case, then we may want a different solution, because this exception means that the HConnection on the client side (which is caching the region boundaries) is out of sync with the actual region boundaries. In that case, we'd want to clear this cache when we get this exception and then re-run the same scan. An alternative would be to bypass the cache when we look up the region boundaries (which requires another RPC).
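
If the connection is long-lived, that alternative would look something like this (sketch only; the ScanTask wrapper is a stand-in for whatever actually executes one scan, and I'm assuming the HBase 1.x HConnection API for clearing cached region locations):
{code:java}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.phoenix.schema.StaleRegionBoundaryCacheException;

public class RetryOnStaleBoundarySketch {
    // Stand-in for whatever actually executes one scan.
    interface ScanTask {
        void run() throws Exception;
    }

    // On a stale-boundary failure, drop the cached region locations for the
    // table and retry the scan once against the refreshed boundaries.
    static void runWithRetry(ScanTask task, HConnection conn, TableName table) throws Exception {
        try {
            task.run();
        } catch (StaleRegionBoundaryCacheException e) {
            conn.clearRegionCache(table); // forget cached boundaries for this table
            task.run();
        }
    }
}
{code}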

Would you have cycles to provide a fix, [~jmahonin]?

> PhoenixRecordReader does not handle StaleRegionBoundaryCacheException
> ---------------------------------------------------------------------
>
>                 Key: PHOENIX-2599
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2599
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.5.1
>         Environment: HBase 1.0 + Linux
>            Reporter: Li Gao
>
> When running Spark 1.4.1 and Phoenix 4.5.1 via the Phoenix-Spark connector, we notice that some of the time (30~50%) the following error appears and kills the running Spark job:
> 16/01/14 19:40:16 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 110.0 failed 4 times, most recent failure: Lost task 5.3 in stage 110.0 (TID 35526, datanode-123.somewhere): java.lang.RuntimeException: org.apache.phoenix.schema.StaleRegionBoundaryCacheException: ERROR 1108 (XCL08): Cache of region boundaries are out of date.
> at com.google.common.base.Throwables.propagate(Throwables.java:156)
> at org.apache.phoenix.mapreduce.PhoenixRecordReader.initialize(PhoenixRecordReader.java:126)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
> at org.apache.phoenix.spark.PhoenixRDD.compute(PhoenixRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.phoenix.schema.StaleRegionBoundaryCacheException: ERROR 1108 (XCL08): Cache of region boundaries are out of date.
> at org.apache.phoenix.exception.SQLExceptionCode$13.newException(SQLExceptionCode.java:304)
> at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at org.apache.phoenix.util.ServerUtil.parseRemoteException(ServerUtil.java:131)
> at org.apache.phoenix.util.ServerUtil.parseServerExceptionOrNull(ServerUtil.java:115)
> at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:104)
> at org.apache.phoenix.iterate.TableResultIterator.getDelegate(TableResultIterator.java:70)
> at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:88)
> at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:79)
> at org.apache.phoenix.mapreduce.PhoenixRecordReader.initialize(PhoenixRecordReader.java:111)
> ... 18 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)