Posted to user@hive.apache.org by Sungwoo Park <gl...@gmail.com> on 2020/07/19 19:53:23 UTC
MR3 1.1 released
We are pleased to announce the release of MR3 1.1. Three main improvements
in MR3 1.1 are:
1. Hive on MR3 on Kubernetes now runs almost as fast as Hive on MR3 on
Hadoop. For experimental results, please see a new blog article "Why you
should run Hive on Kubernetes, even in a Hadoop cluster".
https://www.datamonad.com/post/2020-07-19-why-hive-k8s/
2. Fetch delays rarely occur, thanks to 1) speculative execution and 2)
support for multiple shuffle handlers in a single ContainerWorker (a
configuration sketch follows this list). For experimental results, please
see the following page in MR3docs:
https://mr3docs.datamonad.com/docs/mr3/features/fetchdelay/
3. Hive 4 on MR3 now runs stably (after applying HIVE-23114, which fixes a
design bug reported by the MR3 team).
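As a rough illustration of item 2, the sketch below shows how the two
underlying settings might be supplied through Hadoop's Configuration API
(MR3 reads Hadoop-style keys, typically placed in mr3-site.xml). The class
name and the values are illustrative assumptions, not recommended defaults.

    // A minimal sketch, not taken from the MR3 docs.
    import org.apache.hadoop.conf.Configuration;

    public class FetchDelaySketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration(false);
        // Number of shuffle handler instances in each ContainerWorker
        // (in MR3 1.1 this key specifies a count; see the release notes).
        conf.set("mr3.use.daemon.shufflehandler", "2");
        // Threshold for speculative execution of TaskAttempts; the value
        // here is an assumption, not a recommended default.
        conf.set("mr3.am.task.concurrent.run.threshold.percent", "95");
        System.out.println(conf.get("mr3.use.daemon.shufflehandler"));
      }
    }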
Please visit https://mr3docs.datamonad.com/ for the full documentation on
MR3. The quick start guide for Hive on MR3 on Kubernetes has been updated
for MR3 1.1 (https://mr3docs.datamonad.com/docs/quick/).
For Hive on MR3 on Kubernetes, one can quickly test with a pre-built Docker
image available on Docker Hub (https://hub.docker.com/u/mr3project) and the
scripts available on GitHub (https://github.com/mr3project/mr3-run-k8s/).
Thank you!
--- Sungwoo
=== Features and improvements in MR3 1.1
## MR3
Mapping DAGs to Task queues:
https://mr3docs.datamonad.com/docs/mr3/features/dag-scheduling/
Multiple shuffle handlers in a single ContainerWorker:
https://mr3docs.datamonad.com/docs/mr3/features/shufflehandler/
Speculative execution:
https://mr3docs.datamonad.com/docs/mr3/features/speculative/
Eliminating fetch delays:
https://mr3docs.datamonad.com/docs/mr3/features/fetchdelay/
Running shuffle handlers in a separate process on Kubernetes:
https://mr3docs.datamonad.com/docs/mr3/guide/use-shufflehandler/
## Hive on MR3 on Kubernetes
Fast recovery in Hive on MR3 on Kubernetes:
https://mr3docs.datamonad.com/docs/k8s/features/recovery/
Using HDFS instead of PersistentVolumes:
https://mr3docs.datamonad.com/docs/k8s/advanced/use-hdfs/
Using Amazon S3 instead of PersistentVolumes:
https://mr3docs.datamonad.com/docs/k8s/advanced/use-s3/
Configuring kernel parameters in Docker containers:
https://mr3docs.datamonad.com/docs/k8s/advanced/configure-kernel/
Performance tuning for Hive on MR3 on Kubernetes:
https://mr3docs.datamonad.com/docs/k8s/advanced/performance-tuning/
Using S3 instead of EFS on Amazon EKS:
https://mr3docs.datamonad.com/docs/k8s/eks/use-s3/
=== Release notes for MR3 1.1
## MR3
- Support DAG scheduling schemes (specified by `mr3.dag.queue.scheme`).
- Optimize DAGAppMaster by freeing memory for messages to Tasks when
fault tolerance is disabled (with `mr3.am.task.max.failed.attempts` set to
1).
- Fix a minor memory leak in DaemonTask (which also prevented MR3 from
running more than 2^30 DAGs when using the shuffle handler).
- Improve the chance of assigning TaskAttempts to ContainerWorkers that
match location hints.
- TaskScheduler can use location hints produced by `ONE_TO_ONE` edges.
- TaskScheduler can use location hints from HDFS when assigning
TaskAttempts to ContainerWorker Pods on Kubernetes (with
`mr3.convert.container.address.host.name`).
- Introduce `mr3.k8s.pod.cpu.cores.max.multiplier` to specify the
multiplier for the limit of CPU cores.
- Introduce `mr3.k8s.pod.memory.max.multiplier` to specify the multiplier
for the limit of memory.
- Introduce `mr3.k8s.pod.worker.security.context.sysctls` to configure
kernel parameters of ContainerWorker Pods using init containers.
- Support speculative execution of TaskAttempts (with
`mr3.am.task.concurrent.run.threshold.percent`).
- A ContainerWorker can run multiple instances of shuffle handlers, each
with a different port. The configuration key
`mr3.use.daemon.shufflehandler` now specifies the number of shuffle handler
instances in each ContainerWorker.
- With speculative execution and multiple instances of shuffle handlers
in a single ContainerWorker, fetch delays rarely occur.
- A ContainerWorker Pod can run shuffle handlers in a separate container
(with `mr3.k8s.shuffle.process.ports`; see the configuration sketch after
this list).
- On Kubernetes, DAGAppMaster uses ReplicationController instead of Pod,
thus making recovery much faster.
- On Kubernetes, the ConfigMaps `mr3conf-configmap-master` and
`mr3conf-configmap-worker` survive after MR3 terminates, so the user should
delete them manually.
- Java 8u251/8u252 can be used on Kubernetes 1.17 and later.
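To make the new Kubernetes-related configuration keys above concrete, here
is a minimal sketch in the same style as the earlier one, again using
Hadoop's Configuration API. All values, and the assumed value formats for
the sysctls and ports keys, are illustrative assumptions rather than
documented defaults.

    // A minimal sketch, not taken from the MR3 docs.
    import org.apache.hadoop.conf.Configuration;

    public class Mr3KubernetesConfSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration(false);
        // Disable fault tolerance so DAGAppMaster can free memory for
        // messages to Tasks (value taken from the release notes above).
        conf.set("mr3.am.task.max.failed.attempts", "1");
        // Multipliers for the limits of CPU cores and memory of
        // ContainerWorker Pods (values are illustrative).
        conf.set("mr3.k8s.pod.cpu.cores.max.multiplier", "1.5");
        conf.set("mr3.k8s.pod.memory.max.multiplier", "1.5");
        // Kernel parameters configured via init containers; the
        // key=value list format is an assumption.
        conf.set("mr3.k8s.pod.worker.security.context.sysctls",
                 "net.core.somaxconn=16384");
        // Ports for shuffle handlers running in a separate container;
        // the comma-separated format is an assumption.
        conf.set("mr3.k8s.shuffle.process.ports", "15500,15501");
        conf.forEach(e -> System.out.println(e.getKey() + "=" + e.getValue()));
      }
    }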
## Hive on MR3 on Hadoop
- CrossProductHandler asks MR3 DAGAppMaster to set
TEZ_CARTESIAN_PRODUCT_MAX_PARALLELISM (cf. HIVE-16690; Hive 3/4).
- Hive 4 on MR3 is stable (currently using 4.0.0-SNAPSHOT).
- No longer support Hive 1.
## Hive on MR3 on Kubernetes
- Ranger uses a local directory (emptyDir volume) for logging.
- The open file limit for Solr (in Ranger) is no longer capped at 1024.
- HiveServer2 and DAGAppMaster create readiness and liveness probes.