Posted to dev@hive.apache.org by wanghaifei <wa...@jd.com> on 2015/03/06 09:42:58 UTC
How does Hive 0.14 read delta files concurrently?
Dear sir,
Problem 1: How are files read concurrently?
In Hive 0.14, the file is read directly from HDFS. The following is the relevant log record:
15/02/26 16:43:31 [main]: INFO orc.ReaderImpl: Reading ORC rows from hdfs://spark-jrdata-12.pekdc1.jdfin.local:9000/user/hive/warehouse/sku_01/end_dt=20150111/000000_0 with {include: [true, true, true, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false], offset: 0, length: 9223372036854775807}
Here is my question. In Hive 0.13, the file was read through MapReduce (MR) jobs, so large data volumes were read in parallel and the query ran faster. In Hive 0.14, how are files read concurrently so that query speed improves? I do understand that Hive 0.14, through the way the data is structured, fetches only the columns a query needs instead of the whole row.
Could you tell me which classes implement this in detail?
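To make the question concrete, the include list in the log above can be pictured as a mask over columns. The following is only a minimal sketch of that idea, not Hive's or ORC's implementation: `ColumnProjectionSketch` and `project` are hypothetical names, the sample row is made up, and the sketch assumes that the first flag stands for the row as a whole, so that data column i is governed by flag i + 1.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnProjectionSketch {
    // Hypothetical helper, not a Hive or ORC API: keeps only the
    // columns of a row whose include flag is set.
    static List<String> project(String[] row, boolean[] include) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < row.length; i++) {
            // Assumption: include[0] stands for the row itself,
            // so data column i is controlled by include[i + 1].
            int flag = i + 1;
            if (flag < include.length && include[flag]) {
                kept.add(row[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up row with four columns; only the first two are requested.
        String[] row = {"sku-1", "20150111", "9.5", "extra"};
        boolean[] include = {true, true, true, false, false};
        System.out.println(project(row, include)); // prints [sku-1, 20150111]
    }
}
```

Under that assumption, the log's include list (three true flags followed by false flags) would correspond to reading only the first two columns of the table.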
Problem 2: Which classes implement the merging of the data in detail?
I hope you can answer.
Thank you.