Posted to user@hive.apache.org by Aitor Perez Cedres <ap...@pragsis.com> on 2014/05/19 17:22:19 UTC
Hive-server load file times increase over time
Hi all,
We have a Hive server 0.9.0 running, and we load data (around
350MB) into a partitioned table every minute through a JDBC client. Our
work-flow is to first check whether the partition we have to load the
data into already exists. If it does, we locate the smallest file in that
partition and append the new data to that file using the HDFS API. If
the partition doesn't exist, we issue a "load data inpath" query
for the new partition through a JDBC client to Hive-server.
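To make the work-flow above concrete, here is a minimal sketch of the decision logic only. It is not our actual code: the class and method names (LoadPlanner, smallestFile, loadStatement, plan) and the example paths, table and partition column are made up for illustration, the file sizes are passed in as a plain map instead of being read from HDFS, and the fall-back to "load data inpath" for an existing but empty partition is an assumption.

```java
import java.util.Map;
import java.util.Optional;

public class LoadPlanner {

    // Picks the file with the smallest size from a map of path -> size in bytes.
    // Returns empty when the partition has no files.
    static Optional<String> smallestFile(Map<String, Long> fileSizes) {
        return fileSizes.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    // Builds the HiveQL statement sent over JDBC when the partition is new.
    // Table and partition column names are hypothetical.
    static String loadStatement(String inputPath, String table,
                                String partCol, String partValue) {
        return "LOAD DATA INPATH '" + inputPath + "' INTO TABLE " + table
                + " PARTITION (" + partCol + "='" + partValue + "')";
    }

    // Decides what to do with one incoming batch:
    // append to the smallest existing file, or load into a new partition.
    static String plan(boolean partitionExists, Map<String, Long> fileSizes,
                       String inputPath, String table,
                       String partCol, String partValue) {
        if (partitionExists) {
            Optional<String> target = smallestFile(fileSizes);
            if (target.isPresent()) {
                // In the real work-flow this step is an HDFS append, not a Hive query.
                return "APPEND " + inputPath + " TO " + target.get();
            }
        }
        // Assumption: an existing but empty partition is treated like a new one.
        return loadStatement(inputPath, table, partCol, partValue);
    }
}
```

Only the second branch goes through hive-server; the append branch never touches it, so the growing load time would be on the "load data inpath" path.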
Just after starting the hive-server, the file load usually takes around 2
seconds to complete; after some days, the load time increases to 6-8
seconds; after a week, it takes around 15 seconds; then, around the
second week, the process fails with the error:
2014-04-23 08:01:46,258 ERROR exec.Task
(SessionState.java:printError(403)) - Failed with exception
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter
partition.
org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter
partition.
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable
to alter partition.
We started logging our application with root.logger set to DEBUG, and we
found the following trace, where it is "hanging" and adding time to the
process:
2014-05-07 17:44:30,264 DEBUG ClientCnxn:727 - Got ping response
for sessionid: 0x143f18e41e63b55 after 0ms
2014-05-07 17:44:32,439 DEBUG LazySimpleSerDe:195 -
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with:
columnNames=[] columnTypes=[] separator=[[B@4428573f] nullstring=\N
lastColumnTakesRest=false
From those traces, we see it takes 2 seconds to initialize the
LazySimpleSerDe. As time passes, the gap between those two traces grows
until hive-server throws the error above. After that, the load times are
"reset", and the time between those traces is again less than a
second.
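For anyone who wants to track the same gap in their own logs, the delay can be computed from the timestamps at the start of two DEBUG lines. The following is only a rough sketch: the class and method names (TraceGap, timestampOf, gapMillis) are made up, and it assumes every line starts with a "yyyy-MM-dd HH:mm:ss,SSS" timestamp as in the traces above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TraceGap {

    // Timestamp layout assumed from the log lines quoted above.
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Parses the first 23 characters of a log line as its timestamp.
    static LocalDateTime timestampOf(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 23), TS);
    }

    // Milliseconds elapsed between two log lines.
    static long gapMillis(String earlier, String later) {
        return Duration.between(timestampOf(earlier), timestampOf(later)).toMillis();
    }

    public static void main(String[] args) {
        String ping = "2014-05-07 17:44:30,264 DEBUG ClientCnxn:727 - Got ping response";
        String serde = "2014-05-07 17:44:32,439 DEBUG LazySimpleSerDe:195 -";
        System.out.println(gapMillis(ping, serde)); // prints 2175
    }
}
```

For the two lines quoted above this gives 2175 ms, which is the roughly 2 seconds mentioned.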
Has anyone experienced a similar issue before? Any comment or help is
appreciated.
Thanks in advance,
--
*Aitor Pérez*
/Big Data System Engineer/
Telf.: +34 917 680 490
Fax: +34 913 833 301
C/Manuel Tovar, 49-53 - 28034 Madrid - Spain
_http://www.bidoop.es_