Posted to user@hive.apache.org by Aitor Perez Cedres <ap...@pragsis.com> on 2014/05/19 17:22:19 UTC

Hive-server load file times increase over time

Hi all,

We have a Hive server 0.9.0 running, and we load data (around 
350MB) into a partitioned table every minute through a JDBC client. Our 
workflow is to first check whether the partition we have to load the 
data into already exists. If it does, we locate the smallest file in that 
partition and append the new data to that file using the HDFS API. If the 
partition doesn't exist, we run a "load data inpath" query for the new 
partition through a JDBC client against hive-server.
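
For readers following along, the decision described above can be sketched 
roughly as below. This is only an illustration under assumptions, not our 
actual client code: the function name plan_load, the table name, the paths 
and the partition column are all hypothetical, and the real append and the 
JDBC call are left out so the sketch stays self-contained.

```python
def plan_load(existing_files, staged_path, table, partition_spec):
    """Decide how a new batch should reach the partition.

    existing_files: dict mapping file path -> size in bytes for the target
        partition; an empty dict stands for "partition does not exist".
    staged_path, table, partition_spec: hypothetical inputs for illustration.
    Returns ("append", path) or ("load", hiveql_statement).
    """
    if existing_files:
        # Partition exists: append to its smallest file (done via HDFS elsewhere).
        smallest = min(existing_files, key=existing_files.get)
        return ("append", smallest)

    # Partition missing: build the statement to send through JDBC.
    spec = ", ".join(f"{col}='{val}'" for col, val in partition_spec.items())
    statement = (
        f"LOAD DATA INPATH '{staged_path}' "
        f"INTO TABLE {table} PARTITION ({spec})"
    )
    return ("load", statement)


if __name__ == "__main__":
    # Hypothetical file sizes and names, only to exercise both branches.
    print(plan_load({"/warehouse/t/dt=1/a": 900, "/warehouse/t/dt=1/b": 300},
                    "/staging/batch", "t", {"dt": "1"}))
    print(plan_load({}, "/staging/batch", "t", {"dt": "2"}))
```

Only the second branch goes through hive-server, so that is the path whose 
timing is discussed in the rest of this message.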

Just after starting the hive-server, this process usually takes around 2 
seconds to complete the file load; but after some days, the load time 
increases to 6-8 seconds; after a week, it takes around 15 seconds; 
then, around the second week, the process fails with this error:

    2014-04-23 08:01:46,258 ERROR exec.Task
    (SessionState.java:printError(403)) - Failed with exception
    org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter
    partition.
    org.apache.hadoop.hive.ql.metadata.HiveException:
    org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter
    partition.
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable
    to alter partition.

We started logging our application with root.logger set to DEBUG, and we 
found the following trace, where it is "hanging" and adding more time for 
the process to complete:

      2014-05-07 17:44:30,264 DEBUG ClientCnxn:727 - Got ping response
    for sessionid: 0x143f18e41e63b55 after 0ms
      2014-05-07 17:44:32,439 DEBUG LazySimpleSerDe:195 -
    org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with:
    columnNames=[] columnTypes=[] separator=[[B@4428573f] nullstring=\N
    lastColumnTakesRest=false

From those traces, we see it takes about 2 seconds to initialize the 
LazySimpleSerDe. As time passes, the gap between those two traces keeps 
growing until hive-server throws the error above. After that, the load 
times are "reset" and the time between those traces drops back to less 
than a second.
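
To track that gap over time instead of reading it off by eye, something 
like the following could be run over the DEBUG log. It is a rough sketch 
that assumes every log entry starts with a "yyyy-MM-dd HH:mm:ss,SSS" 
timestamp as in the excerpt above; the helper names are made up for this 
example, and wrapped continuation lines are simply skipped.

```python
import re
from datetime import datetime

# Assumed layout: optional indent, timestamp with milliseconds, then the rest.
TS_RE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(.*)$")


def parse_events(lines):
    """Return (timestamp, message) pairs for lines that start with a timestamp."""
    events = []
    for line in lines:
        match = TS_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((stamp, match.group(2)))
    return events


def gaps_in_seconds(events):
    """Return (gap, earlier message, later message) for consecutive events."""
    return [
        ((later[0] - earlier[0]).total_seconds(), earlier[1], later[1])
        for earlier, later in zip(events, events[1:])
    ]


# The two entries quoted above, including their wrapped continuation lines.
SAMPLE = [
    "  2014-05-07 17:44:30,264 DEBUG ClientCnxn:727 - Got ping response",
    "for sessionid: 0x143f18e41e63b55 after 0ms",
    "  2014-05-07 17:44:32,439 DEBUG LazySimpleSerDe:195 -",
    "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with:",
]

if __name__ == "__main__":
    for gap, before, after in gaps_in_seconds(parse_events(SAMPLE)):
        print(f"{gap:.3f}s between [{before}] and [{after}]")
```

Note that this measures the time between two consecutive log entries, not 
the SerDe initialization itself, so whatever runs between the ZooKeeper 
ping response and the SerDe message is included in the figure.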

Has anyone experienced a similar issue before? Any comment or help is 
appreciated.

Thanks in advance,
-- 
*Aitor Pérez*
/Big Data System Engineer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_