Posted to issues@hive.apache.org by "Peter Vary (JIRA)" <ji...@apache.org> on 2016/09/01 22:14:20 UTC

[jira] [Comment Edited] (HIVE-14675) Ensure directories are cleaned up on test cleanup in QTestUtil

    [ https://issues.apache.org/jira/browse/HIVE-14675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456776#comment-15456776 ] 

Peter Vary edited comment on HIVE-14675 at 9/1/16 10:13 PM:
------------------------------------------------------------

Hi,

Yes, this helped a lot. The pre-test cleanup is in the main pom.xml; these steps run before the tests:
{noformat}
    <plugins>
      <!-- plugins are always listed in sorted order by groupId, artifactId -->
[..]
          <execution>
            <id>setup-test-dirs</id>
            <phase>process-test-resources</phase>
            <goals>
              <goal>run</goal>
            </goals>
            <configuration>
              <target>
                <delete dir="${test.tmp.dir}" />
                <delete dir="${test.warehouse.dir}" />
                <mkdir dir="${test.tmp.dir}" />
                <mkdir dir="${test.warehouse.dir}" />
                <mkdir dir="${test.tmp.dir}/conf" />
                <!-- copies hive-site.xml so it can be modified -->
                <copy todir="${test.tmp.dir}/conf/">
                  <fileset dir="${basedir}/${hive.path.to.root}/data/conf/"/>
                </copy>
              </target>
            </configuration>
          </execution>
        </executions>
      </plugin>
[..]
    </plugins>
{noformat}
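For readers less familiar with Ant, the setup-test-dirs target above boils down to delete, recreate, then copy. This is a minimal Java sketch of those steps, not Hive code: the class TestDirSetup and its methods are hypothetical, it assumes Java 11+, and the three paths stand in for ${test.tmp.dir}, ${test.warehouse.dir} and ${basedir}/${hive.path.to.root}/data/conf/.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper mirroring the Ant "setup-test-dirs" target; not part of Hive.
final class TestDirSetup {
    private TestDirSetup() {}

    // Recursively deletes a directory tree if it exists, like <delete dir="..."/>.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Recursively copies src into dst, like <copy todir="..."><fileset dir="..."/></copy>.
    static void copyTree(Path src, Path dst) throws IOException {
        try (Stream<Path> walk = Files.walk(src)) {
            for (Path p : (Iterable<Path>) walk::iterator) {
                // Files.walk yields each directory before its contents.
                Path target = dst.resolve(src.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // Same order as the Ant target: delete, recreate, then copy the conf files.
    static void setup(Path tmpDir, Path warehouseDir, Path confSource) throws IOException {
        deleteRecursively(tmpDir);
        deleteRecursively(warehouseDir);
        Files.createDirectories(tmpDir);
        Files.createDirectories(warehouseDir);
        Path confDir = Files.createDirectories(tmpDir.resolve("conf"));
        copyTree(confSource, confDir);
    }
}
```

Because the deletion happens in the process-test-resources phase, anything left behind by one run is only removed at the start of the next mvn invocation.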

As for afterTestClass: it typically just calls clearPostTestEffects, which only shuts down ZooKeeper, so no directories are deleted there.
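If directory removal were added after the tests, an afterTestClass-style hook could run the existing shutdown first and then delete the two roots. A minimal sketch, assuming Java 11+; PostTestCleanup, cleanUp and this afterTestClass signature are hypothetical and do not reflect QTestUtil's actual API.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical teardown helper; not QTestUtil's actual code.
final class PostTestCleanup {
    private PostTestCleanup() {}

    // Deletes each directory tree; directories that do not exist are skipped.
    static void cleanUp(Path... dirs) throws IOException {
        for (Path dir : dirs) {
            if (!Files.exists(dir)) {
                continue;
            }
            Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                    Files.delete(file);
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult postVisitDirectory(Path d, IOException exc) throws IOException {
                    if (exc != null) {
                        throw exc;
                    }
                    // A directory is visited after its entries, so it is empty by now.
                    Files.delete(d);
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }

    // Sketch of an afterTestClass-style hook: run the existing shutdown
    // (e.g. clearPostTestEffects stopping ZooKeeper) first, then delete the roots.
    static void afterTestClass(Runnable clearPostTestEffects, Path tmpDir, Path warehouseDir)
            throws IOException {
        clearPostTestEffects.run();
        cleanUp(tmpDir, warehouseDir);
    }
}
```

Shutting services down before deleting matters because a running ZooKeeper or metastore may still hold files open under those directories.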

I found the following hive-site.xml files:
- ./beeline/src/test/resources/hive-site.xml
- ./conf/hive-site.xml
- ./data/conf/hive-site.xml
- ./data/conf/llap/hive-site.xml
- ./data/conf/perf-reg/hive-site.xml
- ./data/conf/spark/standalone/hive-site.xml
- ./data/conf/spark/yarn-client/hive-site.xml
- ./data/conf/tez/hive-site.xml
- ./hcatalog/src/test/e2e/templeton/deployers/config/hive/hive-site.xml
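A list like this can be reproduced with something like `find . -name hive-site.xml`. A rough Java equivalent follows; HiveSiteFinder is a hypothetical name, and skipping `target` build directories is an assumption about how the list was produced.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, roughly equivalent to `find . -name hive-site.xml`.
final class HiveSiteFinder {
    private HiveSiteFinder() {}

    // Lists every hive-site.xml under root as "./relative/path", sorted.
    // Anything below a "target" directory (Maven build output) is skipped;
    // that exclusion is an assumption, not something stated in the comment.
    static List<String> find(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().equals("hive-site.xml"))
                .map(root::relativize)
                .filter(rel -> !hasSegment(rel, "target"))
                .map(rel -> "./" + rel.toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // True if any name element of the relative path equals the given name.
    private static boolean hasSegment(Path rel, String name) {
        for (Path part : rel) {
            if (part.toString().equals(name)) {
                return true;
            }
        }
        return false;
    }
}
```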

The directory values I found:
{noformat}
- ${test.tmp.dir}/metastore_db
- ${test.tmp.dir}/warehouse
- ${test.tmp.dir}/hadoop-tmp
- ${test.tmp.dir}/scratchdir
- ${test.tmp.dir}/localscratchdir/
- ${test.tmp.dir}/junit_metastore_db
- ${test.warehouse.dir}
- file://${test.tmp.dir}/metadb/
- ${test.tmp.dir}/log/
- ${test.tmp.dir}/tmp
- ${spark.home}/logs/ - In files: ./data/conf/spark/yarn-client/hive-site.xml, ./data/conf/spark/standalone/hive-site.xml
- /tmp/webhcat_e2e/logs/webhcat_e2e_metastore_db - In file: ./hcatalog/src/test/e2e/templeton/deployers/config/hive/hive-site.xml
{noformat}
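Most of these values only become real paths once the ${...} placeholders are resolved. Hadoop's Configuration class performs its own variable expansion when values are read; the following is only a standalone approximation of the idea, using a hypothetical PlaceholderExpander class that leaves unknown placeholders (such as ${spark.home} when it is not defined) untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a simplified stand-in for real ${...} expansion.
final class PlaceholderExpander {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    private PlaceholderExpander() {}

    // Replaces each ${name} with props.get(name); unknown names are kept verbatim.
    static String expand(String value, Map<String, String> props) {
        Matcher m = VAR.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String resolved = props.get(m.group(1));
            String replacement = resolved != null ? resolved : m.group();
            // quoteReplacement stops '$' and '\' in the value being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Nested placeholders (a value that itself contains ${...}) are not re-expanded in this sketch.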

I also checked the other XML files in the data/conf directory but did not find any other directories configured there.

spark.home is defined in itests/hive-unit/pom.xml and qtest-spark/pom.xml:
{noformat}
<spark.home>${basedir}/${hive.path.to.root}/itests/qtest-spark/target/spark</spark.home>
{noformat}

To sum up, it seems that removing the warehouse and tmp directories after the tests is enough to remove the files and directories created from the test configurations. There are 2 cases where directories are defined in the configuration but not removed:
{noformat}
- ${spark.home}/logs/ - this might not be a problem at all
- The metastore DB for the HCatalog tests might not be removed, which might cause problems (but that is not QTestUtil :) )
{noformat}
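The reasoning above, that a configured directory is cleaned up only if it lies under test.tmp.dir or test.warehouse.dir, can be checked mechanically once the values are resolved. A minimal sketch with a hypothetical CleanupCoverage class; any concrete paths passed to it are stand-ins for the resolved property values.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Hive.
final class CleanupCoverage {
    private CleanupCoverage() {}

    // Returns the configured directories that do not live under any cleaned root.
    static List<Path> notCovered(List<Path> configuredDirs, List<Path> cleanedRoots) {
        List<Path> result = new ArrayList<>();
        for (Path dir : configuredDirs) {
            Path normalized = dir.toAbsolutePath().normalize();
            boolean covered = false;
            for (Path root : cleanedRoots) {
                // Path.startsWith compares whole name elements, so /a/tmpfoo is not under /a/tmp.
                if (normalized.startsWith(root.toAbsolutePath().normalize())) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                result.add(dir);
            }
        }
        return result;
    }
}
```

Comparing Path objects rather than raw string prefixes avoids treating a sibling such as /build/tmpfoo as if it were inside /build/tmp.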

During my code walkthroughs I found Java code that creates a tmp dir for ZooKeeper under test.tmp.dir, but other tests might use different directory patterns. Finding those is a harder nut to crack.
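One way to keep such helper directories inside the cleaned area is to always create them under test.tmp.dir. A minimal sketch, assuming test.tmp.dir is visible to the test JVM as a system property (not verified here); ScopedTempDirs and its methods are hypothetical and are not the ZooKeeper setup code mentioned above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not taken from Hive's ZooKeeper test setup.
final class ScopedTempDirs {
    private ScopedTempDirs() {}

    // Reads the test tmp root from a system property; fails fast instead of
    // falling back to java.io.tmpdir, which the test cleanup does not cover.
    static Path testTmpRoot() {
        String dir = System.getProperty("test.tmp.dir");
        if (dir == null || dir.isEmpty()) {
            throw new IllegalStateException("test.tmp.dir is not set");
        }
        return Paths.get(dir);
    }

    // Creates a uniquely named directory under the given root, so deleting the
    // root after the tests also removes it. Files.createTempDirectory(prefix)
    // without a parent would use the default temp directory instead.
    static Path newTempDir(Path root, String prefix) throws IOException {
        Files.createDirectories(root);
        return Files.createTempDirectory(root, prefix);
    }
}
```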

Is this the analysis you were looking for?

Thanks,
Peter

~Edit: Fighting with the formatter :)~


> Ensure directories are cleaned up on test cleanup in QTestUtil
> --------------------------------------------------------------
>
>                 Key: HIVE-14675
>                 URL: https://issues.apache.org/jira/browse/HIVE-14675
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>
> Need to verify whether they are cleaned up or not. There are 4-5 different directories involved. If I'm not mistaken, they get cleaned up before each test invocation via mvn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)