You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2015/05/05 07:10:07 UTC
[jira] [Comment Edited] (TEZ-2076) Tez framework to extract/analyze
data stored in ATS for specific dag
[ https://issues.apache.org/jira/browse/TEZ-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527925#comment-14527925 ]
Hitesh Shah edited comment on TEZ-2076 at 5/5/15 5:09 AM:
----------------------------------------------------------
Comments:
{code}
usage: java -cp $HADOOP_CONF_DIR/:./target/tez-history-parser-x.y.z-SNAPSHOT-jar-with-dependencies.jar
{code}
- please remove references to target as well as SNAPSHOT.
- are we saying that a user is no longer using the "hadoop jar" command or cannot/should not use it? Isn't that preferable to "java -cp"?
Code is not using slf4j. Also, where does log4j.properties come from at runtime?
Secure cluster support? If not supported, throw an error.
{code}
Preconditions.checkArgument(dagId != null
{code}
- check both not null and not empty.
- also check dag id is a valid dag id
{code}
maybeCreateDirectory(baseDownloadDir);
maybeCreateDirectory(downloadDir);
{code}
- please add more logging to clarify in which dir the output is being generated to. This can be confusing if the user provides a relative path so logging the absolute path will help.
{code}
this.batchSize = conf.getInt(BATCH_SIZE, BATCH_SIZE_DEFAULT);
{code}
- hidden config?
- also tool is called import but the config is export*.
tez-history-parser/pom.xml needs a mvn skip deploy to ensure that the jar does not get deployed to the mvn repo.
{code}
atsAddress = (atsAddress == null) ? setupATSAddress() : atsAddress;
this.baseUri = Joiner.on("").join(atsAddress, ATSConstants.RESOURCE_URI_BASE);
{code}
- does this make any assumptions on preceding "http://" or trailing slash etc?
"private int download()" - any reason to return an int return value instead of a void? The callee can just do a return 0 instead if no exception is thrown. Also, is there a reason to catch all exceptions and wrap them in a tezexception? Maybe not catch anything at all unless some action is needed.
"UTF-8" - final static field?
" LOG.info("Limit=" + limit + ", downloaded " + "entities len=" + entities.length());" - this seems like a debug log.
{code}
if (response.getClientResponseStatus() != ClientResponse.Status.OK) {
throw new TezException("Failed to get response from YARN Timeline: url: " + url);
}
{code}
- might need to check whether a 404 is a valid response if no data exists or if we need a better error message for dag not found if the dag entity returns a 404.
{code}
} catch (TezException e) {
387 LOG.error("Error in processing ", e);
388 throw e;
389 } catch (YarnException e) {
390 LOG.error("Error in processing ", e);
391 throw e;
392 } catch (ParseException e) {
393 LOG.error("Error in parsing options ", e);
394 printHelp(options);
395 } catch (IOException e) {
396 LOG.error("Error in processing ", e);
397 throw e;
398 } catch (Exception e) {
399 LOG.error("Error in processing ", e);
400 throw e;
{code}
- collapse this to a single Exception catch? Not sure why each separate catch clause is needed.
tez-plugins/tez-history-parser/src/main/java/org/apache/tez/history/parser/datamodel/Constants.java
- any reason for the duplication?
More comments later.
was (Author: hitesh):
Comments:
{code]
usage: java -cp $HADOOP_CONF_DIR/:./target/tez-history-parser-x.y.z-SNAPSHOT-jar-with-dependencies.jar
{code}
- please remove references to target as well as SNAPSHOT.
- are we saying that a user is no longer using the "hadoop jar" command or cannot/should not use it? Isn't that preferable to "java -cp"?
Code is not using slf4j. Also, where does log4j.properties come from at runtime?
Secure cluster support? If not supported, throw an error.
{code}
Preconditions.checkArgument(dagId != null
{code}
- check both not null and not empty.
- also check dag id is a valid dag id
{code}
maybeCreateDirectory(baseDownloadDir);
maybeCreateDirectory(downloadDir);
{code}
- please add more logging to clarify in which dir the output is being generated to. This can be confusing if the user provides a relative path so logging the absolute path will help.
{code}
this.batchSize = conf.getInt(BATCH_SIZE, BATCH_SIZE_DEFAULT);
{code}
- hidden config?
- also tool is called import but the config is export*.
tez-history-parser/pom.xml needs a mvn skip deploy to ensure that the jar does not get deployed to the mvn repo.
{code}
atsAddress = (atsAddress == null) ? setupATSAddress() : atsAddress;
this.baseUri = Joiner.on("").join(atsAddress, ATSConstants.RESOURCE_URI_BASE);
{code}
- does this make any assumptions on preceding "http://" or trailing slash etc?
"private int download()" - any reason to return an int return value instead of a void? The callee can just do a return 0 instead if no exception is thrown. Also, is there a reason to catch all exceptions and wrap them in a tezexception? Maybe not catch anything at all unless some action is needed.
"UTF-8" - final static field?
" LOG.info("Limit=" + limit + ", downloaded " + "entities len=" + entities.length());" - this seems like a debug log.
{code}
if (response.getClientResponseStatus() != ClientResponse.Status.OK) {
throw new TezException("Failed to get response from YARN Timeline: url: " + url);
}
{code}
- might need to check whether a 404 is a valid response if no data exists or if we need a better error message for dag not found if the dag entity returns a 404.
{code}
} catch (TezException e) {
387 LOG.error("Error in processing ", e);
388 throw e;
389 } catch (YarnException e) {
390 LOG.error("Error in processing ", e);
391 throw e;
392 } catch (ParseException e) {
393 LOG.error("Error in parsing options ", e);
394 printHelp(options);
395 } catch (IOException e) {
396 LOG.error("Error in processing ", e);
397 throw e;
398 } catch (Exception e) {
399 LOG.error("Error in processing ", e);
400 throw e;
{code}
- collapse this to a single Exception catch? Not sure why each separate catch clause is needed.
tez-plugins/tez-history-parser/src/main/java/org/apache/tez/history/parser/datamodel/Constants.java
- any reason for the duplication?
More comments later.
> Tez framework to extract/analyze data stored in ATS for specific dag
> --------------------------------------------------------------------
>
> Key: TEZ-2076
> URL: https://issues.apache.org/jira/browse/TEZ-2076
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2076.1.patch, TEZ-2076.10.patch, TEZ-2076.2.patch, TEZ-2076.3.patch, TEZ-2076.4.patch, TEZ-2076.5.patch, TEZ-2076.6.patch, TEZ-2076.7.patch, TEZ-2076.8.patch, TEZ-2076.9.patch, TEZ-2076.WIP.2.patch, TEZ-2076.WIP.3.patch, TEZ-2076.WIP.patch
>
>
> - Users should be able to download ATS data pertaining to a DAG from Tez-UI (more like a zip file containing DAG/Vertex/Task/TaskAttempt info).
> - This can be plugged to an analyzer which parses the data, adds semantics and provides an in-memory representation for further analysis.
> - This will enable to write different analyzer rules, which can be run on top of this in-memory representation to come up with analysis on the DAG.
> - Results of this analyzer rules can be rendered on to UI (standalone webapp) later point in time.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)