You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Aaron Kimball (JIRA)" <ji...@apache.org> on 2009/09/09 01:40:57 UTC
[jira] Commented: (MAPREDUCE-775) Add input/output formatters for Vertica clustered ADBMS.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752827#action_12752827 ] 

Aaron Kimball commented on MAPREDUCE-775:
-----------------------------------------

Omer, 

This looks like a great start. I've read through the patch and believe I understand most of how it works. I don't have any major architectural concerns, but there are a number of style issues that I think should be addressed before this is committed. All of these are outlined below. Comments are listed in the sequential order presented by your patch file.

(As for your question about test failures, build #511 has already been deleted by Hudson, so I can't check that.)

ivy.xml:
Do you actually depend on hsqldb?


Hadoop test classes typically go in the same package as that which they test (e.g., {{o.a.h.vertica}}), not a separate package like {{o.a.h.vertica.tests}}. This would save you a lot of imports in tests, and allows package-public things to be used for testing. (This applies to all your test classes)

AllTests.java
* The method name {{setUp()}} has special meaning in JUnit. Since your {{setup()}} method isn't a {{setUp()}}, can you change this to something less misleadingly-similar?
* In test description string in suite(), o.a.h.vertica, not o.a.h.sqoop.

* re your "TODO: figure out jdbc jar packaging:" Based on Hadoop source tree style, I recommend creating a {{src/contrib/vertica/lib}} directory, check any external jars you need in there, and modify {{src/contrib/vertica/build.xml}} to include the jars from that dir on the classpath for building, testing, etc. Since patches are text not binary, you should attach the jar to the JIRA issue separately. Note that adding a jar requires its license be A2-compatible. See 
[http://www.apache.org/legal/resolved.html#category-a] for a list of licenses which external dependencies may have applied to them.

TestExample.Reduce.setup(): I suggest that {{AllTests.setup();}} go in a static initializer block in {{TestExample}} rather than getting called in every {{Reduce.setup()}} call. Given that you actually require {{AllTests.setup()}} in virtually all your tests, I would suggest creating a {{VerticaTestCase}} class that subclasses {{TestCase}}, have this class call {{AllTests.setup()}} in a static initializer block, and then have all your {{Test\*}} classes subclass {{VerticaTestCase}} instead of {{TestCase}}. This way you won't worry about missing a call somewhere.

Also in this same method, why catch {{Exception e}} and print its stack trace? If {{Reduce.setup()}} fails for an exception, why shouldn't the whole test fail?

TestExample.Reduce.reduce(): Style nit: One-line {{if}} statements should still use curly-braces around the "then" clause. See [http://java.sun.com/docs/codeconv/html/CodeConventions.doc6.html#449]. Hadoop source should follow all Sun Java style conventions except for an indentation width of two spaces.

I don't think {{TestExample}}, etc, should have a {{run()}} method.

TestVertica.testVerticaRecord(): why are values of {{DATE}}, {{TIME}}, etc, commented out? Dead code should be removed, not commented out. Also, why catch the IOException and return? Why doesn't this method just throw IOException (and implicitly fail the test)?

In {{recordTest()}}, don't use "assert values.equals(new_values)", use JUnit: {{assertEquals("failure message", values, new_values);}}

Same with {{testVerticaSplit()}}, {{validateInput()}}, etc...


VerticaStreamingRecordWriter.java: Please use Java "lowerCamelCase" style for field and variable names, not "under_scores" (see {{writer_table}}, {{copy_stmt}}, etc. These should be {{writerTable}} and {{copyStmt}} respectively.)

In the constructor, the RuntimeException description {{"Vertica Formatter requies a the Vertica jdbc driver"}} contains a bunch of typos.

close() method: if statement should use curly-brace style described above.

write() method: Materializing {{record.toString()}} in {{LOG.debug()}} for every call to write is expensive. Consider wrapping this statement in a call to {{LOG.isDebugEnabled()}}.

VerticaConfiguration: comment above definition of {{DELIMITER}} has typos.


{{(input_query.charAt(input_query.length() - 1) == ';'}} ... perhaps {{inputQuery.endsWith(';');}} ?

{{getInputParameters()}} has meaningless javadoc attributes. See also getInputDelmiter() (which is a typo'd method name), setInputDelimiter(), etc... that all have empty {{@return}} attributes.

VerticaInputFormat: {{DateFormat}} is not thread-safe. {{datefmt}} should not be a static member.

This class also has a lot of {{under_score}} field and parameter names.

Can your javadoc comment for {{optimize()}} suggest when it is appropriate to call this, vs. when you would be better off not doing so? What's the heuristic a programmer should keep in mind?

This method also contains a lot of commented-out code. Please remove it entirely.

{{conn.wait(1000);}} should pull out {{1000}} into a static final constant, or even better, make it configurable.

VerticaRecordWriter.getValue() has hairy braces in an if..else statement. (You do this in {{write()}} as well.)

Also, what happens in the case where {{writer_table.split()}} returns a 0-length array? In this same method, can you please add a comment explaining why you're pulling {{rs.getString(4)}} and {{rs.getInt(5)}}? These seem arbitrary as-written.

VerticaInputSplit.executeQuery() has javadoc typos.

VerticaUtil uses tabs instead of spaces, and includes empty lines with leading whitespace. Various block statements and curly braces also require reformatting here, as well as {{variable_names}}.

VerticaRecord constructor has meaningless javadoc attributes.
Also, please obey 80-column limit in this class (as well as elsewhere).

in {{objectTypes()}}, why not use {{else if}} statements instead of just a series of {{if}} statements? You could then drop all the {{continue}} statements which make for awkward flow. Also, include a case at the end for unknown type where you throw an exception, rather than misalign the {{types}} ArrayList from the {{values}} ArrayList.

{{toSQLString()}}: Please do not start variable names with underscore. I suggest {{myDelimiter}} to differentiate it from {{delimiter}}.

Also, are fall-thrus in the case block intentional? If so, please mark this with a comment. 


> Add input/output formatters for Vertica clustered ADBMS.
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-775
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-775
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Omer Trajman
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-775.patch
>
>
> Add native support for Vertica as an input or output format taking advantage of parallel read and write properties of the DBMS.
>  
> On the input side allow for parametrized queries (a la prepared statements) and create a split for each combination of parameters.  Also support the parameter list to be generated from a sql statement.  For example - return metrics for all dimensions that meet criteria X with one input split for each dimension.  Divide the read among any number of hosts in the Vertica cluster.
>  
> On the output side, support Vertica streaming load to any number of hosts in the Vertica cluster.  Output may be to a different cluster than input.
>  
> Also includes Input and Output formatters that support streaming interface.
> Code has been tested and run on live systems under 19 and 20.  Patch for 21 with new API will be ready end of this week.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.