You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Michael Brown (Code Review)" <ge...@cloudera.org> on 2017/02/01 23:28:52 UTC

[Impala-ASF-CR] Add nested testdata flattener

Michael Brown has posted comments on this change.

Change subject: Add nested testdata flattener
......................................................................


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/5787/1//COMMIT_MSG
Commit Message:

Line 22: 
How was this tested?


http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/README
File testdata/TableFlattener/README:

Line 3: 
It would be useful to explain also how nested data types (array, struct, map) get flattened.


Line 11: 
Mention that testdata/bin/generate-load-nested.sh uses this.


PS1, Line 12: There are various options to specify the type of input file but the output is always
            : parquet/snappy.
You could mention that the tool tries to use file extension to infer the input format.

If you don't want to mention the other options here, you could mention how to get help:

  mvn exec:java -Dexec.mainClass=org.apache.impala.infra.tableflattener.Main -Dexec.arguments="--help"


http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/pom.xml
File testdata/TableFlattener/pom.xml:

Line 1: <?xml version="1.0" encoding="UTF-8"?>
Would Maven permit this whole tree be moved to testdata/src/main/java/org/apache/impala/TableFlattener ? As it is, there seem to be competing Java source trees about curating test data.


http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/FlattenedSchema.java
File testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/FlattenedSchema.java:

PS1, Line 93: "value"
Why not item for array?


PS1, Line 96: "idx"
Why not pos?


PS1, Line 103: "_values";
This will lead to "foo__values" (with 2 underscores). Is that intended?


PS1, Line 120: "_"
Shouldn't this use getNameSeparator() ?


http://gerrit.cloudera.org:8080/#/c/5787/1/testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/SchemaFlattener.java
File testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/SchemaFlattener.java:

PS1, Line 51: dataset
It would read a bit clearer to call this dstDataSet.


Line 60:     Preconditions.checkState(srcSchema.getType() == Type.RECORD);
It would be nice to have a comment on this method describing what it does and the parameters.


PS1, Line 65: field.schema()
Why not fieldSchema?


Line 85:     // Ensure that the parent schema has an id field so the child can reference the
Same with this method: add brief description of method and parameter meanings.


PS1, Line 93:     LinkedList<Field> fields = new LinkedList<>();
Is it correct to be instantiating fields this way, or should you be using the method written above:

  LinkedList<Field> fields = Lists.newLinkedList();


-- 
To view, visit http://gerrit.cloudera.org:8080/5787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes