You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org> on 2017/07/14 11:25:48 UTC

[kudu-CR] kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.

Sandish Kumar HN has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7421

Change subject: kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.
......................................................................

kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.

Change-Id: If462af948651f3869b444e82151c3559fde19142
---
M java/kudu-client-tools/pom.xml
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
M java/kudu-spark-tools/pom.xml
A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
M java/pom.xml
9 files changed, 669 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Sandish Kumar HN has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 6:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java:

Line 69:     StringBuilder buf = new StringBuilder();
> I'd still advocate removing this javadoc section, it adds nothing and it's 
Done


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

Line 56: 
> I just noticed, you shouldn't use this Log that comes from Parquet. It's be
Done


Line 97:     // Pre-flight checks of input parquet schema and table schema.
> nit: missing space after //m
Done


Line 99:       if (schema.containsField(columnSchema.getName())) {
> Why do you not also check the type?
Done


Line 101:             .equals(getTypeName(columnSchema.getType()))) {
> Having System.exits in the code isn't good, ideally this case would be test
Done


Line 101:             .equals(getTypeName(columnSchema.getType()))) {
> I'd suggestion throwing an exception and catching it in run() below.
Done


Line 101:             .equals(getTypeName(columnSchema.getType()))) {
> what should I use instead of system.exit(0). just LOG.error("") enough??
Done


Line 104:         }
> Well Kudu doesn't support Parquet's TIMESTAMP, it's not about a recommendat
Done


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java
File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java:

Line 17: 
> nit: missing a blank line.
Done


Line 68: 
> nit: end comments with a period.
Done


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java
File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java:

Line 17: 
> nit: missing blank line.
Done


Line 50: import org.apache.kudu.mapreduce.HadoopTestingUtility;
> I'd suggest having a separate test that specifically verifies the pre-fligh
Done


Line 50: import org.apache.kudu.mapreduce.HadoopTestingUtility;
> separate test-case for pre-flight checks? and what should be the expected r
Done


Line 50: import org.apache.kudu.mapreduce.HadoopTestingUtility;
> The goal is to test all the major code paths, right now your patch isn't te
Done


Line 107:     }
> nit: long line
Done


Line 111: 
> nit: long line
Done


Line 115: 
> openTable isn't a cheap call, do it only once.
Done


Line 116:     KuduTable openTable = openTable(TABLE_NAME);
> Use scanTableToStrings and verify all the rows instead. Better for type con
Done


Line 130:         + "required boolean column4_b; "
> nit: long line
Done


Line 133:     SimpleGroupFactory f = new SimpleGroupFactory(schema);
> Those lines are all too long, also you could probably refactor this?
Done


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
File java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java:

PS3, Line 204: 
> What strings?
Done


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala:

Line 119:             kc.upsertRows(df, args.tableName)
> Forgot to remove?
Done


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala
File java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala:

Line 17: 
> nit: add a blank line.
Done


Line 66:       "--inferschema=true"), sc)
> ?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 7:

(6 comments)

Took a closer look at the scala files, we're almost there :)

http://gerrit.cloudera.org:8080/#/c/7421/7/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala:

PS7, Line 41: kudu
nit: Kudu


PS7, Line 41: form
nit: from


PS7, Line 42: currently we support
replace with "the following formats are supported"


PS7, Line 107: doesn't not
huh? :)


PS7, Line 154: Csv
nit: CSV


Line 157:     LOG.info(s"Spark version detected as" + sc.version)
I'm not really familiar with Spark, is this really useful?


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7421

to look at the new patch set (#7).

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................

kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Change-Id: If462af948651f3869b444e82151c3559fde19142
---
M java/kudu-client-tools/pom.xml
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquetPreCheck.java
M java/kudu-spark-tools/pom.xml
A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala
M java/pom.xml
13 files changed, 1,202 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>

[kudu-CR] kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Sandish Kumar HN has posted comments on this change.

Change subject: kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.
......................................................................


Patch Set 2:

(3 comments)

Thanks for reviewing. Few Q7A's

http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java:

Line 114:         default:
> Missing UNIXTIME_MICROS?
is UNIXTIME_MICROS is LONG type??


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

Line 88:     FileInputFormat.setInputPaths(job, inputDir);
> You could run some pre-flight checks like making sure that the columns matc
you want me to read column names from Kudu table and check with input parquet schema?


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java:

Line 101:           case DOUBLE:
> UNIXTIME_MICROS would be recognized but not supported, someone might have T
If My understanding is correct. we should identify TIMESTAMP in the input parquet file and warn the users saying that it's not supported? please correct me if I'm wrong. 

should we do this in Map phase or Driver phase??


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.
......................................................................


Patch Set 2:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/7421/2//COMMIT_MSG
Commit Message:

Line 7: kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.
Please follow best practices for git commit messages, such as https://chris.beams.io/posts/git-commit/


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java:

PS2, Line 35: a CSV files.
nit: is there one or many files?


Line 39: public class ExportCsv extends Configured implements Tool {
This needs a test.


PS2, Line 50: The current configuration.
I know the ImportCsv* classes is like this, but the javadoc format we use dictates starting @param line with a lower case and and not ending with a period. See http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#format

No need to fix ImportCsv in this patch but you'd be welcome to contribute a separate patch for it :)


PS2, Line 85: with columns
nit: rephrase to something like "the given table and columns into..."


PS2, Line 86: "T
nit: mind putting this on a new line? Then the code looks like what the error would look like.


PS2, Line 86: kudu
nit: Kudu


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java:

Line 32: 
nit: remove the extra blank line


PS2, Line 48: (i.e., the parsing).
That made more sense in ImportCsbMapper, I'd just remove this javadoc completely.


Line 54:     this.separator = conf.get(ExportCsv.SEPARATOR_CONF_KEY);
single line:

this.separator = conf.get(ExportCsv.SEPARATOR_CONF_KEY, ExportCsv.DEFAULT_SEPARATOR);


PS2, Line 61: Insert
nits:
s/Convert/Converts
s/Insert/RowResult
also end the sentence with a period.


Line 74:    * converts RowResult $this.separator string
Please revise this whole javadoc section. FWIW it might not be necessary, since it's a private method. Also it's not doing anything surprising.


Line 114:         default:
Missing UNIXTIME_MICROS?


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

PS2, Line 43: PARQUET
nit: Apache Parquet


Line 47: public class ImportParquet extends Configured implements Tool {
This needs a test.


Line 59:    * @param conf The current configuration.
Same comment as before regarding javadoc.


Line 75:     LOG.info(schema);
Did you forget to remove this?


Line 88:     FileInputFormat.setInputPaths(job, inputDir);
You could run some pre-flight checks like making sure that the columns match.


PS2, Line 105: PARQUET
Apache Parquet


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java:

PS2, Line 39: PARQUET lines and turns them into Kudu Inserts.
some comments as before regarding PARQUET and Inserts


Line 101:           case DOUBLE:
UNIXTIME_MICROS would be recognized but not supported, someone might have TIMESTAMP and think it's compatible, maybe you can catch that before like in createSubmittableJob?


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java:

Line 27:  * Created by sany on 26/06/2017 AD.
Remove.


PS2, Line 29:  
nit


Line 37:     Set<String> counts = context.getKeyValueMetadata().get("my.count");
What's this about?


Line 38:     // assertTrue("counts: " + counts, counts.size() > 0);
Remove.


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala:

Line 1: 
remove this blank line


Line 18: package org.apache.kudu.spark.tools
add a blank line


Line 24: import org.slf4j.{Logger, LoggerFactory}
please re-order this


Line 29: object ImportExportKudu {
I'm less familiar with scala but at least we'll need a test.


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Sandish Kumar HN has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 3:

(31 comments)

http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java:

PS2, Line 35: a CSV file.
> nit: is there one or many files?
Done


Line 39: public class ExportCsv extends Configured implements Tool {
> This needs a test.
Done


PS2, Line 50: the current configuration
> I know the ImportCsv* classes is like this, but the javadoc format we use d
Done


PS2, Line 85: lumns into t
> nit: rephrase to something like "the given table and columns into..."
Done


PS2, Line 86: ble 
> nit: Kudu
Done


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java:

Line 32: /**
> nit: remove the extra blank line
Done


PS2, Line 48: 
> That made more sense in ImportCsbMapper, I'd just remove this javadoc compl
Done


Line 54: 
> single line:
Done


PS2, Line 61: 
> nits:
Done


Line 74:     StringBuilder buf = new StringBuilder();
> Please revise this whole javadoc section. FWIW it might not be necessary, s
Done


Line 114:           break;
> Missing UNIXTIME_MICROS?
Done


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

PS2, Line 43: nt.Kudu
> nit: Apache Parquet
Done


Line 47: 
> This needs a test.
Done


Line 59:   static final String JOB_NAME_CONF_KEY = "importparquet.job.name";
> Same comment as before regarding javadoc.
Done


Line 75:     Path inputDir = new Path(args[1]);
> Did you forget to remove this?
Done


Line 88:     job.setNumReduceTasks(0);
> You could run some pre-flight checks like making sure that the columns matc
Done


Line 88:     job.setNumReduceTasks(0);
> you want me to read column names from Kudu table and check with input parqu
Done


Line 88:     job.setNumReduceTasks(0);
> Yes, this way the job won't start if some things just don't match.
Done


PS2, Line 105: Columns
> Apache Parquet
Done


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java:

PS2, Line 39: Apache Parquet lines and turns them into Kudu I
> some comments as before regarding PARQUET and Inserts
Done


Line 101:           case DOUBLE:
> I'd recommend in the Driver as part of checking the schema.
Done


Line 101:           case DOUBLE:
> If My understanding is correct. we should identify TIMESTAMP in the input p
Done


Line 101:           case DOUBLE:
> UNIXTIME_MICROS would be recognized but not supported, someone might have T
Done


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java:

Line 27: 
> Remove.
Done


PS2, Line 29: e
> nit
Done


Line 37
> What's this about?
Done


Line 38
> Remove.
Done


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala:

Line 1: // Licensed to the Apache Software Foundation (ASF) under one
> remove this blank line
Done


Line 18: package org.apache.kudu.spark.tools
> add a blank line
Done


Line 24: import org.slf4j.{Logger, LoggerFactory}
> please re-order this
Done


Line 29: object ImportExportKudu {
> I'm less familiar with scala but at least we'll need a test.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Sandish Kumar HN has posted comments on this change.

Change subject: kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.
......................................................................


Patch Set 2:

Hi Jean-Daniel Cryans,
the patch seems to be correct. but there is a failure from this " DEBUG completed with result FAILURE" did not understand much.

and I don't see any issues from patches from kudu-spark-util and kudu-client-util.

-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7421/5/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

Line 99:       if (!schema.containsField(columnSchema.getName())) {
> Kudu has primitive-types, parquet has just Type. Example: In Kudu binary ty
The goal is to not hit weird errors in ImportParquetMapper#map when doing data conversions, so you have to find the equivalences.


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Sandish Kumar HN has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7421/6//COMMIT_MSG
Commit Message:

PS6, Line 7: avro
> Did you mean to include the avro part?
Avro is available in spark client tools. people didn't want me to add much to MapReduce client tools.


http://gerrit.cloudera.org:8080/#/c/7421/6/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

Line 32: import org.apache.parquet.column.ColumnDescriptor;
> nit: remove
Done


PS6, Line 103: 
> nit: Parquet is always stylized with an upper case P, please fix here and b
Done


PS6, Line 103: 
> nit: exist, same below
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7421

to look at the new patch set (#6).

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................

kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Change-Id: If462af948651f3869b444e82151c3559fde19142
---
M java/kudu-client-tools/pom.xml
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquetPreCheck.java
M java/kudu-spark-tools/pom.xml
A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala
M java/pom.xml
13 files changed, 1,203 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7421

to look at the new patch set (#3).

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................

kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Change-Id: If462af948651f3869b444e82151c3559fde19142
---
M java/kudu-client-tools/pom.xml
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
M java/kudu-spark-tools/pom.xml
A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala
M java/pom.xml
13 files changed, 1,041 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7421

to look at the new patch set (#8).

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................

kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Change-Id: If462af948651f3869b444e82151c3559fde19142
---
M java/kudu-client-tools/pom.xml
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquetPreCheck.java
M java/kudu-spark-tools/pom.xml
A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala
M java/pom.xml
13 files changed, 1,200 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>

[kudu-CR] kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java:

Line 114:         default:
> is UNIXTIME_MICROS is LONG type??
It is.


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

Line 88:     FileInputFormat.setInputPaths(job, inputDir);
> you want me to read column names from Kudu table and check with input parqu
Yes, this way the job won't start if some things just don't match.


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java:

Line 101:           case DOUBLE:
> If My understanding is correct. we should identify TIMESTAMP in the input p
I'd recommend in the Driver as part of checking the schema.


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.
......................................................................


Patch Set 1:

Hi Sandish,

Thank you for your contribution. Please look at the Kudu Jenkins output, there's bunch of checkstyle issues with the code. Also, the dependency you added to kudu-spark-tools isn't found in any of the maven repos we currently list.

-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 8: Verified+1

Overriding Jenkins since it's an unrelated failure in the master's stress test.

Thank you for your contribution!

-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

Line 56:   private static final Log LOG = Log.getLog(ImportParquet.class);
I just noticed, you shouldn't use this Log that comes from Parquet. It's best to use System.err.println like ImportCsv does.


Line 101:         System.exit(0);
> what should I use instead of system.exit(0). just LOG.error("") enough??
I'd suggestion throwing an exception and catching it in run() below.


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java
File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java:

Line 50: public class ITImportParquet extends BaseKuduTest {
> separate test-case for pre-flight checks? and what should be the expected r
The goal is to test all the major code paths, right now your patch isn't testing error cases.


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Sandish Kumar HN has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7421/5/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

Line 99:       if (!schema.containsField(columnSchema.getName())) {
Kudu has primitive-types, parquet has just Type. Example: In Kudu binary type and in parquet its string type. How can I compare ?. please give some idea on this??


Line 100:         LOG.error("The column " + columnSchema.getName() + " does not exists in parquet schema");
what should I use instead of system.exit(0). just LOG.error("") enough??


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Sandish Kumar HN has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

Line 101:         System.exit(0);
> Having System.exits in the code isn't good, ideally this case would be test
what should I use instead of system.exit(0). just LOG.error("") enough??


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java
File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java:

Line 50: public class ITImportParquet extends BaseKuduTest {
> I'd suggest having a separate test that specifically verifies the pre-fligh
separate test-case for pre-flight checks? and what should be the expected result ??


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has submitted this change and it was merged.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Change-Id: If462af948651f3869b444e82151c3559fde19142
Reviewed-on: http://gerrit.cloudera.org:8080/7421
Reviewed-by: Jean-Daniel Cryans <jd...@apache.org>
Tested-by: Jean-Daniel Cryans <jd...@apache.org>
---
M java/kudu-client-tools/pom.xml
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquetPreCheck.java
M java/kudu-spark-tools/pom.xml
A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala
M java/pom.xml
13 files changed, 1,200 insertions(+), 7 deletions(-)

Approvals:
  Jean-Daniel Cryans: Looks good to me, approved; Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>

[kudu-CR] kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7421

to look at the new patch set (#2).

Change subject: kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.
......................................................................

kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files.

Change-Id: If462af948651f3869b444e82151c3559fde19142
---
M java/kudu-client-tools/pom.xml
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
M java/kudu-spark-tools/pom.xml
A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
M java/pom.xml
9 files changed, 691 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7421

to look at the new patch set (#4).

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................

kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Change-Id: If462af948651f3869b444e82151c3559fde19142
---
M java/kudu-client-tools/pom.xml
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquetPreCheck.java
M java/kudu-spark-tools/pom.xml
A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala
M java/pom.xml
13 files changed, 1,167 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7421

to look at the new patch set (#5).

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................

kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Change-Id: If462af948651f3869b444e82151c3559fde19142
---
M java/kudu-client-tools/pom.xml
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
A java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java
A java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquetPreCheck.java
M java/kudu-spark-tools/pom.xml
A java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
A java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala
M java/pom.xml
13 files changed, 1,167 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7421/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 3:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java:

Line 69:    * converts RowResult to string.
I'd still advocate removing this javadoc section, it adds nothing and it's a private method.


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

Line 97:     //pre-flight checks of input parquet schema and table schema
nit: missing space after //m
Also end your sentence with a period.


Line 99:       if (!schema.containsField(sche.getName())) {
Why do you not also check the type?


Line 101:         System.exit(0);
Having System.exits in the code isn't good, ideally this case would be tested and if you exit then how can you catch the error?


Line 104:     //Kudu does not recommend using TIMESTAMP
Well Kudu doesn't support Parquet's TIMESTAMP, it's not about a recommendation. Also same nit as the comment above, and some comment regarding exit.


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java
File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITExportCsv.java:

Line 17: package org.apache.kudu.mapreduce.tools;
nit: missing a blank line.


Line 68:     // Create a 2 lines input file
nit: end comments with a period.
Also I'm not following this comment. The next line creates a table with 4 tablets, 3 of which have 3 rows. Where's the 2 lines input file coming from?


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java
File java/kudu-client-tools/src/test/java/org/apache/kudu/mapreduce/tools/ITImportParquet.java:

Line 17: package org.apache.kudu.mapreduce.tools;
nit: missing blank line.


Line 50: public class ITImportParquet extends BaseKuduTest {
I'd suggest having a separate test that specifically verifies the pre-flight checks that are running.


Line 107:     String[] args = new String[] { "-D" + CommandLineParser.MASTER_ADDRESSES_KEY + "=" + getMasterAddresses(),
nit: long line


Line 111:     Job job = ImportParquet.createSubmittableJob(parser.getConfiguration(), parser.getRemainingArgs());
nit: long line


Line 115:       client.newScannerBuilder(openTable(TABLE_NAME)).build()));
openTable isn't a cheap call, do it only once.


Line 116:     assertEquals(4,getTableRows(openTable(TABLE_NAME)).get(0).getInt("key"));
Use scanTableToStrings and verify all the rows instead. Better for type conversion checking.


Line 130:     ParquetWriter<Group> writer = new ParquetWriter<Group>(data, new GroupWriteSupport(), UNCOMPRESSED, 1024, 1024, 512,
nit: long line


Line 133:       writer.write(f.newGroup().append("key", 1).append("column1_i", 3).append("column2_d", 2.3).append("column3_s",
Those lines are all too long, also you could probably refactor this?


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
File java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java:

PS3, Line 204: rowStrings
What strings?


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala:

Line 119:             LOG.info(args.header+":"+args.delimiter+":"+args.path)
Forgot to remove?


http://gerrit.cloudera.org:8080/#/c/7421/3/java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala
File java/kudu-spark-tools/src/test/scala/org/apache/kudu/spark/tools/TestImportExportFiles.scala:

Line 17: package org.apache.kudu.spark.tools
nit: add a blank line.


Line 66:     //val table = kuduClient.openTable(TABLE_NAME)
?


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7421/6//COMMIT_MSG
Commit Message:

PS6, Line 7: avro
Did you mean to include the avro part?


http://gerrit.cloudera.org:8080/#/c/7421/6/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

Line 32: import org.apache.parquet.Log;
nit: remove


PS6, Line 103: exists
nit: exist, same below


PS6, Line 103: parquet
nit: Parquet is always stylized with an upper case P, please fix here and below.


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)

Posted by "Sandish Kumar HN (Code Review)" <ge...@cloudera.org>.
Sandish Kumar HN has posted comments on this change.

Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro)
......................................................................


Patch Set 8:

(5 comments)

nit

http://gerrit.cloudera.org:8080/#/c/7421/7/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala:

PS7, Line 41: from
> nit: from
Done


PS7, Line 41: Kudu
> nit: Kudu
Done


PS7, Line 42: the following format
> replace with "the following formats are supported"
Done


PS7, Line 107: doesn't exi
> huh? :)
Done


PS7, Line 154: CSV
> nit: CSV
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sandish Kumar HN <sa...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sandish Kumar HN <sa...@gmail.com>
Gerrit-HasComments: Yes