Posted to dev@sqoop.apache.org by Veena Basavaraj <vb...@cloudera.com> on 2014/11/18 00:57:03 UTC

Review Request 28139: Support Time and List Type in CSV IDF

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/
-----------------------------------------------------------

Review request for Sqoop.


Bugs: SQOOP-1749
    https://issues.apache.org/jira/browse/SQOOP-1749


Repository: sqoop-sqoop2


Description
-------

see JIRA


Diffs
-----

  common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
  common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 

Diff: https://reviews.apache.org/r/28139/diff/


Testing
-------

Yes, unit tests added.


Thanks,

Veena Basavaraj


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 26, 2014, 1:51 a.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 397
> > <https://reviews.apache.org/r/28139/diff/7/?file=776412#file776412line397>
> >
> >     What if one of the elements is a string with the quote character '?

There is a test case for this: public void testArrayOfComplexStrings().
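For illustration only (the helper name escapeAndQuote is hypothetical, not the patch's actual method), the kind of quote handling under discussion looks like this:

```java
public class QuoteEscapeDemo {
    // Hypothetical sketch: wrap a CSV element in single quotes and
    // backslash-escape embedded backslashes and quote characters.
    static String escapeAndQuote(String element) {
        String escaped = element.replace("\\", "\\\\").replace("'", "\\'");
        return "'" + escaped + "'";
    }

    public static void main(String[] args) {
        // An element containing the quote character survives encoding:
        System.out.println(escapeAndQuote("it's")); // 'it\'s'
    }
}
```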


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review63103
-----------------------------------------------------------


On Nov. 25, 2014, 9:23 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 25, 2014, 9:23 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Arrays/ Nested/ Set
> 
> The latest rev does not have time related changed, it is separate ticket
> See SQOOP-1771 for the detail of the JSON format.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 26, 2014, 1:51 a.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 391
> > <https://reviews.apache.org/r/28139/diff/7/?file=776412#file776412line391>
> >
> >     Arrays.deepToString produces non-json output. Is this intended?

Yes. There is no reason to deconstruct the nested objects. There is a test case that passes for these objects and returns the expected list objects.

I have pasted the sample output I tried for nested objects:
["["A", "B]","[14, 15]"]


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review63103
-----------------------------------------------------------


On Nov. 25, 2014, 9:23 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 25, 2014, 9:23 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Arrays/ Nested/ Set
> 
> The latest rev does not have time related changed, it is separate ticket
> See SQOOP-1771 for the detail of the JSON format.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Abraham Elmahrek <ab...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review63103
-----------------------------------------------------------



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment105278>

    How about:
        if (array == null) {
          return null;
        } else {
          return array.toArray();
        }
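The suggestion above can be condensed into a ternary (a sketch; the name toArrayOrNull is hypothetical, and array stands for any java.util.List):

```java
import java.util.Arrays;
import java.util.List;

public class NullSafeToArrayDemo {
    // Sketch of the null-safe conversion suggested in the review.
    static Object[] toArrayOrNull(List<?> array) {
        return (array == null) ? null : array.toArray();
    }

    public static void main(String[] args) {
        System.out.println(toArrayOrNull(null));                                 // null
        System.out.println(Arrays.toString(toArrayOrNull(Arrays.asList(1, 2)))); // [1, 2]
    }
}
```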



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment105279>

    Arrays.deepToString produces non-json output. Is this intended?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment105275>

    What if one of the elements is a string with the quote character '?



connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment105276>

    Add 'setTextData' tests as well?


- Abraham Elmahrek


On Nov. 26, 2014, 5:23 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 26, 2014, 5:23 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Arrays/ Nested/ Set
> 
> The latest rev does not have time related changed, it is separate ticket
> See SQOOP-1771 for the detail of the JSON format.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/
-----------------------------------------------------------

(Updated Nov. 28, 2014, 8:01 p.m.)


Review request for Sqoop.


Bugs: SQOOP-1749
    https://issues.apache.org/jira/browse/SQOOP-1749


Repository: sqoop-sqoop2


Description
-------

see JIRA

Arrays/ Nested/ Set

The latest rev does not have the time-related changes; that is a separate ticket.
See SQOOP-1771 for the details of the JSON format.


NOTE: Whitespace (WS) will be addressed once the patch is reviewed for functionality, and this needs a rebase since I have other patches pending review that overlap.


Diffs (updated)
-----

  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 

Diff: https://reviews.apache.org/r/28139/diff/


Testing
-------

Yes, unit tests added.


Thanks,

Veena Basavaraj


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 28, 2014, 5:27 p.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java, line 355
> > <https://reviews.apache.org/r/28139/diff/7-8/?file=776415#file776415line355>
> >
> >     Can we test nested array case as well?
> >     Object in - Text out
> >     Object in - Object out
> >     Text in - Object out
> >     Text in - Text out
> 
> Veena Basavaraj wrote:
>     I will for sure, just noticed that such echaustive ones are missing for date time and bit use cases. We should create a ticket to add those as well,

Note: the 4 combinations were added for the array of strings and not for all the array types, since it is the exact same code path.

I have added it for the nested case as you requested.


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review63273
-----------------------------------------------------------


On Nov. 27, 2014, 7:58 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2014, 7:58 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Arrays/ Nested/ Set
> 
> The latest rev does not have time related changed, it is separate ticket
> See SQOOP-1771 for the detail of the JSON format.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 28, 2014, 5:27 p.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java, line 355
> > <https://reviews.apache.org/r/28139/diff/7-8/?file=776415#file776415line355>
> >
> >     Can we test nested array case as well?
> >     Object in - Text out
> >     Object in - Object out
> >     Text in - Object out
> >     Text in - Text out

I will for sure; I just noticed that such exhaustive tests are missing for the date-time and bit use cases. We should create a ticket to add those as well.


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review63273
-----------------------------------------------------------


On Nov. 27, 2014, 7:58 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2014, 7:58 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Arrays/ Nested/ Set
> 
> The latest rev does not have time related changed, it is separate ticket
> See SQOOP-1771 for the detail of the JSON format.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 28, 2014, 5:27 p.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java, line 355
> > <https://reviews.apache.org/r/28139/diff/7-8/?file=776415#file776415line355>
> >
> >     Can we test nested array case as well?
> >     Object in - Text out
> >     Object in - Object out
> >     Text in - Object out
> >     Text in - Text out
> 
> Veena Basavaraj wrote:
>     I will for sure, just noticed that such echaustive ones are missing for date time and bit use cases. We should create a ticket to add those as well,
> 
> Veena Basavaraj wrote:
>     Note, the 4 combinations were added for the Array fo strings and not for all the array types, since it is the exact code path
>     
>     I have added the it for the nested case as you requested

Ticket added for bit/date-time:

https://issues.apache.org/jira/browse/SQOOP-1817


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review63273
-----------------------------------------------------------


On Nov. 28, 2014, 8:01 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 28, 2014, 8:01 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Arrays/ Nested/ Set
> 
> The latest rev does not have time related changed, it is separate ticket
> See SQOOP-1771 for the detail of the JSON format.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Abraham Elmahrek <ab...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review63273
-----------------------------------------------------------

Ship it!


Looks great IMO. Just a couple more tests (more exhaustive) and we're good.


connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment105431>

    Can we test nested array case as well?
    Object in - Text out
    Object in - Object out
    Text in - Object out
    Text in - Text out


- Abraham Elmahrek


On Nov. 27, 2014, 3:58 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 27, 2014, 3:58 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Arrays/ Nested/ Set
> 
> The latest rev does not have time related changed, it is separate ticket
> See SQOOP-1771 for the detail of the JSON format.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/
-----------------------------------------------------------

(Updated Nov. 27, 2014, 7:58 a.m.)


Review request for Sqoop.


Bugs: SQOOP-1749
    https://issues.apache.org/jira/browse/SQOOP-1749


Repository: sqoop-sqoop2


Description
-------

see JIRA

Arrays/ Nested/ Set

The latest rev does not have the time-related changes; that is a separate ticket.
See SQOOP-1771 for the details of the JSON format.


NOTE: Whitespace (WS) will be addressed once the patch is reviewed for functionality, and this needs a rebase since I have other patches pending review that overlap.


Diffs (updated)
-----

  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 

Diff: https://reviews.apache.org/r/28139/diff/


Testing
-------

Yes, unit tests added.


Thanks,

Veena Basavaraj


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/
-----------------------------------------------------------

(Updated Nov. 25, 2014, 9:23 p.m.)


Review request for Sqoop.


Bugs: SQOOP-1749
    https://issues.apache.org/jira/browse/SQOOP-1749


Repository: sqoop-sqoop2


Description
-------

see JIRA

Arrays/ Nested/ Set

The latest rev does not have the time-related changes; that is a separate ticket.
See SQOOP-1771 for the details of the JSON format.


NOTE: Whitespace (WS) will be addressed once the patch is reviewed for functionality, and this needs a rebase since I have other patches pending review that overlap.


Diffs (updated)
-----

  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 

Diff: https://reviews.apache.org/r/28139/diff/


Testing
-------

Yes, unit tests added.


Thanks,

Veena Basavaraj


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/
-----------------------------------------------------------

(Updated Nov. 25, 2014, 1:51 p.m.)


Review request for Sqoop.


Bugs: SQOOP-1749
    https://issues.apache.org/jira/browse/SQOOP-1749


Repository: sqoop-sqoop2


Description (updated)
-------

see JIRA

Arrays/ Nested/ Set

The latest rev does not have the time-related changes; that is a separate ticket.
See SQOOP-1771 for the details of the JSON format.


NOTE: Whitespace (WS) will be addressed once the patch is reviewed for functionality, and this needs a rebase since I have other patches pending review that overlap.


Diffs (updated)
-----

  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 

Diff: https://reviews.apache.org/r/28139/diff/


Testing
-------

Yes, unit tests added.


Thanks,

Veena Basavaraj


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/
-----------------------------------------------------------

(Updated Nov. 19, 2014, 9:08 a.m.)


Review request for Sqoop.


Changes
-------

remove some lingering WS


Bugs: SQOOP-1749
    https://issues.apache.org/jira/browse/SQOOP-1749


Repository: sqoop-sqoop2


Description
-------

see JIRA

Supporting nested arrays is going to be dodgy logic.

Some of the assumptions made here in parsing need more docs, especially the use of special characters to parse nested arrays.

Feedback is welcome on how I can improve this further.

I am sure more tests can be added.


NOTE: Whitespace (WS) will be addressed once the patch is reviewed for functionality, and this needs a rebase since I have other patches pending review that overlap.


Diffs (updated)
-----

  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 

Diff: https://reviews.apache.org/r/28139/diff/


Testing
-------

Yes, unit tests added.


File Attachments
----------------

SQOOP-1749-v2.patch
  https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch


Thanks,

Veena Basavaraj


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/
-----------------------------------------------------------

(Updated Nov. 18, 2014, 10:58 p.m.)


Review request for Sqoop.


Bugs: SQOOP-1749
    https://issues.apache.org/jira/browse/SQOOP-1749


Repository: sqoop-sqoop2


Description
-------

see JIRA

Supporting nested arrays is going to be dodgy logic.

Some of the assumptions made here in parsing need more docs, especially the use of special characters to parse nested arrays.

Feedback is welcome on how I can improve this further.

I am sure more tests can be added.


NOTE: Whitespace (WS) will be addressed once the patch is reviewed for functionality, and this needs a rebase since I have other patches pending review that overlap.


Diffs (updated)
-----

  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 

Diff: https://reviews.apache.org/r/28139/diff/


Testing
-------

Yes, unit tests added.


File Attachments
----------------

SQOOP-1749-v2.patch
  https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch


Thanks,

Veena Basavaraj


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/
-----------------------------------------------------------

(Updated Nov. 18, 2014, 10:02 p.m.)


Review request for Sqoop.


Bugs: SQOOP-1749
    https://issues.apache.org/jira/browse/SQOOP-1749


Repository: sqoop-sqoop2


Description
-------

see JIRA

Supporting nested arrays is going to be dodgy logic.

Some of the assumptions made here in parsing need more docs, especially the use of special characters to parse nested arrays.

Feedback is welcome on how I can improve this further.

I am sure more tests can be added.


NOTE: Whitespace (WS) will be addressed once the patch is reviewed for functionality, and this needs a rebase since I have other patches pending review that overlap.


Diffs (updated)
-----

  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 

Diff: https://reviews.apache.org/r/28139/diff/


Testing
-------

Yes, unit tests added.


File Attachments
----------------

SQOOP-1749-v2.patch
  https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch


Thanks,

Veena Basavaraj


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/
-----------------------------------------------------------

(Updated Nov. 18, 2014, 9:59 p.m.)


Review request for Sqoop.


Bugs: SQOOP-1749
    https://issues.apache.org/jira/browse/SQOOP-1749


Repository: sqoop-sqoop2


Description
-------

see JIRA

Supporting nested arrays is going to be dodgy logic.

Some of the assumptions made here in parsing need more docs, especially the use of special characters to parse nested arrays.

Feedback is welcome on how I can improve this further.

I am sure more tests can be added.


NOTE: Whitespace (WS) will be addressed once the patch is reviewed for functionality, and this needs a rebase since I have other patches pending review that overlap.


Diffs (updated)
-----

  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 

Diff: https://reviews.apache.org/r/28139/diff/


Testing
-------

Yes, unit tests added.


File Attachments (updated)
----------------

SQOOP-1749-v2.patch
  https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch


Thanks,

Veena Basavaraj


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 18, 2014, 5 p.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, lines 78-79
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line78>
> >
> >     Can you please document how the array is expected to be encoded in the IDF? It seems that we don't have this in the code yet, so I would simply update the wiki:
> >     
> >     https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation#Sqoop2Intermediaterepresentation-Intermediateformatrepresentationproposal
> >     
> >     (We can move the IDF proposal from wiki to code in subsequent JIRA)

I have a ticket for that. I would rather keep this in one place
https://issues.apache.org/jira/browse/SQOOP-1706


> On Nov. 18, 2014, 5 p.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 99
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line99>
> >
> >     I'm wondering if this one really needs to be public?

It has default (package-private) access so we can use it in unit tests; there is no need for it to be public, yes.


> On Nov. 18, 2014, 5 p.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, lines 21-62
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line21>
> >
> >     Nit: Seems like an unnecessary change in ordering.

I used the IDE's ordering; I am not sure if there is a standard for this.


> On Nov. 18, 2014, 5 p.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 321
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line321>
> >
> >     Why space as a separator for nested data types when we are already using comma as a separator?

This separator is for the elements inside the array; the comma separates the two columns.


Another alternative would be to encode this as a JSON string, which I might try, as Abe suggested below.

['A' 'B'], 'text'
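As an illustration only (not code from the patch), quoting each element and bracketing the list is one way to keep element boundaries distinct from column boundaries; `encodeList` here is a hypothetical helper:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ListEncodingSketch {

    // Hypothetical helper: single-quote each element (escaping embedded
    // quotes) and join inside brackets, so brackets and quotes mark
    // element boundaries independently of the column separator.
    static String encodeList(List<String> elements) {
        return elements.stream()
                .map(e -> "'" + e.replace("'", "\\'") + "'")
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        // An array column followed by a plain text column.
        String row = encodeList(Arrays.asList("A", "B")) + ",'text'";
        System.out.println(row); // ['A','B'],'text'
    }
}
```

The key design point is that the array's internal delimiter handling is self-contained, so the row-level comma parser never needs to look inside the brackets.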


> On Nov. 18, 2014, 5 p.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 102
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line102>
> >
> >     Based on the intermediate format proposal the correct Time format should be: HH:MM:SS[.ZZZZZZ][+/-XX]
> >     
> >     https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation#Sqoop2Intermediaterepresentation-Intermediateformatrepresentationproposal
> >     
> >     Can we add the missing parts?

That is, support this as Time?

So what are some examples you would like to be tested?

It would be easy for me to add those as test cases.
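For reference, the optional fractional-second part of the proposed HH:MM:SS[.ZZZZZZ] format can be handled with java.time. This is just a sketch of the parsing rule under discussion, not code from the patch, and the [+/-XX] offset suffix is left out for brevity:

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class TimeFormatSketch {

    // Accepts HH:mm:ss with an optional fractional-second part of
    // up to six digits, matching HH:MM:SS[.ZZZZZZ] from the proposal.
    static final DateTimeFormatter TIME = new DateTimeFormatterBuilder()
            .appendPattern("HH:mm:ss")
            .optionalStart()
            .appendFraction(ChronoField.NANO_OF_SECOND, 1, 6, true)
            .optionalEnd()
            .toFormatter();

    public static void main(String[] args) {
        System.out.println(LocalTime.parse("12:59:59", TIME));
        System.out.println(LocalTime.parse("12:59:59.135246", TIME));
    }
}
```

Both the plain and fractional forms would make good unit-test inputs for the IDF's time handling.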


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 18, 2014, 11:52 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 11:52 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to involve dodgy logic.
> 
> Some of the parsing assumptions here need more documentation, especially the use of special characters to parse nested arrays.
> 
> Feedback is welcome on how I can improve this further.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
>   common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Jarek Cecho <ja...@apache.org>.

On Nov. 19, 2014, 1 a.m., Veena Basavaraj wrote:
> > Jarcec
> 
> Veena Basavaraj wrote:
>     Please see the latest revision; it uses JSON. It is pretty universal, so I would rather use JSON.
> 
> Jarek Cecho wrote:
>     Why JSON for the intermediate data format? I would much rather use something that is used by pg_dump and mysqldump (or Hive) rather than JSON. The IDF is just an intermediate format; we should be as efficient here as possible, and we should not be doing conversions from format to format.
> 
> Veena Basavaraj wrote:
>     It so happens that they use the same too. I would prefer you look at the patch to see why we are using an encoding such as JSON to represent arrays and maps. Imagine a case of [[a,b] [z,q] [w,e]]: it will just be represented as a JSON string, so parsing it back into an actual array-of-arrays object is a no-brainer; we use the JSON parse method. The code also tells the CSV IDF to ignore any commas within the JSON string.
> 
> Jarek Cecho wrote:
>     As long as pg_dump and Hive are using the same format, I'm fine. But I don't want to introduce a different format than those two without a good reason.
> 
> Veena Basavaraj wrote:
>     I don't know about Hive, and if there is no standard they all use, then we cannot do much.
>     Unless we have Hive / Postgres connectors using this code, I would not over-optimize for it. JSON serializers are pretty fast, especially Jackson, which we should adopt in many more places; there is a ticket for that as well.
>     I would cross that bridge when we get there: if performance is a concern for certain data sources, changing this to any other format should be straightforward.

I believe that part of this JIRA is to go and explore how Hive and pg_dump serialize data, to provide the best approach in Sqoop 2. That is what we did when designing the CSV IDF, and we should keep doing that. The CSV IDF is not internal, as it is exposed as a standard to other connectors, and we should not change it in the future in a backward-incompatible way. Hence we should be careful when introducing new syntax and rules, and exercise due diligence in investigating what is best.
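To make the comparison under debate concrete: the candidate array text forms differ mainly in their delimiters. The sketch below is illustrative only; `asJson` mirrors a JSON-style encoding, while `asPgArray` only approximates PostgreSQL's array-literal syntax (real pg_dump output quotes elements only when necessary):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ArrayTextForms {

    // JSON-style list text, in the spirit of the patch under review.
    static String asJson(List<String> xs) {
        return xs.stream()
                .map(x -> "\"" + x + "\"")
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Rough approximation of a PostgreSQL text[] literal; real output
    // omits quotes around elements that do not require them.
    static String asPgArray(List<String> xs) {
        return xs.stream()
                .map(x -> "\"" + x + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        List<String> xs = Arrays.asList("a", "b");
        System.out.println(asJson(xs));    // ["a","b"]
        System.out.println(asPgArray(xs)); // {"a","b"}
    }
}
```

Either form nests and escapes unambiguously; the question raised here is which one other tools already emit, so Sqoop can avoid a conversion step.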


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 19, 2014, 5:08 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2014, 5:08 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to involve dodgy logic.
> 
> Some of the parsing assumptions here need more documentation, especially the use of special characters to parse nested arrays.
> 
> Feedback is welcome on how I can improve this further.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> File Attachments
> ----------------
> 
> SQOOP-1749-v2.patch
>   https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Abraham Elmahrek <ab...@cloudera.com>.

On Nov. 19, 2014, 1 a.m., Veena Basavaraj wrote:
> > Jarcec
> 
> Veena Basavaraj wrote:
>     Please see the latest revision; it uses JSON. It is pretty universal, so I would rather use JSON.
> 
> Jarek Cecho wrote:
>     Why JSON for the intermediate data format? I would much rather use something that is used by pg_dump and mysqldump (or Hive) rather than JSON. The IDF is just an intermediate format; we should be as efficient here as possible, and we should not be doing conversions from format to format.
> 
> Veena Basavaraj wrote:
>     It so happens that they use the same too. I would prefer you look at the patch to see why we are using an encoding such as JSON to represent arrays and maps. Imagine a case of [[a,b] [z,q] [w,e]]: it will just be represented as a JSON string, so parsing it back into an actual array-of-arrays object is a no-brainer; we use the JSON parse method. The code also tells the CSV IDF to ignore any commas within the JSON string.
> 
> Jarek Cecho wrote:
>     As long as pg_dump and Hive are using the same format, I'm fine. But I don't want to introduce a different format than those two without a good reason.
> 
> Veena Basavaraj wrote:
>     I don't know about Hive, and if there is no standard they all use, then we cannot do much.
>     Unless we have Hive / Postgres connectors using this code, I would not over-optimize for it. JSON serializers are pretty fast, especially Jackson, which we should adopt in many more places; there is a ticket for that as well.
>     I would cross that bridge when we get there: if performance is a concern for certain data sources, changing this to any other format should be straightforward.
> 
> Jarek Cecho wrote:
>     I believe that part of this JIRA is to go and explore how Hive and pg_dump serialize data, to provide the best approach in Sqoop 2. That is what we did when designing the CSV IDF, and we should keep doing that. The CSV IDF is not internal, as it is exposed as a standard to other connectors, and we should not change it in the future in a backward-incompatible way. Hence we should be careful when introducing new syntax and rules, and exercise due diligence in investigating what is best.
> 
> Veena Basavaraj wrote:
>     I am still not sure what the concern is; how we store the IDF is, and should be, internal to Sqoop. I will let Abe weigh in as well. The fact that it is called an IDF means Sqoop is responsible for reading from the FROM side, storing the data in its IDF, and then, when it passes the data to the output committer/format (in the case of MR), converting it back to Java objects. So if I change the internal representation from CSV to something else, it should not affect any connector developer.
>     
>     Of course, if I change what an Array means, then it will affect them.

I agree with Jarcec on this. JSON was a suggestion that I gave, but I do think that we should try to align with pg_dump, mysqldump, Oracle export to CSV, Hive, etc. If there is no standard, we should choose the most generic and frequently used one.


- Abraham


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 19, 2014, 5:08 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2014, 5:08 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to involve dodgy logic.
> 
> Some of the parsing assumptions here need more documentation, especially the use of special characters to parse nested arrays.
> 
> Feedback is welcome on how I can improve this further.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> File Attachments
> ----------------
> 
> SQOOP-1749-v2.patch
>   https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 18, 2014, 5 p.m., Jarek Cecho wrote:
> > As usual, I do see several trailing whitespace characters.

As usual, I always put a disclaimer at the top of the ticket :)

I tend not to run the formatter, since it sometimes reformats more than required, and unrelated changes are frowned upon.


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 18, 2014, 11:52 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 11:52 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to involve dodgy logic.
> 
> Some of the parsing assumptions here need more documentation, especially the use of special characters to parse nested arrays.
> 
> Feedback is welcome on how I can improve this further.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
>   common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 18, 2014, 5 p.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 102
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line102>
> >
> >     Based on the intermediate format proposal the correct Time format should be: HH:MM:SS[.ZZZZZZ][+/-XX]
> >     
> >     https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation#Sqoop2Intermediaterepresentation-Intermediateformatrepresentationproposal
> >     
> >     Can we add the missing parts?
> 
> Veena Basavaraj wrote:
>     That is, support this as Time?
> 
>     So what are some examples you would like to be tested?
> 
>     It would be easy for me to add those as test cases.

I am going to fix this in a follow-up ticket; it might be worth discussing timestamp as well.


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 18, 2014, 11:52 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 11:52 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to involve dodgy logic.
> 
> Some of the parsing assumptions here need more documentation, especially the use of special characters to parse nested arrays.
> 
> Feedback is welcome on how I can improve this further.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
>   common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

On Nov. 18, 2014, 5 p.m., Veena Basavaraj wrote:
> > Jarcec
> 
> Veena Basavaraj wrote:
>     Please see the latest revision; it uses JSON. It is pretty universal, so I would rather use JSON.
> 
> Jarek Cecho wrote:
>     Why JSON for the intermediate data format? I would much rather use something that is used by pg_dump and mysqldump (or Hive) rather than JSON. The IDF is just an intermediate format; we should be as efficient here as possible, and we should not be doing conversions from format to format.
> 
> Veena Basavaraj wrote:
>     It so happens that they use the same too. I would prefer you look at the patch to see why we are using an encoding such as JSON to represent arrays and maps. Imagine a case of [[a,b] [z,q] [w,e]]: it will just be represented as a JSON string, so parsing it back into an actual array-of-arrays object is a no-brainer; we use the JSON parse method. The code also tells the CSV IDF to ignore any commas within the JSON string.
> 
> Jarek Cecho wrote:
>     As long as pg_dump and Hive are using the same format, I'm fine. But I don't want to introduce a different format than those two without a good reason.

I don't know about Hive, and if there is no standard they all use, then we cannot do much.
Unless we have Hive / Postgres connectors using this code, I would not over-optimize for it. JSON serializers are pretty fast, especially Jackson, which we should adopt in many more places; there is a ticket for that as well.
I would cross that bridge when we get there: if performance is a concern for certain data sources, changing this to any other format should be straightforward.


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 19, 2014, 9:08 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2014, 9:08 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to involve dodgy logic.
> 
> Some of the parsing assumptions here need more documentation, especially the use of special characters to parse nested arrays.
> 
> Feedback is welcome on how I can improve this further.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> File Attachments
> ----------------
> 
> SQOOP-1749-v2.patch
>   https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Jarek Cecho <ja...@apache.org>.

On Nov. 19, 2014, 1 a.m., Veena Basavaraj wrote:
> > Jarcec
> 
> Veena Basavaraj wrote:
>     Please see the latest revision; it uses JSON. It is pretty universal, so I would rather use JSON.
> 
> Jarek Cecho wrote:
>     Why JSON for the intermediate data format? I would much rather use something that is used by pg_dump and mysqldump (or Hive) rather than JSON. The IDF is just an intermediate format; we should be as efficient here as possible, and we should not be doing conversions from format to format.
> 
> Veena Basavaraj wrote:
>     It so happens that they use the same too. I would prefer you look at the patch to see why we are using an encoding such as JSON to represent arrays and maps. Imagine a case of [[a,b] [z,q] [w,e]]: it will just be represented as a JSON string, so parsing it back into an actual array-of-arrays object is a no-brainer; we use the JSON parse method. The code also tells the CSV IDF to ignore any commas within the JSON string.
> 
> Jarek Cecho wrote:
>     As long as pg_dump and Hive are using the same format, I'm fine. But I don't want to introduce a different format than those two without a good reason.
> 
> Veena Basavaraj wrote:
>     I don't know about Hive, and if there is no standard they all use, then we cannot do much.
>     Unless we have Hive / Postgres connectors using this code, I would not over-optimize for it. JSON serializers are pretty fast, especially Jackson, which we should adopt in many more places; there is a ticket for that as well.
>     I would cross that bridge when we get there: if performance is a concern for certain data sources, changing this to any other format should be straightforward.
> 
> Jarek Cecho wrote:
>     I believe that part of this JIRA is to go and explore how Hive and pg_dump serialize data, to provide the best approach in Sqoop 2. That is what we did when designing the CSV IDF, and we should keep doing that. The CSV IDF is not internal, as it is exposed as a standard to other connectors, and we should not change it in the future in a backward-incompatible way. Hence we should be careful when introducing new syntax and rules, and exercise due diligence in investigating what is best.
> 
> Veena Basavaraj wrote:
>     I am still not sure what the concern is; how we store the IDF is, and should be, internal to Sqoop. I will let Abe weigh in as well. The fact that it is called an IDF means Sqoop is responsible for reading from the FROM side, storing the data in its IDF, and then, when it passes the data to the output committer/format (in the case of MR), converting it back to Java objects. So if I change the internal representation from CSV to something else, it should not affect any connector developer.
>     
>     Of course, if I change what an Array means, then it will affect them.
> 
> Abraham Elmahrek wrote:
>     I agree with Jarcec on this. JSON was a suggestion that I gave, but I do think that we should try to align with pg_dump, mysqldump, Oracle export to CSV, Hive, etc. If there is no standard, we should choose the most generic and frequently used one.
> 
> Veena Basavaraj wrote:
>     There are two different things going on in this thread; I am not sure which one you are agreeing on.
>     
>     If you agree on using the standard that most of them use, then I do too, and it happens to be JSON.
> 
> Veena Basavaraj wrote:
>     So, after a bit more discussion with Abe, here is a detailed summary of why I think we should standardize what setText should accept and what getText will return in the IDF, and that should hold for every column type we support, not just arrays.
> 
>     Let's all be on the same page about terminology.
> 
>     1. IDF is an abstract class that supports serializing/deserializing data in Sqoop. It exposes many methods in its API; the relevant ones are:
>      - setTextData / getTextData - treats the entire row as one string.
>      - setObjectData / getObjectData - treats the entire row as an object array.
>      - setSchema (with column types) - very relevant when we use setObjectData, since we want to know what every object in that array really is: a string, a number, a map, a list, an enum, etc. This information about the column types within the object array is essential when we serialize/deserialize between two different representations. Read below to see when the column types come in handy.
> 
>     2. CSV happens to be one such implementation of IDF. What really makes the implementations different is how they treat the object data internally. Read below to understand why I say this.
> 
>     Note that the getTextData and setTextData implementations do not vary per IDF implementation at all. They are dumb methods that put all the burden on the connector to provide a correctly formatted string, one from which the corresponding TO side can infer what the data is. HDFS to HDFS is one such trivial use case: no matter which IDF implementation I use, it does not matter, since I will write as text and retrieve as text.
> 
>     Since Sqoop does not even validate the format of the text, it is up to the HDFS connector to store it in any format.
> 
>     The more interesting use case is this:
> 
>     Say the FROM data source writes the data as text and the TO data source wants to read it as an object array to load it into its database. Now the schema and column types become very important, since we are dealing with two different representations.
> 
>     When the FROM side sets the text data, we expect it to be in a format Sqoop can parse into objects.
> 
>     Let's take an example:
> 
>     Say I have a table "users" with the fields "name" and "description", both varchar. In the schema I map these varchars to the IDF's TEXT type. All good so far.
> 
>     Now the data for the name is (vee,na) and the description is (she is an engineer). Yes, the name has a comma within it; that is totally possible, and it could just as well contain a quote or any other special character...
> 
>     Now we are hoping that when vee,na is set, it is properly escaped for the comma and then quoted.
> 
>     This means the connector developers have to be aware of all these rules. They cannot just dump the data.
> 
>     You may ask why escaping is required.
> 
>     If they did not escape the ",", Sqoop will store the text exactly as it was given (say the connector developer decided to store it comma-separated; they could just as well decide to call toJSONString and give us that string, since we clearly have not documented what we expect, but that's a different story)... so let's say they produce CSV:
> 
>     vee,na,she is an engineer
> 
>     On the TO side, as I said before, say the data source wants all this data as an object array.
> 
>     The CSV implementation of the IDF will use "," as the separator, so it will create 3 objects, since it splits on the ","!!!
> 
>     object[0] = vee
>     object[1] = na
>     object[2] = she is an engineer
> 
>     This is clearly not what we wanted, and this very simple use case with strings already fails. Should we just bail out and say the connector did not encode it well, or rather be more proactive and have a standardized format that we will use to encode every element within the CSV?
> 
>     Now imagine the use cases with numbers, floating points, maps, arrays and nested arrays. Commas will appear if proper encoding is not used.
> 
>     So if we standardize and say all text data should be encoded as JSON, and any text data you get back will be JSON, then it's a pretty simple rule to follow: there is no need for custom regex, escaping, unescaping and decoding rules in either the connectors or Sqoop when we transfer data between text and object formats and vice versa.
> 
>     After writing this, I feel the name Internal Data Format is completely misleading.
> 
>     For the use cases I talked about, where FROM does setObjectData(Object[]) and TO does getText(), the internal format is totally exposed, so it is no longer internal anymore.
> 
>     But if we say the two supported use cases are setText with getText, and setObject with getObject, then the internal representation is truly internal.
> 
>     But the former is more of a reality than the latter, unless we mandate using an Object[] array for both the extract and load phases.
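The vee,na scenario quoted above can be reproduced in a few lines. This sketch only illustrates the failure mode and one possible quoting rule; it is not Sqoop's actual CSV escaping logic, and `encodeField` is a hypothetical helper:

```java
public class CsvEscapingSketch {

    // Naive parsing: splitting on every comma breaks when a field
    // itself contains the delimiter.
    static String[] naiveSplit(String row) {
        return row.split(",");
    }

    // Hypothetical encoding rule: quote each field, escaping embedded
    // backslashes and quotes, before joining fields with commas.
    static String encodeField(String field) {
        return "'" + field.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    public static void main(String[] args) {
        String name = "vee,na";
        String description = "she is an engineer";

        // Without escaping, the embedded comma yields three fields.
        String unescaped = name + "," + description;
        System.out.println(naiveSplit(unescaped).length); // 3, not 2

        // With quoting, the field boundary is unambiguous again.
        String escaped = encodeField(name) + "," + encodeField(description);
        System.out.println(escaped); // 'vee,na','she is an engineer'
    }
}
```

A correct reader would also have to honor quotes while splitting, which is exactly the extra rule the connector and the IDF must agree on.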

Thank you for the example, Veena. This scenario actually should not happen, as the returned "text" format is documented on our wiki [1]. All connectors are required to use this format for the text representation, and that is why it's so important to define it clearly. E.g. the text format is not internal to Sqoop; it's exposed to the connector developers.

The reason for that is that most high-performance data transfer mechanisms (mysqldump, pg_dump, Netezza external tables) will give us the data as text, and we want to take it and, with the smallest CPU burden possible, get it through Sqoop's internals to the target destination. The most performant example would be database to text on HDFS, and in this case we don't want to do any conversions; e.g. we don't want to read and construct JSON objects at all. That is why we've defined the IDF text format on the wiki by following what is already there, and I would like to stick to it.

Links:
1: https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation#Sqoop2Intermediaterepresentation-Intermediateformatrepresentationproposal


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 19, 2014, 5:08 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2014, 5:08 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions made here in parsing need more docs, esp. the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> File Attachments
> ----------------
> 
> SQOOP-1749-v2.patch
>   https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

On Nov. 18, 2014, 5 p.m., Veena Basavaraj wrote:
> > Jarcec
> 
> Veena Basavaraj wrote:
>     please see the latest revision; it uses JSON. It is pretty universal, so I'd rather use JSON.
> 
> Jarek Cecho wrote:
>     Why JSON for the intermediate data format? I would much rather use something that is used by pg_dump and mysqldump (or Hive) rather than JSON. The IDF is just an intermediate format; we should be as efficient here as possible and we should not be doing conversions from format to format.
> 
> Veena Basavaraj wrote:
>     It so happens that they use the same too. I would prefer you look at the patch to see why we are using an encoding such as JSON to represent arrays and maps. Imagine a case of [[a,b],[z,q],[w,e]] - it will just be represented as a JSON string, so when parsing it back to an actual array-of-arrays object, it is a no-brainer: we use the JSON parse method. The code also tells the CSV IDF to ignore any commas within the JSON string.
> 
> Jarek Cecho wrote:
>     As long as pg_dump and Hive are using the same format, I'm fine. But I don't want to introduce a different format than those two without a good reason.
> 
> Veena Basavaraj wrote:
>     I don't know about Hive, and if there is no standard that all of them use, then we cannot do much.
>     And unless we have Hive/Postgres connectors using this code, I would not over-optimize for it. JSON serializers are pretty fast, esp. Jackson, which we should switch to in many places; there is a ticket for that as well.
>     I would cross that bridge when we get there; if performance is a concern for certain data sources, changing this to any other type should be straightforward.
> 
> Jarek Cecho wrote:
>     I believe that part of this JIRA is to go and explore how Hive and pg_dump are serializing data, to provide the best way in Sqoop 2. That is what we did when designing the CSV IDF and we should keep doing that. The CSV IDF is not internal, as it's exposed as a standard for other connectors, and we should not change it in the future in a backward-incompatible way. Hence we should be careful when introducing new syntax and rules, and do the due diligence of investigating what is best.
> 
> Veena Basavaraj wrote:
>     I am still not sure what the concern is; how we store the IDF is, and should be, internal to Sqoop. I will let Abe weigh in as well. The fact that it is called IDF means Sqoop is responsible for reading from the FROM side, storing the data in its IDF, and then, when it passes the data to the output committer/format (in the case of MR), converting it back to Java objects. So if I change the internal representation from CSV to something else, it should not affect any connector developer.
>     
>     Of course, if I change what an Array means, then it will affect them.
> 
> Abraham Elmahrek wrote:
>     I agree with Jarcec on this. JSON was a suggestion that I gave, but I do think that we should try to align with pgdump, mysqldump, oracle export to csv, hive, etc. If there is no standard, we should choose the most generic and frequently used one.
> 
> Veena Basavaraj wrote:
>     There are 2 different things going on in this thread; I am not sure which one you agree on.
>     
>     If you agree on using the standard that is used by most, then I do too - and it happens to be JSON.

After a bit more discussion with Abe, here is a detailed summary of why I think we should standardize on what setText should accept and what getText will return in the IDF, and that should apply to every Column type we support, not just arrays.

Let's all be on the same page on terminology.

1. IDF is an abstract class that supports serializing/deserializing data in Sqoop. The relevant methods in its API are:
 - setTextData / getTextData - deal with the entire row as one string.
 - setObjectData / getObjectData - deal with the entire row as an object array.
 - setSchema (with column types) - very relevant when we use setObjectData, since we want to know what every object in that array really is: a string, a number, a map, a list, an enum, etc. This information about the types of the columns within the object array is required when we serialize/deserialize between 2 different representations. Read below to see when the column type comes in handy.
2. CSV happens to be one such implementation of IDF. But what really makes the different implementations different is how they treat the object data internally. Read below to understand why I say this.
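As a rough sketch, the API surface described in point 1 looks something like this (names follow this thread; treat the signatures as illustrative, not the exact Sqoop 2 source):

```java
// Illustrative sketch only: the real org.apache.sqoop.connector.idf classes
// have more methods and different generics.
abstract class IdfSketch<T> {
    protected T data;          // the intermediate representation itself
    protected Object schema;   // carries the column types for object data

    public abstract void setTextData(String text);     // whole row as one string
    public abstract String getTextData();
    public abstract void setObjectData(Object[] row);  // whole row as typed objects
    public abstract Object[] getObjectData();
    public void setSchema(Object schema) { this.schema = schema; }
}

// Trivial concrete implementation that just stores both forms, enough to
// show the two entry points without any real conversion logic.
class PassThroughIdf extends IdfSketch<String> {
    private Object[] row;
    public void setTextData(String text) { this.data = text; }
    public String getTextData() { return data; }
    public void setObjectData(Object[] row) { this.row = row; }
    public Object[] getObjectData() { return row; }
}
```

A real implementation's value is in converting between the two forms using the schema; the CSV IDF is one such conversion strategy.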


Note that the getText and setText method implementations do not vary per IDF implementation at all. They are dumb methods that put all the burden on the connector to provide a correctly formatted string that its corresponding TO side can use to infer what the data is. HDFS to HDFS is one such trivial use case: no matter what IDF implementation I use, it does not matter, since I will write as text and retrieve as text.

Since Sqoop does not even validate what format the text is in, it is up to the HDFS connector to store it in any format.


The more interesting use case is this:

Say the FROM data source writes the data as text and the TO data source now wants to read it as an object array to load it into its database. Now the schema and column types become very important, since we are dealing with 2 different representations.

When the FROM side sets the text data, we expect it to be in a format that Sqoop can parse into objects.

Let's take an example:

Say I have a table "users" with the fields "name" and "description", both varchar. In the schema I map these varchars to the IDF's TEXT type.
All good so far.

Now the data for the name is (vee,na) and the description is (she is an engineer). Yes, the name has a comma within it; that is totally possible, and it could just as well have a quote or any other special character...

Now we are hoping that when vee,na is set, the comma is properly escaped and the value quoted.

This means the connector developers now have to be aware of all these rules. They cannot just dump the data.

You may ask why escaping is required.

If they did not escape the ",", then when Sqoop stores this as text it will store it as it was given (say the connector developer decided to store it comma-separated; they could just as well decide to do toJSONString and give us that string, since we clearly have not documented what we expect, but that's a different story). So let's say they do CSV:

vee,na,she is an engineer

On the TO side, as I said before, say the data source wants all this data as an object array.

So the CSV implementation of the IDF will use the "," as a separator,

and it will create 3 array objects, since it splits based on the ",":

object[0] = vee
object[1] = na
object[2] = she is an engineer

So this is clearly not what we wanted. The above was a very simple use case with strings that fails. Should we just bail out and say the connector did not encode it well, or rather be more proactive and have a standardized format that we will use to encode every element within the CSV?
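The failure above can be reproduced with a naive split (a sketch, not Sqoop's actual parsing code):

```java
import java.util.Arrays;

public class NaiveSplitDemo {
    public static void main(String[] args) {
        // The row as the connector handed it over, with the comma inside
        // the name "vee,na" left unescaped.
        String row = "vee,na,she is an engineer";
        String[] fields = row.split(",");
        // We meant 2 columns (name, description) but get 3 pieces back.
        System.out.println(fields.length);           // 3
        System.out.println(Arrays.toString(fields)); // [vee, na, she is an engineer]
    }
}
```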

Now imagine the use cases with numbers, floating points, maps, arrays and nested arrays. Commas will appear if proper encoding is not used.


So if we standardize and say all text data should be encoded as JSON, and any text data you get will be in JSON, then it's a pretty simple rule to follow: there is no need for custom regex, escaping, unescaping and decoding rules in either the connectors or Sqoop when we transfer data between text and object formats and vice versa.
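As a minimal sketch of that idea, assume each text field is wrapped as a JSON-style quoted string before the row is joined. A real implementation would use a JSON library such as json-simple rather than this hand-rolled escaping, and the splitter would respect quotes:

```java
public class JsonStyleFieldEncoding {
    // Wrap a field as a JSON-style string: backslashes and quotes escaped,
    // then the whole value quoted, so embedded commas become harmless.
    static String encode(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Reverse of encode: strip the outer quotes and unescape one character
    // at a time.
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < s.length() - 1; i++) {
            char c = s.charAt(i);
            out.append(c == '\\' ? s.charAt(++i) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String row = encode("vee,na") + "," + encode("she is an engineer");
        System.out.println(row); // "vee,na","she is an engineer"
        // A quote-aware splitter now recovers exactly the 2 fields we meant.
    }
}
```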


After writing this, I feel the name Internal Data Format is completely misleading.

For the use cases I talked about, where FROM does setObjectData(Object[]) and TO does a getText(), the internal format is totally exposed, so it is no longer internal.


But if we say the 2 supported use cases are setText/getText and setObject/getObject, then the internal representation is truly internal.

But the former is more of a reality than the latter, unless we mandate using an Object[] array for both the extract and load phases.


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------




Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

On Nov. 18, 2014, 5 p.m., Veena Basavaraj wrote:
> > Jarcec
> 
> Veena Basavaraj wrote:
>     please see the latest revision., it uses JSON. It is pretty universal so I rather use JSON.
> 
> Jarek Cecho wrote:
>     Why JSON for intermediate data format? I would much rather use something that is used by pg_dump and mysqldump (or Hive) rather then JSON. IDF is just intermediate format, we should be as effective here as possible and we should not be doing conversions from format to format.
> 
> Veena Basavaraj wrote:
>     it so happens that they use the same too, I would prefer you look at the patch on why we are using an encoding such as JSON to represent arrays and maps, imagine a case of [[a,b] [z,q] [w,e]] - will be just represented as JSONstring, so when parsing it back to actual arays or arrays object, it is no brainer and we use the JSON parse method, the code also says the CSV IDF to ignore any commans within the JSON string
> 
> Jarek Cecho wrote:
>     As long as pg_dump and Hive are using the same format I'm fine. But I don't want to introduce different format then those two without a good reason.
> 
> Veena Basavaraj wrote:
>     I dont know about hive and if there is no standard all of them use, then we cannot do much.
>     And unless we have hive / post gres connectors using this code, I would not over optimize for it. JSON serializers are pretty fast esp jackson which we shoudl replace it to many places and there is ticket for it as well.
>     I would cross the bridge when we get there, if performance is a concern for certain Data sources, changing this to any other type should be  strighforward.
> 
> Jarek Cecho wrote:
>     I believe that part of this JIRA is to go and explore how Hive and pg_dump are serializing data to provide the best way in Sqoop 2. That is what we've done when designing CSV IDF and we should keep doing that. The CSV IDF is not internal as it's exposed as a standard for other connectors and we should not change that in the future in backward incompatible way. Hence we should be careful when introducing new syntax and rules and the due diligence in investigating what is best.

I am still not sure what the concern is; how we store the IDF is, and should be, internal to Sqoop. I will let Abe weigh in as well. The fact that it is called IDF means Sqoop is responsible for reading from the FROM side, storing the data in its IDF, and then, when it passes the data to the output committer/format (in the case of MR), converting it back to Java objects. So if I change the internal representation from CSV to something else, it should not affect any connector developer.

Of course, if I change what an Array means, then it will affect them.


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------




Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

On Nov. 18, 2014, 5 p.m., Veena Basavaraj wrote:
> > Jarcec
> 
> Veena Basavaraj wrote:
>     please see the latest revision., it uses JSON. It is pretty universal so I rather use JSON.
> 
> Jarek Cecho wrote:
>     Why JSON for intermediate data format? I would much rather use something that is used by pg_dump and mysqldump (or Hive) rather then JSON. IDF is just intermediate format, we should be as effective here as possible and we should not be doing conversions from format to format.
> 
> Veena Basavaraj wrote:
>     it so happens that they use the same too, I would prefer you look at the patch on why we are using an encoding such as JSON to represent arrays and maps, imagine a case of [[a,b] [z,q] [w,e]] - will be just represented as JSONstring, so when parsing it back to actual arays or arrays object, it is no brainer and we use the JSON parse method, the code also says the CSV IDF to ignore any commans within the JSON string
> 
> Jarek Cecho wrote:
>     As long as pg_dump and Hive are using the same format I'm fine. But I don't want to introduce different format then those two without a good reason.
> 
> Veena Basavaraj wrote:
>     I dont know about hive and if there is no standard all of them use, then we cannot do much.
>     And unless we have hive / post gres connectors using this code, I would not over optimize for it. JSON serializers are pretty fast esp jackson which we shoudl replace it to many places and there is ticket for it as well.
>     I would cross the bridge when we get there, if performance is a concern for certain Data sources, changing this to any other type should be  strighforward.
> 
> Jarek Cecho wrote:
>     I believe that part of this JIRA is to go and explore how Hive and pg_dump are serializing data to provide the best way in Sqoop 2. That is what we've done when designing CSV IDF and we should keep doing that. The CSV IDF is not internal as it's exposed as a standard for other connectors and we should not change that in the future in backward incompatible way. Hence we should be careful when introducing new syntax and rules and the due diligence in investigating what is best.
> 
> Veena Basavaraj wrote:
>     I am still not sure what the concern is , how we store the IDF is and should be internal to sqoop. I will let Abe weigh in as well. The fact that is it called IDF means, sqoop is responsible to read from FROM..store it in its IDF and then when it passes data to the outputcommitter/ format ( in case of MR) it uses this and converts it back to the java objects. So If I modify this internally string to CSV to something else, it should not affect any conenctor develoepr.
>     
>     if course if I change what an Array means then it will affect
> 
> Abraham Elmahrek wrote:
>     I agree with Jarcec on this. JSON was a suggestion that I gave, but I do think that we should try to align with pgdump, mysqldump, oracle export to csv, hive, etc. If there is no standard, we should choose the most generic and frequently used one.

There are 2 different things going on in this thread; I am not sure which one you agree on.

If you agree on using the standard that is used by most, then I do too - and it happens to be JSON.


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------




Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

On Nov. 18, 2014, 5 p.m., Veena Basavaraj wrote:
> > Jarcec
> 
> Veena Basavaraj wrote:
>     please see the latest revision., it uses JSON. It is pretty universal so I rather use JSON.
> 
> Jarek Cecho wrote:
>     Why JSON for intermediate data format? I would much rather use something that is used by pg_dump and mysqldump (or Hive) rather then JSON. IDF is just intermediate format, we should be as effective here as possible and we should not be doing conversions from format to format.

It so happens that they use the same too. I would prefer you look at the patch to see why we are using an encoding such as JSON to represent arrays and maps. Imagine a case of [[a,b],[z,q],[w,e]] - it will just be represented as a JSON string, so when parsing it back to an actual array-of-arrays object, it is a no-brainer: we use the JSON parse method. The code also tells the CSV IDF to ignore any commas within the JSON string.
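A sketch of that point: serialize the nested array into a single JSON-style token so the CSV layer sees one field, commas and all (hand-rolled here for illustration; the actual patch uses a JSON library, and element escaping is omitted to keep the sketch short):

```java
public class NestedArrayField {
    // Render a nested string array as one JSON-style token.
    static String toJsonArray(String[][] nested) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < nested.length; i++) {
            if (i > 0) sb.append(",");
            sb.append("[");
            for (int j = 0; j < nested[i].length; j++) {
                if (j > 0) sb.append(",");
                sb.append("\"").append(nested[i][j]).append("\"");
            }
            sb.append("]");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        String[][] value = {{"a", "b"}, {"z", "q"}, {"w", "e"}};
        // One opaque token per column; a JSON parser recovers the structure.
        System.out.println(toJsonArray(value)); // [["a","b"],["z","q"],["w","e"]]
    }
}
```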


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------




Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Jarek Cecho <ja...@apache.org>.

On Nov. 19, 2014, 1 a.m., Veena Basavaraj wrote:
> > Jarcec
> 
> Veena Basavaraj wrote:
>     please see the latest revision., it uses JSON. It is pretty universal so I rather use JSON.
> 
> Jarek Cecho wrote:
>     Why JSON for intermediate data format? I would much rather use something that is used by pg_dump and mysqldump (or Hive) rather then JSON. IDF is just intermediate format, we should be as effective here as possible and we should not be doing conversions from format to format.
> 
> Veena Basavaraj wrote:
>     it so happens that they use the same too, I would prefer you look at the patch on why we are using an encoding such as JSON to represent arrays and maps, imagine a case of [[a,b] [z,q] [w,e]] - will be just represented as JSONstring, so when parsing it back to actual arays or arrays object, it is no brainer and we use the JSON parse method, the code also says the CSV IDF to ignore any commans within the JSON string

As long as pg_dump and Hive are using the same format, I'm fine. But I don't want to introduce a different format than those two without a good reason.


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------




Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

On Nov. 18, 2014, 5 p.m., Veena Basavaraj wrote:
> > Jarcec
> 
> Veena Basavaraj wrote:
>     please see the latest revision., it uses JSON. It is pretty universal so I rather use JSON.
> 
> Jarek Cecho wrote:
>     Why JSON for intermediate data format? I would much rather use something that is used by pg_dump and mysqldump (or Hive) rather then JSON. IDF is just intermediate format, we should be as effective here as possible and we should not be doing conversions from format to format.
> 
> Veena Basavaraj wrote:
>     it so happens that they use the same too, I would prefer you look at the patch on why we are using an encoding such as JSON to represent arrays and maps, imagine a case of [[a,b] [z,q] [w,e]] - will be just represented as JSONstring, so when parsing it back to actual arays or arrays object, it is no brainer and we use the JSON parse method, the code also says the CSV IDF to ignore any commans within the JSON string
> 
> Jarek Cecho wrote:
>     As long as pg_dump and Hive are using the same format I'm fine. But I don't want to introduce different format then those two without a good reason.
> 
> Veena Basavaraj wrote:
>     I dont know about hive and if there is no standard all of them use, then we cannot do much.
>     And unless we have hive / post gres connectors using this code, I would not over optimize for it. JSON serializers are pretty fast esp jackson which we shoudl replace it to many places and there is ticket for it as well.
>     I would cross the bridge when we get there, if performance is a concern for certain Data sources, changing this to any other type should be  strighforward.
> 
> Jarek Cecho wrote:
>     I believe that part of this JIRA is to go and explore how Hive and pg_dump are serializing data to provide the best way in Sqoop 2. That is what we've done when designing CSV IDF and we should keep doing that. The CSV IDF is not internal as it's exposed as a standard for other connectors and we should not change that in the future in backward incompatible way. Hence we should be careful when introducing new syntax and rules and the due diligence in investigating what is best.
> 
> Veena Basavaraj wrote:
>     I am still not sure what the concern is; how we store the IDF is, and should be, internal to Sqoop. I will let Abe weigh in as well. The fact that it is called an IDF means Sqoop is responsible for reading from the FROM side, storing the data in its IDF, and then, when it passes data to the output committer/format (in the MR case), converting it back to Java objects. So if I change the internal representation from CSV to something else, it should not affect any connector developer.
>     
>     Of course, if I change what an Array means, then it will affect them.
> 
> Abraham Elmahrek wrote:
>     I agree with Jarcec on this. JSON was a suggestion that I gave, but I do think that we should try to align with pgdump, mysqldump, oracle export to csv, hive, etc. If there is no standard, we should choose the most generic and frequently used one.
> 
> Veena Basavaraj wrote:
>     There are two different things going on in this thread. I am not sure which one you agree on.
>     
>     If you agree on using the standard that is used by most, then I do too, and it happens to be JSON.
> 
> Veena Basavaraj wrote:
>     After a bit more discussion with Abe, here is a detailed summary of why I think we should standardize what setText accepts and what getText returns in the IDF, and that should cover every column type we support, not just arrays.
>     
>     Let's all be on the same page about terminology.
>     
>     1. The IDF is an abstract class that supports serializing/deserializing data in Sqoop. The relevant methods in its API are:
>      - setTextData/getTextData - treats the entire row as one string.
>      - setObjectData/getObjectData - treats the entire row as an object array.
>      - setSchema (with column types) - very relevant when we use setObjectData, since we want to know what every object in that array really is: a string, a number, a map, a list, an enum, etc. This column type information is required whenever we ser/de-ser between two different representations. Read below to see when it comes in handy.
>     
>     2. CSV happens to be one such implementation of the IDF. What really distinguishes implementations is how they treat the object data internally. Read below to understand why I say this.
>     
>     
>     Note that the getText and setText implementations do not vary per IDF implementation at all. They are dumb methods that put all the burden on the connector to supply a correctly formatted string that the corresponding TO side can interpret. HDFS to HDFS is one such trivial use case: no matter which IDF implementation I use, it does not matter, since I write as text and retrieve as text.
>     
>     Since Sqoop does not even validate the format of the text, it is up to the HDFS connector to store it in any format.
>     
>     
>     The more interesting use case is this:
>     
>     Say the FROM data source writes the data as text, and the TO data source wants to read it as an object array to load into its database. Now the schema and column types become very important, since we are dealing with two different representations.
>     
>     When the FROM side sets the text data, we expect it to be in a format Sqoop can parse into objects.
>     
>     Let's take an example:
>     
>     Say I have a table "users" with the fields "name" and "description", both varchar. In the schema I map these varchars to the IDF's TEXT type.
>     All good so far.
>     
>     Now the data for the name is (vee,na), and the description is (she is an engineer). Yes, the name has a comma in it - totally possible; it could just as well have a quote or any other special character...
>     
>     We are hoping that when vee,na is set, the comma is properly escaped and the value quoted.
>     
>     This means the connector developers now have to be aware of all these rules. They cannot just dump the data.
>     
>     You may ask why escaping is required.
>     
>     If they did not escape the ",", then when Sqoop stores this as text it will store it as given (say the connector developer decided to store it comma-separated; they could just as well decide to do toJSONString and give us that string, since we clearly have not documented what we expect, but that's a different story). So let's say they do CSV:
>     
>     vee,na,she is an engineer
>     
>     On the TO side, as I said before, say the data source wants all this data as object array. 
>     
>     The CSV implementation of the IDF will use the "," as a separator,
>     
>     so it will create 3 array elements, since it splits based on the ","!
>     
>     object[0] = vee
>     object[1] = na
>     object[2] = she is an engineer
>     
>     This is clearly not what we wanted, and the above is a very simple use case with strings that fails. Should we just bail out and say the connector did not encode the data well, or rather be proactive and have a standardized format that we use to encode every element within the CSV?
>     
>     Now imagine the use cases with numbers, floating points, maps, arrays, and nested arrays. Commas will appear if proper encoding is not used.
>     
>     
>     So if we standardize and say all text data should be encoded as JSON, and any text data you get back will be JSON, then it's a pretty simple rule to follow: there is no need for custom regex, escaping, unescaping, or decoding rules in either the connectors or Sqoop when we transfer data between text and object formats and vice versa.
>     
>     
>     After writing this, I feel the name Internal Data Format is completely misleading.
>     
>     For the use cases I talked about, where FROM does setObjectData(Object[]) and TO does getText(), the internal format is totally exposed, so it is no longer internal.
>     
>     
>     But if we say the two supported use cases are setText/getText and setObject/getObject, then the internal representation is truly internal.
>     
>     But the former is more of a reality than the latter, unless we mandate using an Object[] array for both the extract and load phases.
> 
> Jarek Cecho wrote:
>     Thank you for the example, Veena. This scenario actually should not happen, as the returned "text" format is documented on our wiki [1]. All connectors are required to use this format for the text representation, and that is why it's so important to define it clearly. I.e., the text format is not internal to Sqoop; it's exposed to the connector developers.
>     
>     The reason is that most high-performance data transfer mechanisms (mysqldump, pg_dump, Netezza external tables) give us the data as text, and we want to take it and, with the smallest CPU burden possible, move it through Sqoop internals to the target destination. The most performant example would be database to text on HDFS, and in this case we don't want to do any conversions; e.g., we don't want to read and construct JSON objects at all. That is why we've defined the IDF text format on the wiki by following what is already there, and I would like to stick to it.
>     
>     Links:
>     1: https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation#Sqoop2Intermediaterepresentation-Intermediateformatrepresentationproposal

Yes, my bad - I thought it was internal, and that is truly not the case.

Second, it was not clear to me that we will trust all data sources to adhere to this. Initially Sqoop was oriented towards relational DBs, so it was OK to assume that this standard encoding would happen, but now we can support any form of data source: even a simple FTP server or scraped web docs can be the FROM, and the data can be persisted in any data source.

In that case, assuming that all connectors will encode data in the expected way is a bit far-fetched. So I wondered if at least one of two things is in place:
1. we provide utility methods to encode data in the format we expect, or
2. we validate.
It looks like we do neither, and it was hard for me to believe that we are going to trust all connectors. The concern is that we will have to debug a ton of customer issues later, just related to malformed data, that might be very hard to track down without proper validation/error messages. At least we should have a debug mode for troubleshooting; otherwise it will be quite a nightmare to look at every row of data manually. I totally see your point on why we don't want validation by default.
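To make the first point concrete, a utility of the kind I mean could look roughly like this. This is a hypothetical sketch, not part of the SDK; the single-quote and backslash conventions are assumptions modeled on the CSV rules discussed above:

```java
import java.util.ArrayList;
import java.util.List;

public class CsvFieldUtil {
    // Encode one field: escape backslashes and single quotes, then quote it,
    // so embedded commas can never be confused with the field separator.
    static String encodeField(String raw) {
        return "'" + raw.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Split a row on commas, but only on commas outside quoted fields.
    static List<String> splitRow(String row) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < row.length(); i++) {
            char c = row.charAt(i);
            if (c == '\\' && i + 1 < row.length()) {
                // Escaped character: copy it verbatim, do not interpret it.
                current.append(c).append(row.charAt(++i));
            } else if (c == '\'') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String row = encodeField("vee,na") + "," + encodeField("she is an engineer");
        System.out.println(row);                  // 'vee,na','she is an engineer'
        System.out.println(splitRow(row).size()); // 2, not 3
    }
}
```

With a helper like this in the SDK, the "vee,na" row from the example above round-trips as two fields instead of being split into three.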

Next, the doc is good but not complete. It gives the rules for strings and byte arrays, but the rules for Set/Array/Map are missing. If the analysis was required as part of SQOOP-1350, it would have been easier if this had been made explicit in that ticket.

SET
String with comma-separated enumerated values (what about nested arrays - what is the delimiter, and are we enclosing them in quotes?)


It is hard to guess what the design goals are when they are not clearly stated in the wiki and backed up by some benchmarking.

I am quite sure that not all the data sources we end up having connectors for will follow these database rules. It might be too early to tell what those connectors will be.


Long story short, I will add a sub-task to SQOOP-1350 to write up what pg_dump and Hive use to represent complex types such as arrays, sets, and maps.
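In the meantime, the JSON-style encoding argued for above can be sketched for nested string arrays as follows. This is a hand-rolled illustration of the idea, not the serializer the patch actually uses:

```java
import java.util.Arrays;
import java.util.List;

public class NestedArrayJson {
    // Render a (possibly nested) list of strings as a JSON array literal.
    // Commas inside elements end up inside quoted strings, so a CSV parser
    // told to respect the JSON string never splits on them.
    static String toJson(Object value) {
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                if (i > 0) sb.append(",");
                sb.append(toJson(list.get(i)));
            }
            return sb.append("]").toString();
        }
        // Scalar: JSON-escape backslashes and double quotes, then quote.
        String s = value.toString().replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + s + "\"";
    }

    public static void main(String[] args) {
        List<?> nested = Arrays.asList(Arrays.asList("a", "b"),
                                       Arrays.asList("z,q"));
        System.out.println(toJson(nested)); // [["a","b"],["z,q"]]
    }
}
```

Nesting falls out of the recursion for free, which is the main attraction of JSON here.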


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 19, 2014, 9:08 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2014, 9:08 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, esp. the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> File Attachments
> ----------------
> 
> SQOOP-1749-v2.patch
>   https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Jarek Cecho <ja...@apache.org>.

> On Nov. 19, 2014, 1 a.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, lines 21-62
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line21>
> >
> >     Nit: Seems an unnecessary change in ordering.
> 
> Veena Basavaraj wrote:
>     I used the IDE ordering, I am not sure if there a standard for this.

You're correct that we don't have a standard for import ordering, which is not a big deal in my mind. However, we don't like unnecessary changes, such as changing the import order just for the sake of changing it.


> On Nov. 19, 2014, 1 a.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, lines 78-79
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line78>
> >
> >     Can you please document how the array is expected to be encoded in the IDF? It seems that we don't have this in the code yet, so I would simply update the wiki:
> >     
> >     https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation#Sqoop2Intermediaterepresentation-Intermediateformatrepresentationproposal
> >     
> >     (We can move the IDF proposal from wiki to code in subsequent JIRA)
> 
> Veena Basavaraj wrote:
>     I have a ticket for that. I would rather keep this in one place
>     https://issues.apache.org/jira/browse/SQOOP-1706

Ack.


> On Nov. 19, 2014, 1 a.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 99
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line99>
> >
> >     I'm wondering if this one really needs to be public?
> 
> Veena Basavaraj wrote:
>     It has default access so we can use it in unit tests; there is no need for it to be public, yes.

I see.


> On Nov. 19, 2014, 1 a.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 102
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line102>
> >
> >     Based on the intermediate format proposal, the correct Time format should be: HH:MM:SS[.ZZZZZZ][+/-XX]
> >     
> >     https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation#Sqoop2Intermediaterepresentation-Intermediateformatrepresentationproposal
> >     
> >     Can we add the missing parts?
> 
> Veena Basavaraj wrote:
>     That is, support this as Time?
>     
>     So what are some of the examples you would like to be tested?
>     
>     It would be easy for me to add those as test cases.
> 
> Veena Basavaraj wrote:
>     I am going to fix this in a follow-up ticket; it might be worth discussing timestamp as well.

I'm fine with finishing the work in subsequent ticket.


> On Nov. 19, 2014, 1 a.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 321
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line321>
> >
> >     Why space as a separator for nested data types when we are already using comma as a separator?
> 
> Veena Basavaraj wrote:
>     This is for the elements inside the array. We use a comma between the two columns.
>     
>     
>     Another alternative would be to encode this as a JSON string, which I might try, as Abe suggested below.
>     
>     ['A' 'B'], 'text'

I don't see a reason why we need two different separators depending on whether we are in a nested type or not; why not simply have ["A","B"],"C" for the type "Array(String),String"? I would assume this will make nested types much easier.

It seems that PostgreSQL is the only database that supports arrays among the databases I explored back in SQOOP-515, so I'm wondering how pg_dump serializes the array. I would suggest following that, so that we can limit the number of transformations when reading from pg_dump in the future. Another project worth exploring would be Hive, to see how arrays are serialized.

The goal here should be to enable the direct tools (mysqldump, pg_dump) to be equivalent with the CSV IDF with as few changes as possible to avoid unnecessary CPU cycles when using those utilities.
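For reference, PostgreSQL prints array values in a brace-delimited literal form such as {"A","B"}. The following sketch renders a string list in that style; it is an illustration of the convention, not pg_dump's code, and it always quotes elements (valid PostgreSQL array input even where quoting is optional):

```java
import java.util.Arrays;
import java.util.List;

public class PgArrayLiteral {
    // Render a string list in PostgreSQL array-literal style: braces around
    // the whole array, each element double-quoted, with backslash escapes
    // for embedded backslashes and double quotes.
    static String toPgLiteral(List<String> elements) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < elements.size(); i++) {
            if (i > 0) sb.append(",");
            String e = elements.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append('"').append(e).append('"');
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toPgLiteral(Arrays.asList("vee,na", "she is an engineer")));
        // {"vee,na","she is an engineer"}
    }
}
```

Note how close this is to the JSON form except for the outer braces, so aligning the IDF with either convention keeps the transformation cheap.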


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 19, 2014, 6:58 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2014, 6:58 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, esp. the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> File Attachments
> ----------------
> 
> SQOOP-1749-v2.patch
>   https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Jarek Cecho <ja...@apache.org>.

On Nov. 19, 2014, 1 a.m., Veena Basavaraj wrote:
> > Jarcec
> 
> Veena Basavaraj wrote:
>     Please see the latest revision; it uses JSON. It is pretty universal, so I would rather use JSON.

Why JSON for the intermediate data format? I would much rather use something that is used by pg_dump and mysqldump (or Hive) rather than JSON. The IDF is just an intermediate format; we should be as efficient here as possible, and we should not be doing conversions from format to format.


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 19, 2014, 5:08 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2014, 5:08 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, esp. the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> File Attachments
> ----------------
> 
> SQOOP-1749-v2.patch
>   https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 18, 2014, 5 p.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, lines 21-62
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line21>
> >
> >     Nit: Seems an unnecessary change in ordering.
> 
> Veena Basavaraj wrote:
>     I used the IDE ordering, I am not sure if there a standard for this.
> 
> Jarek Cecho wrote:
>     You're correct that we don't have standard for import ordering, which is not a big deal in my mind. We however don't like unnecessary changes such as changing import order just for the sake of changing the order.

No one is doing it for the sake of it; the IDE formatting shortcut does it automatically.


On Nov. 18, 2014, 5 p.m., Veena Basavaraj wrote:
> > Jarcec

Please see the latest revision; it uses JSON. It is pretty universal, so I would rather use JSON.


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 18, 2014, 10:58 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 10:58 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, esp. the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> File Attachments
> ----------------
> 
> SQOOP-1749-v2.patch
>   https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


As usual, I do see several trailing whitespace characters.


common/src/main/java/org/apache/sqoop/schema/type/Array.java
<https://reviews.apache.org/r/28139/#comment104014>

    Nit: Not sure that this change is necessary?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment104015>

    Nit: Seems an unnecessary change in ordering.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment104019>

    Can you please document how the array is expected to be encoded in the IDF? It seems that we don't have this in the code yet, so I would simply update the wiki:
    
    https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation#Sqoop2Intermediaterepresentation-Intermediateformatrepresentationproposal
    
    (We can move the IDF proposal from wiki to code in subsequent JIRA)



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment104018>

    I'm wondering if this one really needs to be public?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment104017>

    Based on the intermediate format proposal, the correct Time format should be: HH:MM:SS[.ZZZZZZ][+/-XX]
    
    https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation#Sqoop2Intermediaterepresentation-Intermediateformatrepresentationproposal
    
    Can we add the missing parts?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment104021>

    Nit: Whitespace



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment104025>

    Why space as a separator for nested data types when we are already using comma as a separator?


Jarcec

- Jarek Cecho


On Nov. 18, 2014, 7:52 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 7:52 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, esp. the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
>   common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Abraham Elmahrek <ab...@cloudera.com>.

> On Nov. 18, 2014, 9:30 p.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, lines 183-189
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line183>
> >
> >     Should vary based on 'escaped'.
> 
> Jarek Cecho wrote:
>     I would advise not to. The IDF is an internal representation, and I think we should be as strict as possible without a lot of conditional formatting. The "escaped" setting is more a matter of how the data should look on output (or input).

Perhaps I don't fully understand this section of code or the argument. If the IDF receives a CSV record, it sometimes has to parse it (hence the CSV parser). The normal producers of CSV records are extractors. Isn't it entirely possible that this section of code will be parsing data provided by the extractors? In that case, not honoring quotes would be a problem.


- Abraham


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62009
-----------------------------------------------------------


On Nov. 19, 2014, 5:08 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2014, 5:08 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, esp. the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> File Attachments
> ----------------
> 
> SQOOP-1749-v2.patch
>   https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Jarek Cecho <ja...@apache.org>.

> On Nov. 18, 2014, 9:30 p.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, lines 183-189
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line183>
> >
> >     Should vary based on 'escaped'.

I would advise not to. The IDF is an internal representation, and I think we should be as strict as possible without a lot of conditional formatting. The "escaped" setting is more a matter of how the data should look on output (or input).


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62009
-----------------------------------------------------------


On Nov. 18, 2014, 7:52 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 7:52 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, especially the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
>   common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 18, 2014, 1:30 p.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, lines 183-189
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line183>
> >
> >     Should vary based on 'escaped'.
> 
> Jarek Cecho wrote:
>     I would advise not to. The IDF is an internal representation, and I think we should be as strict as possible without a lot of conditional formatting. The "escaped" setting is more a matter of how it should look on the output (or input).
> 
> Abraham Elmahrek wrote:
>     Perhaps I don't fully understand this section of code or the argument. If the IDF receives a CSV record, it sometimes has to parse it (hence the CSV parser). The normal producers of CSV records are extractors. Isn't it entirely possible that this particular section of code will be parsing data provided by the extractors? In that case, not honoring quotes would be a problem.

This is how I see it:

If you look at an example such as GenericJDBC and how it uses the IDF, it is clear.

The IDF has a schema, and the schema defines the column types; for a connector, only this matters.

How we represent each row's data is an internal implementation detail. Today we have one implementation that stores a row as a comma-separated list, and the schema describes what each comma-separated value is.

So by CSV records, if you mean the object array: yes, connectors create it and call setObjectData. setObjectData encodes these objects when converting them to a string.

So v''eena might be encoded to make sure we escape the quotes within.

The piece of code that we are commenting on above is used to convert the string back to the objects.

Now, if we put all complex types inside the { } using toJSONString, then we should neither encode nor decode what is inside them.
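A minimal sketch of the escaping round trip being described here (hypothetical helper names, not the actual Sqoop implementation; the real IDF may use a different escape convention than the backslash used below):

```java
// Hypothetical sketch of how a CSV IDF might quote a string field when
// converting objects to the internal CSV text, and back again.
public class CsvFieldCodec {
    // Encode: wrap in single quotes, escaping embedded backslashes and quotes.
    static String encode(String raw) {
        return "'" + raw.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Decode: strip the outer quotes and undo the escaping.
    // (A single-pass parser would be more robust; this is illustrative.)
    static String decode(String field) {
        String inner = field.substring(1, field.length() - 1);
        return inner.replace("\\'", "'").replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        String original = "v'eena";
        String encoded = encode(original);
        System.out.println(encoded);          // 'v\'eena'
        System.out.println(decode(encoded));  // v'eena
    }
}
```

The point above then becomes: fields serialized as JSON inside { } would bypass this encode/decode step entirely.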


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62009
-----------------------------------------------------------


On Nov. 19, 2014, 9:08 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2014, 9:08 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, especially the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1b5b290903a6c93dbb8427af525515e2b2 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> File Attachments
> ----------------
> 
> SQOOP-1749-v2.patch
>   https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 18, 2014, 1:30 p.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 442
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line442>
> >
> >     Space limited? Why not simple JSON?

Let me think about it more. Good point about encoding as JSON; I wish we just encoded all of it in JSON instead of CSV.
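To illustrate the idea under discussion (a hedged sketch only, hand-rolled rather than using the project's actual JSON library): a list column could be serialized as a JSON array string inside a single CSV field, so commas within elements do not break the row split.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: represent a list column as one JSON array string,
// escaping backslashes and double quotes per JSON string rules.
public class ListFieldSketch {
    static String toJsonArray(List<String> values) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sb.append(",");
            String escaped = values.get(i)
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");
            sb.append('"').append(escaped).append('"');
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        // The comma inside "a,b" stays inside the JSON string, not the CSV row.
        System.out.println(toJsonArray(Arrays.asList("a,b", "c")));
    }
}
```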


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62009
-----------------------------------------------------------


On Nov. 18, 2014, 11:52 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 11:52 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, especially the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
>   common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Abraham Elmahrek <ab...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62009
-----------------------------------------------------------


Sorry about the "ship it" before; it was a mouse slip.

Normal WS cleanup as well.


connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment103932>

    Should vary based on 'escaped'.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment103933>

    Cool.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment103936>

    Space limited? Why not simple JSON?


- Abraham Elmahrek


On Nov. 18, 2014, 7:52 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 7:52 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, especially the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
>   common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 18, 2014, 7:02 p.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 168
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line168>
> >
> >     Not a big deal, but I prefer "getFields". It makes more sense in the context of CSV parsing.
> 
> Veena Basavaraj wrote:
>     "fields" is generic. All of the CSV code converts strings to objects (getObjectData) and vice versa (setObjectData), so I am making it explicit that every field is internally kept as a string.

Well, if you prefer to keep "fields" since you added it, I can plug that in as well along with "String"!


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62115
-----------------------------------------------------------


On Nov. 18, 2014, 11:52 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 11:52 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, especially the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
>   common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.

> On Nov. 18, 2014, 7:02 p.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java, line 168
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line168>
> >
> >     Not a big deal, but I prefer "getFields". It makes more sense in the context of CSV parsing.

"fields" is generic. All of the CSV code converts strings to objects (getObjectData) and vice versa (setObjectData), so I am making it explicit that every field is internally kept as a string.
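A hedged sketch of the string-to-objects direction this thread keeps referring to (illustrative only, not the actual Sqoop parser): splitting the internal CSV text into top-level fields while honoring quoted sections.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split one CSV-IDF row into its top-level fields,
// treating commas inside single-quoted sections as part of the field.
public class RowSplitSketch {
    static List<String> splitRow(String row) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < row.length(); i++) {
            char c = row.charAt(i);
            if (c == '\\' && i + 1 < row.length()) {
                // Keep escape pairs intact so escaped quotes don't toggle state.
                current.append(c).append(row.charAt(++i));
            } else if (c == '\'') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitRow("1,'a,b',3")); // [1, 'a,b', 3]
    }
}
```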


- Veena


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62115
-----------------------------------------------------------


On Nov. 18, 2014, 11:52 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 11:52 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, especially the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
>   common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Abraham Elmahrek <ab...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62115
-----------------------------------------------------------


Just another nitpick.


connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/28139/#comment104065>

    Not a big deal, but I prefer "getFields". It makes more sense in the context of CSV parsing.


- Abraham Elmahrek


On Nov. 18, 2014, 7:52 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 7:52 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, especially the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
>   common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Abraham Elmahrek <ab...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62008
-----------------------------------------------------------

Ship it!


Ship It!

- Abraham Elmahrek


On Nov. 18, 2014, 7:52 p.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 18, 2014, 7:52 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting nested arrays is going to be dodgy logic.
> 
> Some of the assumptions here in parsing need more docs, especially the use of special characters to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure more tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
>   common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/
-----------------------------------------------------------

(Updated Nov. 18, 2014, 11:52 a.m.)


Review request for Sqoop.


Bugs: SQOOP-1749
    https://issues.apache.org/jira/browse/SQOOP-1749


Repository: sqoop-sqoop2


Description (updated)
-------

see JIRA

Supporting nested arrays is going to be dodgy logic.

Some of the assumptions here in parsing need more docs, especially the use of special characters to parse nested arrays.

Welcome feedback if I can improve this more.

I am sure more tests can be added.


NOTE: WS will be addressed once the patch is reviewed for functionality and this needs a rebase since I have other patches pending review that overlap.


Diffs
-----

  common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
  common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 

Diff: https://reviews.apache.org/r/28139/diff/


Testing
-------

yes unit tests added. 


Thanks,

Veena Basavaraj


Re: Review Request 28139: Support Time and List Type in CSV IDF

Posted by Veena Basavaraj <vb...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/
-----------------------------------------------------------

(Updated Nov. 17, 2014, 4 p.m.)


Review request for Sqoop.


Bugs: SQOOP-1749
    https://issues.apache.org/jira/browse/SQOOP-1749


Repository: sqoop-sqoop2


Description (updated)
-------

see JIRA

Supporting nested arrays is going to be dodgy logic.

Some of the assumptions here in parsing need more docs, especially the use of special characters to parse nested arrays.

Welcome feedback if I can improve this more.

I am sure more tests can be added.


Diffs
-----

  common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexListType.java 5d8f28b 
  common/src/main/java/org/apache/sqoop/schema/type/Array.java af13eb7 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java 39a01c1 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java 5ef6fc6 
  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java 4d41679 
  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java fcf6c3c 

Diff: https://reviews.apache.org/r/28139/diff/


Testing
-------

yes unit tests added. 


Thanks,

Veena Basavaraj