Posted to dev@sqoop.apache.org by "Qian Xu (JIRA)" <ji...@apache.org> on 2014/09/01 11:06:20 UTC

[jira] [Comment Edited] (SQOOP-1395) Use random generated class name for SqoopRecord

    [ https://issues.apache.org/jira/browse/SQOOP-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117221#comment-14117221 ] 

Qian Xu edited comment on SQOOP-1395 at 9/1/14 9:06 AM:
--------------------------------------------------------

[~jarcec] There are actually two places that use reflection to look up a class.

When a Kite Dataset is created, an Avro schema should be provided. In that schema, the record type is actually the table name. Kite then tries to verify the schema: the writerSchema is the Avro schema, but the readerSchema is derived from the entity class that descends from SqoopRecord.

{code}
  // DataModelUtil.java

  public static <E> DatumReader<E> getDatumReaderForType(Class<E> type, Schema writerSchema) {
    // The reader schema is resolved from the Java type, not only from the writer schema
    Schema readerSchema = getReaderSchema(type, writerSchema);
    GenericData dataModel = getDataModelForType(type);
    // ...
  }
{code}

When Parquet files are exported back to an RDBMS, {{AvroIndexedRecordConverter}} instantiates a class based on the avroSchema. If the record type in the schema happens to match our entity class name, the lookup finds our class and we are out of luck.
{code}
  // AvroIndexedRecordConverter

  public AvroIndexedRecordConverter(ParentValueContainer parent, GroupType
      parquetSchema, Schema avroSchema) {
    this.specificClass = SpecificData.get().getClass(avroSchema);
    // ...
  }

  public void start() {
    // Should do the right thing whether it is generic or specific
    this.currentRecord = (T) ((this.specificClass == null) ?
            new GenericData.Record(avroSchema) :
            SpecificData.newInstance(specificClass, avroSchema));
  }
{code}
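
To illustrate the collision, here is a minimal, stdlib-only sketch of a name-based class lookup. It does not use Avro or Parquet; it only assumes that the lookup resolves a class from the record name and returns null when none is found, which is what the null check in {{start()}} above suggests. The names {{Orders}}, {{NameBasedLookupSketch}} and {{lookup}} are hypothetical and exist only for this example.

```java
// Hypothetical stand-in for a Sqoop-generated entity class whose name
// equals a record type (table name) found in a schema.
class Orders {
}

final class NameBasedLookupSketch {

  // Resolve a class purely from a record name. Returns null when no class
  // with that name can be loaded, in which case a caller would fall back
  // to a generic record.
  static Class<?> lookup(String recordName) {
    try {
      return Class.forName(recordName);
    } catch (ClassNotFoundException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    // The generated class is on the classpath, so the name-based lookup finds it.
    System.out.println(lookup(Orders.class.getName()) == Orders.class);
    // No class with this name exists, so the lookup yields null.
    System.out.println(lookup("no.such.pkg.NoSuchTable") == null);
  }
}
```

If the record name were unique per run, the first lookup would behave like the second one and the generic fallback would be used.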



> Use random generated class name for SqoopRecord
> -----------------------------------------------
>
>                 Key: SQOOP-1395
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1395
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: tools
>            Reporter: Qian Xu
>            Assignee: Qian Xu
>            Priority: Minor
>
> Sqoop generates an entity class to hold the values of each database record for MapReduce. The class inherits from the abstract class SqoopRecord. By default, the name of the class is the table name. 
> When records are exported as Parquet files, the internal logic attempts to instantiate another entity class or create it on demand. Unfortunately, the target class has the same name as the one Sqoop generated. 
> This JIRA proposes using a random class name to avoid the potential problem.
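
One possible way to generate such a name, as a rough sketch only. The prefix {{SqoopRecord_}}, the class {{RandomClassNameSketch}} and the use of a UUID are assumptions made for this example, not what Sqoop actually does.

```java
import java.util.UUID;

final class RandomClassNameSketch {

  // Hypothetical prefix; it also guarantees the name starts with a letter,
  // so the result is a valid Java identifier.
  static final String PREFIX = "SqoopRecord_";

  // Build a class name from the prefix plus the 32 hex digits of a random UUID.
  static String randomClassName() {
    return PREFIX + UUID.randomUUID().toString().replace("-", "");
  }

  public static void main(String[] args) {
    // Prints a different name on each run.
    System.out.println(randomClassName());
  }
}
```

A name like this is very unlikely to match a table name, so a name-based class lookup on the record type would not find the generated class.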



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)