Posted to user@avro.apache.org by Hrishikesh P <hr...@gmail.com> on 2014/06/27 21:17:04 UTC

Re: Schema import dependencies

I seem to run into issues with schema imports a lot. Anyway, for the
time being I am using schemas instead of IDL. My current problem is
referencing schema files contained in a sources jar: the schema I am
trying to compile fails because it refers to another schema that lives
in a sources jar.

Here's my configuration for the avro-maven-plugin:

              <plugin>
                <groupId>org.apache.avro</groupId>
                <artifactId>avro-maven-plugin</artifactId>
                <version>1.7.6</version>
                <dependencies>
                    <dependency>
                        <groupId>test.utils</groupId>
                        <artifactId>test-utils-bigdata</artifactId>
                        <version>1.0.0-SNAPSHOT</version>
                        <type>jar</type>
                        <classifier>sources</classifier>
                    </dependency>
                </dependencies>
                <executions>
                    <execution>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>schema</goal>
                        </goals>
                        <configuration>

                            <sourceDirectory>${project.basedir}/src/main/resources/avro</sourceDirectory>
                            <outputDirectory>${project.basedir}/target/generated-sources</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
                <configuration>
                    <fieldVisibility>private</fieldVisibility>
                    <imports>
                        <import>avro/data.avsc</import>
                    </imports>
                    <stringType>String</stringType>
                </configuration>
            </plugin>


and here's the avro schema that I am trying to compile (in a different
project):


# entity.avsc

{
"namespace":"utils.datamodel",
"type":"record",
"name":"Entity",
"imports": [ "avro/data.avsc" ],
"fields":[
            { "name": "ID", "type": "string" },
            { "name": "dataType", "type": [ "null", "test.utils.model.Data" ], "default": null }
  ]
}


However, the <imports> tag gives me a ClassCastException when I try to
generate the sources.


java.lang.ClassCastException: SetBase.addExcludes(string) parameter must be instanceof java.lang.String
    at org.apache.maven.shared.model.fileset.SetBase.addExclude(SetBase.java:123)
    at org.apache.avro.mojo.AbstractAvroMojo.getIncludedFiles(AbstractAvroMojo.java:201)
    at org.apache.avro.mojo.AbstractAvroMojo.execute(AbstractAvroMojo.java:163)
    at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:490)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:694)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:556)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:535)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:387)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:348)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:180)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:328)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:362)
    at org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
    at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
    at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
    at org.codehaus.classworlds.Launcher.main(Launcher.java:375)


If I comment out the <imports> section, I get
org.apache.avro.SchemaParseException: Undefined name:
"test.utils.model.Data", as expected. Has anyone run into this problem
before, or does anyone know whether I am missing something? Is it
possible to reference external schemas in this way?


Thanks in advance.
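
A possible workaround, offered as a sketch under assumptions rather than something confirmed against this setup: as far as I know, the avro-maven-plugin treats each <import> entry as a file or directory path on disk, so a schema that exists only inside a jar may not be found. If that is what is happening here, unpacking the sources jar with the maven-dependency-plugin before the Avro plugin runs, and then pointing <import> at the unpacked file, may help. The directory name imported-avro and the location avro/data.avsc inside the jar are assumptions, and the maven-dependency-plugin version is left to the build's plugin management.

```xml
<!-- Sketch only: unpack the shared schema from the sources jar so that it
     exists as a regular file before the Avro plugin runs. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>unpack-shared-schemas</id>
      <!-- initialize runs before generate-sources in the default lifecycle -->
      <phase>initialize</phase>
      <goals>
        <goal>unpack</goal>
      </goals>
      <configuration>
        <artifactItems>
          <artifactItem>
            <groupId>test.utils</groupId>
            <artifactId>test-utils-bigdata</artifactId>
            <version>1.0.0-SNAPSHOT</version>
            <classifier>sources</classifier>
            <type>jar</type>
            <!-- Assumed location of the schema inside the jar -->
            <includes>avro/*.avsc</includes>
            <!-- Hypothetical directory name, chosen for illustration -->
            <outputDirectory>${project.build.directory}/imported-avro</outputDirectory>
          </artifactItem>
        </artifactItems>
      </configuration>
    </execution>
  </executions>
</plugin>

<!-- In the avro-maven-plugin <configuration>, import the unpacked file by path -->
<imports>
  <import>${project.build.directory}/imported-avro/avro/data.avsc</import>
</imports>
```

With the import resolved this way, the "imports" entry inside entity.avsc should not be needed, since the Avro schema format itself defines no such attribute.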


On Wed, May 28, 2014 at 5:54 PM, Doug Cutting <cu...@apache.org> wrote:

> IDL is a language-independent way to merge two schema files into one
> standalone schema file.
>
> Doug
>
>
> On Wed, May 28, 2014 at 3:40 PM, Wai Yip Tung <wy...@tungwaiyip.info> wrote:
>
>> Let's say we want to keep 2 schema files because they come from
>> 2 separate organizations. When we generate a data file they need to be
>> merged into one standalone schema. The maven plugin does this. Otherwise we
>> have to merge them ourselves. This is not too hard to do. I just want to
>> make sure I'm not missing some existing tool or API.
>>
>> Wai Yip
>>
>>   Doug Cutting <cu...@apache.org>
>>  Wednesday, May 28, 2014 12:09 PM
>> Your userInfo.avsc is not a standalone schema since it depends on
>> mailing_address already being defined.  A schema included in a data file is
>> always standalone, and would include the mailing_address schema definition
>> within the userInfo schema's "address" field.
>>
>> Some tools will process such non-standalone schemas in separate files.
>>  For example, the Java schema compiler will accept multiple schema files on
>> the command line, and those later on the command line may reference types
>> defined earlier.  Java's maven tasks also permit references to other files,
>> but these are probably not of interest to a Python developer.
>>
>> The IDL tool uses the JVM as its runtime but is not Java-specific.
>>
>> Doug
>>
>>
>>
>>   Wai Yip Tung <wy...@tungwaiyip.info>
>>  Wednesday, May 28, 2014 11:53 AM
>>  I want to extend this question somewhat. I have begun to realize that Avro
>> can compose schemas from user-defined types. I want to check whether
>> I understand it correctly and what the proper way to use it is.
>>
>> I took a single, two-level nested schema from the web (see "using an
>> embedded record").
>>
>> http://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/avroschemas.html
>>
>> I broke it down into two separate records: the main `userInfo` record and
>> the embedded `mailing_address` record, as two separate JSON objects.
>>
>>
>> ------------------------------------------------------------------------
>> userInfo.avsc
>>
>> {
>> "type" : "record",
>> "name" : "userInfo",
>> "namespace" : "my.example",
>> "fields" : [{"name" : "username",
>>              "type" : "string",
>>              "default" : "NONE"},
>>
>>             {"name" : "age",
>>              "type" : "int",
>>              "default" : -1},
>>
>>              {"name" : "phone",
>>               "type" : "string",
>>               "default" : "NONE"},
>>
>>              {"name" : "housenum",
>>               "type" : "string",
>>               "default" : "NONE"},
>>
>>              {"name" : "address",
>>               "type" : "mailing_address",   <--- user defined type
>>               "default" : "NONE"}
>> ]
>> }
>>
>> ------------------------------------------------------------------------
>> mailing_address.avsc
>>
>> {
>>  "type" : "record",
>>  "name" : "mailing_address",                 <--- defined here
>>  "fields" : [
>>     {"name" : "street",
>>      "type" : "string",
>>      "default" : "NONE"},
>>
>>     {"name" : "city",
>>      "type" : "string",
>>      "default" : "NONE"},
>>
>>     {"name" : "state_prov",
>>      "type" : "string",
>>      "default" : "NONE"},
>>
>>     {"name" : "country",
>>      "type" : "string",
>>      "default" : "NONE"},
>>
>>     {"name" : "zip",
>>      "type" : "string",
>>      "default" : "NONE"}
>>     ]
>> }
>> ------------------------------------------------------------------------
>>
>> Is this a valid composite avro schema definition?
>>
>> The second question is how we can actually use this in practice. If we
>> have two separate files, is there a standard API that loads them both?
>> Hrishikesh P mentions the avro maven plugin. I mainly use the Python API so
>> I am unfamiliar with it. Does a comparable API exist?
>>
>> I understand the IDL form has explicit linking of schema files. I will
>> look into it next.
>>
>> Wai Yip
>>
>>
>>   Doug Cutting <cu...@apache.org>
>>  Thursday, May 22, 2014 2:57 PM
>> You might instead use Avro IDL to define your schemas. It permits you to
>> define multiple schemas in a single file, so that you can determine
>> the order they're defined in. It also permits ordered inclusion of
>> types from other files, both IDL files and schema files.
>>
>> Doug
>>
>>
>>   Hrishikesh P <hr...@gmail.com>
>>  Thursday, May 22, 2014 10:46 AM
>> I have a few avro schemas that I am generating code from using the
>> avro maven plugin. The schemas depend on each other, which I was able to
>> resolve by putting the schemas in separate folders and/or prefixing the
>> schema file names with 01-, 02-, etc. so that the dependencies get
>> compiled first. However, this works on Mac but not on RHEL (probably
>> because directories are read in a different order on each?). Does anybody
>> know the best way to handle schema dependencies? If I list individual
>> schema files in the imports section of the POM, the schemas compile,
>> but I have listed folders and would like to avoid listing individual
>> files if possible.
>>
>> Here's a related issue: https://issues.apache.org/jira/browse/AVRO-1367
>>
>> Thanks in advance.
>>
>>
>