Posted to dev@avro.apache.org by "Vincenz Priesnitz (JIRA)" <ji...@apache.org> on 2013/04/16 11:21:16 UTC

[jira] [Updated] (AVRO-867) Allow tools to read files via hadoop FileSystem class

     [ https://issues.apache.org/jira/browse/AVRO-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincenz Priesnitz updated AVRO-867:
-----------------------------------

    Affects Version/s: 1.7.5
         Release Note: avro-tools can now access any Hadoop-supported filesystem when started via hadoop jar.
               Status: Patch Available  (was: Open)

The attached patch changes the Utils class to use the Hadoop FileSystem class. Any Hadoop-supported filesystem can now be used for input or output files in more of the tools.

Without any configuration, the tools behave as before:
{noformat}
# reads from local file system by default
# supports relative paths
java -jar avro-tools-1.7.5.jar tojson ~/myDir/myData.avro
{noformat}

If invoked via hadoop jar, the tools support more filesystems. Different filesystems can be used in a single call. Furthermore, any default filesystem that might be specified in core-site.xml is respected.
{noformat}
# combines an FTP file and a local file and writes the result, combinedData.avro, directly to the default HDFS filesystem
hadoop jar avro-tools-1.7.5.jar concat ftp://myFtpServer/data1.avro file:///home/user/data2.avro combinedData.avro
{noformat}
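
To illustrate the resolution rule described above, here is a minimal, standalone Java sketch. It is not the code from the patch and does not use the Hadoop API; the class name PathQualifier, the namenode address, and the working directories are hypothetical. It only mimics the idea that a fully qualified URI keeps its own scheme, while an unqualified path is resolved against whatever default filesystem is in effect (the local filesystem when run via java -jar, the core-site.xml default when run via hadoop jar).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of avro-tools or Hadoop.
final class PathQualifier {

    /**
     * Returns arg unchanged if it already carries a scheme (ftp://, file://, hdfs://, ...).
     * Otherwise builds a URI on the default filesystem, resolving relative
     * paths against workingDir (assumed to be absolute, without a trailing slash).
     */
    static URI qualify(String arg, URI defaultFs, String workingDir) {
        URI uri = URI.create(arg);
        if (uri.getScheme() != null) {
            return uri; // already fully qualified
        }
        String path = uri.getPath();
        if (!path.startsWith("/")) {
            path = workingDir + "/" + path; // make relative paths absolute
        }
        try {
            return new URI(defaultFs.getScheme(), defaultFs.getAuthority(), path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify path: " + arg, e);
        }
    }

    public static void main(String[] args) {
        // Assumed default filesystem, as it might be configured in core-site.xml.
        URI hdfs = URI.create("hdfs://namenode:8020");
        String[] inputs = {
            "ftp://myFtpServer/data1.avro",
            "file:///home/user/data2.avro",
            "combinedData.avro"
        };
        for (String in : inputs) {
            System.out.println(in + " -> " + qualify(in, hdfs, "/user/alice"));
        }
    }
}
```

With the assumed hdfs default, only the third argument of the concat call above is rewritten; the ftp:// and file:// arguments pass through untouched. The sketch ignores edge cases such as file names containing a colon, which java.net.URI would parse as a scheme.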

Remote files can now be inspected more quickly, e.g.:
{noformat}
hadoop jar avro-tools-1.7.5.jar getschema Data_on_hdfs.avro
hadoop jar avro-tools-1.7.5.jar tojson ftp://server-address/Data_on_ftp.avro
{noformat}

The following tools now use Utils for accessing files: concat, fragtojson, fromjson, fromtext, getmeta, getschema, jsontofrag, recodec, tojson, totext.
                
> Allow tools to read files via hadoop FileSystem class
> -----------------------------------------------------
>
>                 Key: AVRO-867
>                 URL: https://issues.apache.org/jira/browse/AVRO-867
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.7.5
>            Reporter: Joe Crobak
>            Assignee: Joe Crobak
>
> It would be great if I could use the various tools to read/parse files that are in HDFS, S3, etc. via the [FileSystem|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html] API. We could retain backwards compatibility by assuming that unqualified URLs are "file://" but allow reading of files from fully qualified URLs such as hdfs://. The required APIs are already part of the avro-tools uber jar to support the TetherTool.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira