Posted to dev@samoa.apache.org by "jayadeepj (JIRA)" <ji...@apache.org> on 2015/10/29 12:33:27 UTC

[jira] [Commented] (SAMOA-47) Integrate Avro Streams with SAMOA

    [ https://issues.apache.org/jira/browse/SAMOA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980297#comment-14980297 ] 

jayadeepj commented on SAMOA-47:
--------------------------------

The updated Input Format document for handling Apache Avro in SAMOA is available at:
https://drive.google.com/file/d/0B844rHJZHzKMdk5oMHZWREdxMnM/view?usp=sharing 

Avro allows two encodings for the data: binary and JSON. Test data built according to the document is linked below.

The JSON-encoded Avro file for the Forest CoverType dataset:
https://drive.google.com/file/d/0B844rHJZHzKMSlRRaVA0TU0zRjQ/view?usp=sharing 

The binary-encoded Avro file for the Forest CoverType dataset:
https://drive.google.com/file/d/0B844rHJZHzKMSFVwVVRPVjhCOTA/view?usp=sharing 

The original Forest CoverType ARFF file from which the Avro files were built:
http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip 
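
As a rough illustration of the two encodings mentioned above, the sketch below writes one record both ways using only the Python standard library. It is not taken from the linked document: the two-field SCHEMA, its field names, and the helper functions are all hypothetical. It also covers only the encoding of a single datum (zig-zag varint for long, 8-byte little-endian for double, a JSON object for the JSON encoding), not the Avro container file with its header and embedded schema.

```python
import json
import struct

# Hypothetical two-field schema, loosely modelled on one covtype row.
# The real field names are defined in the linked document, not here.
SCHEMA = [("elevation", "double"), ("cover_type", "long")]


def zigzag_varint(n: int) -> bytes:
    """Encode a 64-bit signed integer the way Avro encodes int/long:
    zig-zag mapping first, then a base-128 variable-length integer."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_binary(record: dict) -> bytes:
    """Binary encoding: field values concatenated in schema order, no names."""
    out = bytearray()
    for name, kind in SCHEMA:
        value = record[name]
        if kind == "double":
            out += struct.pack("<d", value)  # 8 bytes, little-endian
        elif kind == "long":
            out += zigzag_varint(value)
        else:
            raise ValueError("type not covered by this sketch: " + kind)
    return bytes(out)


def encode_json(record: dict) -> str:
    """JSON encoding: the record becomes a JSON object keyed by field name."""
    return json.dumps({name: record[name] for name, _ in SCHEMA})


if __name__ == "__main__":
    row = {"elevation": 0.5, "cover_type": 2}
    print(encode_binary(row).hex())
    print(encode_json(row))
```

The binary form carries no field names, so a reader needs the schema to interpret it, while the JSON form is self-describing at the cost of size.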

> Integrate Avro Streams with SAMOA
> ---------------------------------
>
>                 Key: SAMOA-47
>                 URL: https://issues.apache.org/jira/browse/SAMOA-47
>             Project: SAMOA
>          Issue Type: New Feature
>          Components: SAMOA-API, SAMOA-Instances
>            Reporter: jayadeepj
>            Priority: Minor
>              Labels: patch
>
> The current SAMOA readers support data streams only in ARFF format. This limits SAMOA's scope as a distributed streaming machine learning framework, since end users may have to transform their data to ARFF first. Apache Avro is a data serialization system that handles data streams in a compact binary format and is typically used in conjunction with Big Data ecosystem tools. Avro allows two encodings for the data: binary and JSON. Avro support may therefore also allow users with JSON data to use SAMOA seamlessly.
> The goal is to build support for Avro streams into SAMOA by adding an Avro file stream handler, an Avro loader that reads records and transforms them into instances, and a user option to switch between the JSON and binary encodings. The input format, including the representation of metadata for both JSON and binary data, is to be finalized along with the build.
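
The loader described in the issue (read records, transform them into instances, switch between encodings) could look roughly like the sketch below. This is only an illustration under strong simplifying assumptions, not the SAMOA implementation: the FIELDS list, the function names, and the `encoding` option are hypothetical, every field is assumed to be an Avro double (so each binary record has a fixed width), the JSON input is assumed to hold one record per line, and the last field is treated as the class label.

```python
import json
import struct
from typing import Iterator, List, Tuple

# Hypothetical schema: two numeric attributes followed by the class label.
# All three are assumed to be Avro doubles to keep the sketch short.
FIELDS = ["elevation", "slope", "cover_type"]

Instance = Tuple[List[float], float]  # (attribute values, class label)


def read_json(text: str) -> Iterator[dict]:
    """Yield one record per non-empty line of JSON-encoded input."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def read_binary(payload: bytes) -> Iterator[dict]:
    """Yield records from back-to-back binary data (8 bytes per double)."""
    width = 8 * len(FIELDS)
    if len(payload) % width:
        raise ValueError("payload is not a whole number of records")
    fmt = "<%dd" % len(FIELDS)  # little-endian doubles
    for offset in range(0, len(payload), width):
        values = struct.unpack_from(fmt, payload, offset)
        yield dict(zip(FIELDS, values))


def to_instance(record: dict) -> Instance:
    """Transform a decoded record into (attributes, label)."""
    values = [float(record[name]) for name in FIELDS]
    return values[:-1], values[-1]


def load_instances(source, encoding: str = "json") -> List[Instance]:
    """User-facing switch between the two encodings."""
    if encoding == "json":
        records = read_json(source)
    elif encoding == "binary":
        records = read_binary(source)
    else:
        raise ValueError("unknown encoding: " + encoding)
    return [to_instance(r) for r in records]
```

Keeping the decoding step separate from `to_instance` means the record-to-instance logic is shared, and only the reader changes with the encoding option.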



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)