Posted to commits@asterixdb.apache.org by ti...@apache.org on 2016/05/13 16:41:43 UTC

incubator-asterixdb git commit: Add List of Supported Adapters to Doc

Repository: incubator-asterixdb
Updated Branches:
  refs/heads/master 1defc92ae -> 0716dc062


Add List of Supported Adapters to Doc

Change-Id: I2bb98477e144e78e9983d33f9dd2f89a547aeccf
Reviewed-on: https://asterix-gerrit.ics.uci.edu/802
Reviewed-by: Till Westmann <ti...@apache.org>
Tested-by: Jenkins <je...@fulliautomatix.ics.uci.edu>


Project: http://git-wip-us.apache.org/repos/asf/incubator-asterixdb/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-asterixdb/commit/0716dc06
Tree: http://git-wip-us.apache.org/repos/asf/incubator-asterixdb/tree/0716dc06
Diff: http://git-wip-us.apache.org/repos/asf/incubator-asterixdb/diff/0716dc06

Branch: refs/heads/master
Commit: 0716dc062d8cf6be45a0f8ad7b09ffc7ad681f64
Parents: 1defc92
Author: Abdullah Alamoudi <ba...@gmail.com>
Authored: Wed Apr 20 09:51:00 2016 +0300
Committer: Till Westmann <ti...@apache.org>
Committed: Fri May 13 09:41:49 2016 -0700

----------------------------------------------------------------------
 .../src/site/markdown/aql/externaldata.md       | 38 +++++++++++++++++++-
 .../provider/DatasourceFactoryProvider.java     |  3 ++
 2 files changed, 40 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-asterixdb/blob/0716dc06/asterixdb/asterix-doc/src/site/markdown/aql/externaldata.md
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/site/markdown/aql/externaldata.md b/asterixdb/asterix-doc/src/site/markdown/aql/externaldata.md
index d5281cb..5095b97 100644
--- a/asterixdb/asterix-doc/src/site/markdown/aql/externaldata.md
+++ b/asterixdb/asterix-doc/src/site/markdown/aql/externaldata.md
@@ -23,6 +23,7 @@
 
 * [Introduction](#Introduction)
 * [Adapter for an External Dataset](#IntroductionAdapterForAnExternalDataset)
+* [Builtin Adapters](#BuiltinAdapters)
 * [Creating an External Dataset](#IntroductionCreatingAnExternalDataset)
 * [Writing Queries against an External Dataset](#WritingQueriesAgainstAnExternalDataset)
 * [Building Indexes over External Datasets](#BuildingIndexesOverExternalDatasets)
@@ -35,8 +36,43 @@ Data that needs to be processed by AsterixDB could be residing outside AsterixDB
 ### <a id="IntroductionAdapterForAnExternalDataset">Adapter for an External Dataset</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
 External data is accessed using wrappers (adapters in AsterixDB) that abstract away the mechanism of connecting with an external service, receiving its data and transforming the data into ADM records that are understood by AsterixDB. AsterixDB comes with built-in adapters for common storage systems such as HDFS or the local file system.
 
-### <a id="IntroductionCreatingAnExternalDataset">Creating an External Dataset</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
+### <a id="BuiltinAdapters">Builtin Adapters</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
+AsterixDB offers a set of built-in adapters that can be used to query external data or to load data into an internal dataset using a load statement or a data feed. Each adapter requires the `format` of the data to be specified so that records can be parsed correctly. When an adapter is used with a feed, the `output-type` parameter must also be specified.
+
+The following is a list of the existing built-in adapters and their configuration parameters:
+
+1. ___localfs___: used for reading data stored in a local filesystem on one or more of the node controllers
+     * `path`: A fully qualified path of the form `host://absolute_path`. Use a comma-separated list if there are
+     multiple directories or files
+     * `expression`: A [regular expression](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html) to match and filter against file names
+2. ___hdfs___: used for reading data stored in an HDFS instance
+     * `path`: A fully qualified path of the form `host://absolute_path`. Use a comma-separated list if there are
+     multiple directories or files
+     * `expression`: A [regular expression](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html) to match and filter against file names
+     * `input-format`: A fully qualified name or an alias for a class of HDFS input format
+     * `hdfs`: The HDFS name node URL
+3. ___socket___: used for listening to connections that send data streams through one or more sockets
+     * `sockets`: comma-separated list of sockets to listen to
+     * `address-type`: either IP if the list uses IP addresses, or NC if the list uses NC names
+4. ___socket\_client___: used for connecting to one or more sockets and reading data streams
+     * `sockets`: comma-separated list of sockets to connect to
+5. ___twitter\_push___: used for establishing a connection and subscribing to a Twitter feed
+     * `consumer.key`: access parameter provided by Twitter OAuth
+     * `consumer.secret`: access parameter provided by Twitter OAuth
+     * `access.token`: access parameter provided by Twitter OAuth
+     * `access.token.secret`: access parameter provided by Twitter OAuth
+6. ___twitter\_pull___: used for polling a Twitter feed for tweets at a configurable interval
+     * `consumer.key`: access parameter provided by Twitter OAuth
+     * `consumer.secret`: access parameter provided by Twitter OAuth
+     * `access.token`: access parameter provided by Twitter OAuth
+     * `access.token.secret`: access parameter provided by Twitter OAuth
+     * `query`: Twitter query string
+     * `interval`: poll interval in seconds
+7. ___rss___: used for reading RSS feeds
+     * `url`: a comma-separated list of RSS URLs
+
 
+### <a id="IntroductionCreatingAnExternalDataset">Creating an External Dataset</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ###
 As an example we consider the Lineitem dataset from the [TPCH schema](http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSTPCHLinkedData/tpch.sql).
 We assume that you have successfully created an AsterixDB instance following the instructions at [Installing AsterixDB Using Managix](../install.html). _For constructing an example, we assume a single machine setup._
 

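Reviewer note on the `path` parameter documented above for `localfs` and `hdfs`: the value is a comma-separated list of `host://absolute_path` entries. The sketch below shows one way such a value could be split into host/path pairs. It is only an illustration of the documented format, not the parsing code AsterixDB uses; the class `AdapterPathSketch`, the type `HostPath`, and the sample host names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of AsterixDB: illustrates the documented
// `path` format, a comma-separated list of host://absolute_path entries.
final class AdapterPathSketch {

    /** One host/path pair taken from the `path` parameter. */
    static final class HostPath {
        final String host;
        final String path;

        HostPath(String host, String path) {
            this.host = host;
            this.path = path;
        }
    }

    /** Splits the parameter on commas, then splits each entry at the first "://". */
    static List<HostPath> parse(String pathParam) {
        List<HostPath> result = new ArrayList<>();
        for (String entry : pathParam.split(",")) {
            String trimmed = entry.trim();
            int sep = trimmed.indexOf("://");
            // Reject entries with no host or no path after the separator.
            if (sep <= 0 || sep + 3 >= trimmed.length()) {
                throw new IllegalArgumentException(
                        "Expected host://absolute_path but got: " + trimmed);
            }
            result.add(new HostPath(trimmed.substring(0, sep), trimmed.substring(sep + 3)));
        }
        return result;
    }
}
```

With an absolute path that starts with `/`, an entry looks like `nc1:///data/a.tbl`: the host is `nc1` and the path is `/data/a.tbl`.
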
http://git-wip-us.apache.org/repos/asf/incubator-asterixdb/blob/0716dc06/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/provider/DatasourceFactoryProvider.java
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/provider/DatasourceFactoryProvider.java b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/provider/DatasourceFactoryProvider.java
index 0954fca..1dd2fe8 100644
--- a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/provider/DatasourceFactoryProvider.java
+++ b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/provider/DatasourceFactoryProvider.java
@@ -29,6 +29,7 @@ import org.apache.asterix.external.input.HDFSDataSourceFactory;
 import org.apache.asterix.external.input.record.reader.RecordWithPKTestReaderFactory;
 import org.apache.asterix.external.input.record.reader.kv.KVReaderFactory;
 import org.apache.asterix.external.input.record.reader.kv.KVTestReaderFactory;
+import org.apache.asterix.external.input.record.reader.rss.RSSRecordReaderFactory;
 import org.apache.asterix.external.input.record.reader.stream.StreamRecordReaderFactory;
 import org.apache.asterix.external.input.record.reader.twitter.TwitterRecordReaderFactory;
 import org.apache.asterix.external.input.stream.factory.LocalFSInputStreamFactory;
@@ -112,6 +113,8 @@ public class DatasourceFactoryProvider {
                 return new StreamRecordReaderFactory(new SocketServerInputStreamFactory());
             case ExternalDataConstants.STREAM_SOCKET_CLIENT:
                 return new StreamRecordReaderFactory(new SocketClientInputStreamFactory());
+            case ExternalDataConstants.READER_RSS:
+                return new RSSRecordReaderFactory();
             default:
                 try {
                     return (IRecordReaderFactory<?>) Class.forName(reader).newInstance();
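
Reviewer note on the hunk above: the new `READER_RSS` case joins a switch that maps known reader names to built-in factories and otherwise treats the name as a class to load reflectively. The self-contained sketch below illustrates that dispatch pattern only. The interface `RecordReaderFactory`, the classes `RssFactory` and `CustomFactory`, and the literal alias `"rss"` are stand-ins assumed for illustration; they are not the AsterixDB types or constants.

```java
// Illustrative sketch of alias-based dispatch with a reflective fallback.
// All types here are hypothetical stand-ins, not AsterixDB classes.
final class ReaderDispatchSketch {

    interface RecordReaderFactory {
        String name();
    }

    /** Stand-in for a built-in factory selected by a well-known alias. */
    static final class RssFactory implements RecordReaderFactory {
        @Override
        public String name() {
            return "rss";
        }
    }

    /** Stand-in for a user-provided factory selected by its class name. */
    static final class CustomFactory implements RecordReaderFactory {
        @Override
        public String name() {
            return "custom";
        }
    }

    static RecordReaderFactory create(String reader) {
        switch (reader) {
            case "rss": // assumed alias; the real value lives in ExternalDataConstants
                return new RssFactory();
            default:
                // Unknown alias: treat the string as a fully qualified class name.
                try {
                    return (RecordReaderFactory) Class.forName(reader)
                            .getDeclaredConstructor()
                            .newInstance();
                } catch (ReflectiveOperationException | ClassCastException e) {
                    throw new IllegalArgumentException("Unknown reader: " + reader, e);
                }
        }
    }
}
```

The sketch calls `getDeclaredConstructor().newInstance()` because `Class.newInstance()` has been deprecated since Java 9; the commit itself uses `Class.forName(reader).newInstance()`.
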