Posted to dev@asterixdb.apache.org by Michael Carey <mj...@ics.uci.edu> on 2015/10/28 05:54:03 UTC

Re: Socket feed questions

Thanks!

On 10/27/15 9:48 AM, Raman Grover wrote:
> Hi,
>
>
> In the case when data is being received from an external source (e.g. 
> during feed ingestion), a slow rate of arrival may result in 
> excessive delays until the data is deposited into the target dataset 
> and made accessible to queries. Data moves along a data ingestion 
> pipeline between operators as packed, fixed-size frames. The default 
> behavior is to wait for a frame to be full before dispatching the 
> contained data to the downstream operator. However, as noted, this may 
> not suit all scenarios, particularly when the data source is sending data 
> at a low rate. To cater to different scenarios, AsterixDB allows 
> configuring this behavior. The different options are described next.
>
> *Push data downstream when*
> (a) Frame is full (default)
> (b) At least N records (data items) have been collected into a 
> partially filled frame
> (c) At least T seconds have elapsed since the last record was put into 
> the frame
>
> *How to configure the behavior?*
> At the time of defining a feed, an end-user may specify configuration 
> parameters that determine the runtime behavior (options (a), (b) or 
> (c) from above).
>
> The parameters are described below:
>
> /"parser-policy"/: A specific strategy chosen from a set of 
> pre-defined values -
>   (i) / "frame_full"/
>  This is the default value. As the name suggests, this choice causes 
> frames to be pushed by the feed adaptor only when there isn't 
> sufficient space for an additional record to fit in. This corresponds 
> to option (a).
>
>  (ii) / "counter_timer_expired" /
>  Use this as the value if you wish to set either option (b) or (c)  or 
> a combination of both.
>
> *Some Examples*
> 1) Pack a maximum of 100 records into a data frame and push it 
> downstream.
>
>  create feed my_feed using my_adaptor
> (("parser-policy"="counter_timer_expired"), ("batch-size"="100"), ... 
> other parameters);
>
> 2) Wait up to 2 seconds, then send however many records have been 
> collected in the frame downstream.
>  create feed my_feed using my_adaptor
> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2")... 
> other parameters);
>
> 3) Push the frame downstream when 100 records have been collected, or 
> when 2 seconds have elapsed since the last record was put into the 
> current data frame, whichever comes first.
>  create feed my_feed using my_adaptor
> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"), 
> ("batch-size"="100"),... other parameters);
>
>
> *Note*
> The above config parameters are not specific to a particular 
> implementation of an adaptor but are available for use with any feed 
> adaptor. Some adaptors that ship with AsterixDB use different default 
> values for the above to suit their specific scenario. E.g. the pull-based 
> Twitter adaptor uses "counter_timer_expired" as the "parser-policy" 
> and sets the parameter "batch-interval".
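The three flush policies described above can be sketched as a small simulation. This is a toy model for illustration only, not AsterixDB's actual implementation; the class and method names are made up, though the batch-size/batch-interval semantics follow the description above.

```python
import time

class FrameBuffer:
    """Toy model of the feed frame-flush policies: flush when the frame
    is full (a), when batch_size records have accumulated (b), or when
    batch_interval seconds have passed since the last record (c)."""

    def __init__(self, frame_capacity, batch_size=None, batch_interval=None):
        self.frame_capacity = frame_capacity   # option (a): frame full
        self.batch_size = batch_size           # option (b): record count
        self.batch_interval = batch_interval   # option (c): elapsed seconds
        self.records = []
        self.last_record_time = None

    def add(self, record, now=None):
        """Append a record; return a flushed batch if a policy fires, else None."""
        self.last_record_time = time.time() if now is None else now
        self.records.append(record)
        return self.maybe_flush(self.last_record_time)

    def maybe_flush(self, now):
        """Checked on every add(), and by a periodic timer tick for option (c)."""
        if len(self.records) >= self.frame_capacity:
            return self._flush()
        if self.batch_size is not None and len(self.records) >= self.batch_size:
            return self._flush()
        if (self.batch_interval is not None and self.records
                and now - self.last_record_time >= self.batch_interval):
            return self._flush()
        return None

    def _flush(self):
        batch, self.records = self.records, []
        return batch
```

For example, with batch_size=3 the third add() returns the accumulated batch; with only batch_interval set, a later timer tick flushes whatever has accumulated.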
>
>
> Regards,
> Raman
> PS: The names of the parameters described above are not as intuitive 
> as one would like them to be. The names need to be changed.
>
>
>
>
>
>
>
>
> On Thu, Oct 22, 2015 at 9:09 AM, Mike Carey <dtabass@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     I think we need to have tuning parameters - like batch size and
>     maximum tolerable latency (in case there's a lull and you still
>     want to push stuff with some worst-case delay). @Raman Grover -
>     remind me (us) what's available in this regard?
>
>     On 10/22/15 4:29 AM, Pääkkönen Pekka wrote:
>>
>>     Hi,
>>
>>     Yes, you are right. I tried sending a larger amount of data, and
>>     data is now stored to the database.
>>
>>     Does it make sense to configure a smaller batch size in order to
>>     get more frequent writes?
>>
>>     Or would it significantly impact performance?
>>
>>     -Pekka
>>
>>     Data moves through the pipeline in frame-sized batches, so one
>>     (uninformed :-)) guess is that you aren't running very long, and
>>     you're only seeing the data flow when you close, because only then
>>     do you have a batch's worth.  Is that possible?  You can test this
>>     by running longer (more data) and seeing if you start to see the
>>     expected incremental flow/inserts. (And we need tunability in this
>>     area, e.g., parameters on how much batching and/or how much latency
>>     to tolerate on each feed.)
>>
>>     On 10/21/15 4:45 AM, Pääkkönen Pekka wrote:
>>
>>     > Hi,
>>     >
>>     > Thanks, now I am able to create a socket feed, and save items to the
>>     > dataset from the feed.
>>     >
>>     > It seems that data items are written to the dataset after I close the
>>     > socket at the client.
>>     >
>>     > Is there some way to indicate to the AsterixDB feed (with a newline or
>>     > other indicator) that data can be written to the database while the
>>     > connection is open?
>>     >
>>     > After I close the socket at the client, the feed seems to close down.
>>     > Or is it only paused, until it is resumed?
>>     >
>>     > -Pekka
>>     >
>>     > Hi Pekka,
>>     >
>>     > That's interesting, I'm not sure why the CC would appear as being down
>>     > to Managix. However, if you can access the web console, that
>>     > evidently isn't the case.
>>     >
>>     > As for data ingestion via sockets, yes it is possible, but it kind of
>>     > depends on what's meant by sockets. There's no tutorial for it, but
>>     > take a look at SocketBasedFeedAdapter in the source, as well as
>>     > https://github.com/kisskys/incubator-asterixdb/blob/kisskys/indexonlyhilbertbtree/asterix-experiments/src/main/java/org/apache/asterix/experiment/client/SocketTweetGenerator.java
>>     > for some examples of how it works.
>>     >
>>     > Hope that helps!
>>     >
>>     > Thanks,
>>     > -Ian
>>     >
>>     > On Mon, Oct 19, 2015 at 10:15 PM, Pääkkönen Pekka
>>     > <Pe...@vtt.fi> <ma...@vtt.fi> wrote:
>>     > > Hi Ian,
>>     > >
>>     > > Thanks for the reply.
>>     > >
>>     > > I compiled AsterixDB v0.8.7 and started it.
>>     > >
>>     > > However, I get the following warnings:
>>     > >
>>     > > INFO: Name:my_asterix
>>     > > Created:Mon Oct 19 08:37:16 UTC 2015
>>     > > Web-Url:http://192.168.101.144:19001
>>     > > State:UNUSABLE
>>     > >
>>     > > WARNING!:Cluster Controller not running at master
>>     > >
>>     > > Also, I see the following warnings in my_asterixdb1.log. There are no
>>     > > warnings or errors in cc.log:
>>     > >
>>     > > “
>>     > > Oct 19, 2015 8:37:39 AM
>>     > > org.apache.hyracks.api.lifecycle.LifeCycleComponentManager configure
>>     > > SEVERE: LifecycleComponentManager configured
>>     > > org.apache.hyracks.api.lifecycle.LifeCycleComponentManager@7559ec47
>>     > > ..
>>     > > INFO: Completed sharp checkpoint.
>>     > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties
>>     > > getIODevices
>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1 not found.
>>     > > The node has not joined yet or has left.
>>     > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties
>>     > > getIODevices
>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1 not found.
>>     > > The node has not joined yet or has left.
>>     > > Oct 19, 2015 8:38:38 AM
>>     > > org.apache.hyracks.control.common.dataset.ResultStateSweeper sweep
>>     > > INFO: Result state cleanup instance successfully completed.”
>>     > >
>>     > > It seems that AsterixDB is running, and I can access it at port 19001.
>>     > >
>>     > > The documentation shows ingestion of tweets, but I would be
>>     > > interested in using sockets.
>>     > >
>>     > > Is it possible to ingest data from sockets?
>>     > >
>>     > > Regards,
>>     > > -Pekka
>>     > >
>>     > > Hey there Pekka,
>>     > >
>>     > > Your intuition is correct, most of the newer feeds features are in the
>>     > > current master branch and not in the (very) old 0.8.6 release. If you'd
>>     > > like to experiment with them you'll have to build from source. The
>>     > > details about that are here:
>>     > > https://asterixdb.incubator.apache.org/dev-setup.html#setting-up-an-asterix-development-environment-in-eclipse
>>     > > but they're probably a bit overkill for just trying to get the
>>     > > compiled binaries. For that all you really need to do is:
>>     > >
>>     > > - Clone Hyracks from git
>>     > > - 'mvn clean install -DskipTests'
>>     > > - Clone AsterixDB
>>     > > - 'mvn clean package -DskipTests'
>>     > >
>>     > > Then, the binaries will sit in asterix-installer/target.
>>     > >
>>     > > For an example, the documentation shows how to set up a feed that's
>>     > > ingesting Tweets:
>>     > > https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/feeds/tutorial.html
>>     > >
>>     > > Thanks,
>>     > > -Ian
>>     > >
>>     > > On Wed, Oct 7, 2015 at 9:48 PM, Pääkkönen Pekka
>>     > > <Pe...@vtt.fi> <ma...@vtt.fi> wrote:
>>     > >
>>     > >> Hi,
>>     > >>
>>     > >> I would like to experiment with a socket-based feed.
>>     > >>
>>     > >> Can you point me to an example on how to utilize them?
>>     > >>
>>     > >> Do I need to install the 0.8.7-snapshot version of AsterixDB in
>>     > >> order to experiment with feeds?
>>     > >>
>>     > >> Regards,
>>     > >> -Pekka Pääkkönen
>>     >
>>
>
>
>
>
> -- 
> Raman


Re: Socket feed questions

Posted by Chen Li <ch...@gmail.com>.
I think Raman knows where to look for the test case(s) for AQL UDFs?  (The
answer to question 2 is presumably Yes.)

Chen

On Thu, Oct 29, 2015 at 12:22 PM, Jianfeng Jia <ji...@gmail.com>
wrote:

> Hi Devs,
>
> I have two related questions:
> 1. Is there any example code of using a UDF in a feed adapter?
> 2. Can we use AQL functions in those kinds of feed UDFs?
>
> Thank you.
>
> --
>
> -----------------
> Best Regards
>
> Jianfeng Jia
> Ph.D. Candidate of Computer Science
> University of California, Irvine
>

Re: Socket feed questions

Posted by Heri Ramampiaro <he...@gmail.com>.
Hi,

Attached is a UDF example (including a maven jar and zip file generation for the
twitter-example in the feed tutorial).

Please let me know if you need further assistance.

Best,
-heri

Re: Socket feed questions

Posted by Chen Li <ch...@gmail.com>.
I think Raman knows where to look for the test case(s) for AQL UDFs?  (The
answer to question 2 is presumably Yes.)

Chen

On Thu, Oct 29, 2015 at 12:22 PM, Jianfeng Jia <ji...@gmail.com>
wrote:

> Hi Devs,
>
> I have two related questions,
> 1. Is there any example code of using UDF in feed-adapter?
> 2. Can we use AQL function in those kind of feed UDFs?
>
> Thank you.
>
> On Tue, Oct 27, 2015 at 9:54 PM, Michael Carey <mj...@ics.uci.edu>
> wrote:
>
>> Thanks!
>>
>> On 10/27/15 9:48 AM, Raman Grover wrote:
>>
>>> Hi,
>>>
>>>
>>> In the case when data is being received from an external source (e.g.
>>> during feed ingestion), a slow rate of arrival of data may result in
>>> excessive delays until the data is deposited into the target dataset and
>>> made accessible to queries. Data moves along a data ingestion pipeline
>>> between operators as packed fixed size frames. The default behavior is to
>>> wait for the frame to be full before dispatching the contained data to the
>>> downstream operator. However, as noted, this may not suit all scenarios
>>> particularly when data source is sending data at a low rate. To cater to
>>> different scenarios, AsterixDB allows configuring the behavior. The
>>> different options are described next.
>>>
>>> *Push data downstream when*
>>> (a) Frame is full (default)
>>> (b) At least N records (data items) have been collected into a partially
>>> filled frame
>>> (c) At least T seconds have elapsed since the last record was put into
>>> the frame
>>>
>>> *How to configure the behavior?*
>>> At the time of defining a feed, an end-user may specify configuration
>>> parameters that determine the runtime behavior (options (a), (b) or (c)
>>> from above).
>>>
>>> The parameters are described below:
>>>
>>> /"parser-policy"/: A specific strategy chosen from a set of pre-defined
>>> values -
>>>   (i) / "frame_full"/
>>>  This is the default value. As the name suggests, this choice causes
>>> frames to be pushed by the feed adaptor only when there isn't sufficient
>>> space for an additional record to fit in. This corresponds to option (a).
>>>
>>>  (ii) / "counter_timer_expired" /
>>>  Use this as the value if you wish to set either option (b) or (c)  or a
>>> combination of both.
>>>
>>> *Some Examples*
>>> *
>>> *
>>> 1) Pack a maximum of 100 records into a data frame and push it
>>> downstream.
>>>
>>>  create feed my_feed using my_adaptor
>>> (("parser-policy"="counter_timer_expired"), ("batch-size"="100"), ...
>>> other parameters);
>>>
>>> 2) Wait till 2 seconds and send however many records collected in a
>>> frame downstream.
>>>  create feed my_feed using my_adaptor
>>> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2")...
>>> other parameters);
>>>
>>> 3) Wait till 100 records have been collected into a data frame or 2
>>> seconds have elapsed since the last record was put into the current data
>>> frame.
>>>  create feed my_feed using my_adaptor
>>> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"),
>>> ("batch-size"="100"),... other parameters);
>>>
>>>
>>> *Note*
>>> The above config parameters are not specific to using a particular
>>> implementation of an adaptor but are available for use with any feed
>>> adaptor. Some adaptors that ship with AsterixDB use different default
>>> values for above to suit their specific scenario. E.g. the pull-based
>>> twitter adaptor uses "counter_timer_expired" as the "parser-policy" and
>>> sets the  parameter "batch-interval".
>>>
>>>
>>> Regards,
>>> Raman
>>> PS: The names of the parameters described above are not as intuitive as
>>> one would like them to be. The names need to be changed.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Oct 22, 2015 at 9:09 AM, Mike Carey <dtabass@gmail.com <mailto:
>>> dtabass@gmail.com>> wrote:
>>>
>>>     I think we need to have tuning parameters - like batch size and
>>>     maximum tolerable latency (in case there's a lull and you still
>>>     want to push stuff with some worst-case delay). @Raman Grover -
>>>     remind me (us) what's available in this regard?
>>>
>>>     On 10/22/15 4:29 AM, Pääkkönen Pekka wrote:
>>>
>>>>
>>>>     Hi,
>>>>
>>>>     Yes, you are right. I tried sending a larger amount of data, and
>>>>     data is now stored to the database.
>>>>
>>>>     Does it make sense to configure a smaller batch size in order to
>>>>     get more frequent writes?
>>>>
>>>>     Or would it significantly impact performance?
>>>>
>>>>     -Pekka
>>>>
>>>>     Data moves through the pipeline in frame-sized batches, so one
>>>>
>>>>     (uniformed :-)) guess is that you aren't running very long, and
>>>>     you're
>>>>
>>>>     only seeing the data flow when you close because only then do you
>>>>     have a
>>>>
>>>>     batch's worth.  Is that possible?  You can test this by running
>>>>     longer
>>>>
>>>>     (more data) and seeing if you start to see the expected incremental
>>>>
>>>>     flow/inserts. (And we need tunability in this area, e.g.,
>>>>     parameters on
>>>>
>>>>     how much batching and/or low much latency to tolerate on each feed.)
>>>>
>>>>     On 10/21/15 4:45 AM, Pääkkönen Pekka wrote:
>>>>
>>>>     >
>>>>
>>>>     > Hi,
>>>>
>>>>     >
>>>>
>>>>     > Thanks, now I am able to create a socket feed, and save items to
>>>> the
>>>>
>>>>     > dataset from the feed.
>>>>
>>>>     >
>>>>
>>>>     > It seems that data items are written to the dataset after I close
>>>> the
>>>>
>>>>     > socket at the client.
>>>>
>>>>     >
>>>>
>>>>     > Is there some way to indicate to AsterixDB feed (with a newline or
>>>>
>>>>     > other indicator) that data can be written to the database, when
>>>> the
>>>>
>>>>     > connection is open?
>>>>
>>>>     >
>>>>
>>>>     > After I close the socket at the client, the feed seems to close
>>>> down.
>>>>
>>>>     > Or is it only paused, until it is resumed?
>>>>
>>>>     >
>>>>
>>>>     > -Pekka
>>>>
>>>>     >
>>>>
>>>>     > Hi Pekka,
>>>>
>>>>     >
>>>>
>>>>     > That's interesting, I'm not sure why the CC would appear as being
>>>> down
>>>>
>>>>     >
>>>>
>>>>     > to Managix. However if you can access the web console, it that
>>>>
>>>>     >
>>>>
>>>>     > evidently isn't the case.
>>>>
>>>>     >
>>>>
>>>>     > As for data ingestion via sockets, yes it is possible, but it
>>>> kind of
>>>>
>>>>     >
>>>>
>>>>     > depends on what's meant by sockets. There's no tutorial for it,
>>>> but
>>>>
>>>>     >
>>>>
>>>>     > take a look at SocketBasedFeedAdapter in the source, as well as
>>>>
>>>>     >
>>>>
>>>>     >
>>>> https://github.com/kisskys/incubator-asterixdb/blob/kisskys/indexonlyhilbertbtree/asterix-experiments/src/main/java/org/apache/asterix/experiment/client/SocketTweetGenerator.java
>>>>
>>>>     >
>>>>
>>>>     > for some examples of how it works.
>>>>
>>>>     >
>>>>
>>>>     > Hope that helps!
>>>>
>>>>     >
>>>>
>>>>     > Thanks,
>>>>
>>>>     >
>>>>
>>>>     > -Ian
>>>>
>>>>     >
>>>>
>>>>     > On Mon, Oct 19, 2015 at 10:15 PM, Pääkkönen Pekka
>>>>     > <Pe...@vtt.fi> wrote:
>>>>     > > Hi Ian,
>>>>     > >
>>>>     > > Thanks for the reply.
>>>>     > > I compiled AsterixDB v0.8.7 and started it.
>>>>     > >
>>>>     > > However, I get the following warnings:
>>>>     > >
>>>>     > > INFO: Name:my_asterix
>>>>     > > Created:Mon Oct 19 08:37:16 UTC 2015
>>>>     > > Web-Url:http://192.168.101.144:19001
>>>>     > > State:UNUSABLE
>>>>     > >
>>>>     > > WARNING!:Cluster Controller not running at master
>>>>     > >
>>>>     > > Also, I see the following warnings in my_asterixdb1.log; there are
>>>>     > > no warnings or errors in cc.log:
>>>>     > >
>>>>     > > “Oct 19, 2015 8:37:39 AM
>>>>     > > org.apache.hyracks.api.lifecycle.LifeCycleComponentManager configure
>>>>     > > SEVERE: LifecycleComponentManager configured
>>>>     > > org.apache.hyracks.api.lifecycle.LifeCycleComponentManager@7559ec47
>>>>     > > ..
>>>>     > > INFO: Completed sharp checkpoint.
>>>>     > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties getIODevices
>>>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1 not
>>>>     > > found. The node has not joined yet or has left.
>>>>     > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties getIODevices
>>>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1 not
>>>>     > > found. The node has not joined yet or has left.
>>>>     > > Oct 19, 2015 8:38:38 AM
>>>>     > > org.apache.hyracks.control.common.dataset.ResultStateSweeper sweep
>>>>     > > INFO: Result state cleanup instance successfully completed.”
>>>>     > >
>>>>     > > It seems that AsterixDB is running, and I can access it at port 19001.
>>>>     > >
>>>>     > > The documentation shows ingestion of tweets, but I would be
>>>>     > > interested in using sockets.
>>>>     > > Is it possible to ingest data from sockets?
>>>>     > >
>>>>     > > Regards,
>>>>     > > -Pekka
>>>>     > >
>>>>     > > Hey there Pekka,
>>>>     > >
>>>>     > > Your intuition is correct; most of the newer feeds features are in the
>>>>     > > current master branch and not in the (very) old 0.8.6 release. If you'd
>>>>     > > like to experiment with them, you'll have to build from source. The
>>>>     > > details about that are here:
>>>>     > > https://asterixdb.incubator.apache.org/dev-setup.html#setting-up-an-asterix-development-environment-in-eclipse
>>>>     > > , but they're probably a bit overkill if you just want the compiled
>>>>     > > binaries. For that, all you really need to do is:
>>>>     > >
>>>>     > > - Clone Hyracks from git
>>>>     > > - 'mvn clean install -DskipTests'
>>>>     > > - Clone AsterixDB
>>>>     > > - 'mvn clean package -DskipTests'
>>>>     > >
>>>>     > > Then the binaries will sit in asterix-installer/target.
>>>>     > >
>>>>     > > For an example, the documentation shows how to set up a feed that's
>>>>     > > ingesting Tweets:
>>>>     > > https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/feeds/tutorial.html
>>>>     > >
>>>>     > > Thanks,
>>>>     > > -Ian
>>>>     > >
>>>>     > > On Wed, Oct 7, 2015 at 9:48 PM, Pääkkönen Pekka <Pe...@vtt.fi>
>>>>     > > wrote:
>>>>     > >
>>>>     > >> Hi,
>>>>     > >>
>>>>     > >> I would like to experiment with a socket-based feed.
>>>>     > >> Can you point me to an example of how to utilize one?
>>>>     > >>
>>>>     > >> Do I need to install the 0.8.7-snapshot version of AsterixDB in
>>>>     > >> order to experiment with feeds?
>>>>     > >>
>>>>     > >> Regards,
>>>>     > >> -Pekka Pääkkönen
>>>>
>>>
>>>
>>>
>>> --
>>> Raman
>>>
>>
>>
>
>
> --
>
> -----------------
> Best Regards
>
> Jianfeng Jia
> Ph.D. Candidate of Computer Science
> University of California, Irvine
>

Re: Socket feed questions

Posted by Michael Carey <mj...@ics.uci.edu>.
I know that Raman definitely had such a thing "running in the lab" here 
at some point.
(It was a similar join-with-a-reference-dataset use case.)
That was his AQL-bodied use case - so yes!
He had to "cheat" in one way - his query had a return clause - so he had 
to know the desired output schema ahead of time.
With Heri's new work on being able to "add one more field to whatever", 
one could now do this without that "cheat".

On 10/29/15 4:18 PM, Young-Seok Kim wrote:
> I'm not sure whether the following UDF is possible or not, but 
> hopefully it is.
> What we're trying to do is to have the following UDF.
>
> The UDF
> 1) accepts an incoming tweet record through a feed job as an input, and then
> 2) takes a field, more specifically the coordinate field value, from the 
> tweet record and issues a spatial-intersect query using the coordinate 
> in order to find the corresponding county (we have created an AsterixDB 
> instance that stores US county shape records in a dataset and created 
> an R-tree index on its polygon field, so the query will return the 
> county efficiently using the R-tree index),
> 3) creates a new tweet record consisting of the original record's 
> fields plus the returned county value, and
> 4) ingests it into a tweet dataset in the AsterixDB instance.
>
> Can we have such a UDF?
>
> Best,
> Young-Seok
>
>
> On Thu, Oct 29, 2015 at 3:53 PM, Heri Ramampiaro <heriram@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     >
>     > 2. Can we use AQL function in those kind of feed UDFs?
>
>     Can you give an example of what you are trying to do?
>     I.e do you want to run an AQL inside a UDF or use an AQL function as a
>     UDF connected to a running feed?
>
>     -heri
>
>
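[Editor's note] The four numbered steps above amount to a per-record enrichment function. Purely as an illustration of the logic — not AsterixDB's actual Java UDF API — here is a sketch in Python, where `lookup_county` is a hypothetical stand-in for the spatial-intersect query against the R-tree-indexed county dataset:

```python
def enrich_tweet(tweet, lookup_county):
    """Steps 1-3: take the tweet's coordinate, resolve its county, and
    build a new record holding the original fields plus the county.

    `tweet` is a dict with a "coordinate" field; `lookup_county` stands in
    for the spatial-intersect query against the county dataset.
    """
    county = lookup_county(tweet["coordinate"])  # step 2: spatial lookup
    enriched = dict(tweet)                       # step 3: copy original fields...
    enriched["county"] = county                  #         ...and add the new one
    return enriched                              # step 4 would ingest this record

# Hypothetical usage with a stub lookup:
tweet = {"id": 42, "coordinate": (-117.84, 33.68)}
print(enrich_tweet(tweet, lambda c: "Orange County"))
```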


Re: Socket feed questions

Posted by Raman Grover <ra...@gmail.com>.
I should have an update on this in the next couple of days.
Sorry for the delay.

Regards,
Raman

On Wed, Nov 18, 2015 at 3:27 PM, Mike Carey <dt...@gmail.com> wrote:

> Any update on this?
>
>
> On 11/2/15 5:54 AM, Raman Grover wrote:
>
>> Hi,
>>
>> I should be able to look at this one in a few days when I return and
>> resume
>> work (I am currently on a vacation till 5th Nov).
>>
>> However, the exception suggests that building the secondary feed pipeline
>> encountered an exception. Can you share the logs so that I can get a
>> better
>> understanding of the sequence of steps that happened as your statement
>> executed.
>>
>> I have intermittent access to net, but should be able to revert with some
>> delays.
>> On Nov 2, 2015 2:32 AM, "Jianfeng Jia" <ji...@gmail.com> wrote:
>>
>>> I’ve tried the “apply AQL function” idea, but got a NullPointerException:
>>> java.lang.NullPointerException
>>>          at
>>>
>>> org.apache.asterix.aql.translator.AqlTranslator.getFeedJointKey(AqlTranslator.java:2268)
>>>          at
>>>
>>> org.apache.asterix.aql.translator.AqlTranslator.getFeedConnectionRequest(AqlTranslator.java:2214)
>>>          at
>>>
>>> org.apache.asterix.aql.translator.AqlTranslator.handleConnectFeedStatement(AqlTranslator.java:2130)
>>>          at
>>>
>>> org.apache.asterix.aql.translator.AqlTranslator.compileAndExecute(AqlTranslator.java:362)
>>>          at
>>>
>>> org.apache.asterix.api.http.servlet.APIServlet.doPost(APIServlet.java:114)
>>>          at javax.servlet.http.HttpServlet.service(HttpServlet.java:754)
>>>          at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
>>>          at
>>> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:546)
>>>          at
>>>
>>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:483)
>>>          at
>>>
>>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>>>          at
>>>
>>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:970)
>>>          at
>>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:411)
>>>          at
>>>
>>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>>>          at
>>>
>>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:904)
>>>          at
>>>
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>>>          at
>>>
>>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
>>>          at org.eclipse.jetty.server.Server.handle(Server.java:347)
>>>          at
>>>
>>> org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:439)
>>>          at
>>>
>>> org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:924)
>>>          at
>>> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:781)
>>>          at
>>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:220)
>>>          at
>>>
>>> org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:43)
>>>          at
>>>
>>> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:545)
>>>          at
>>>
>>> org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:43)
>>>          at
>>>
>>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
>>>          at java.lang.Thread.run(Thread.java:745)
>>>
>>> The DDL that I was using is here:
>>> https://gist.githubusercontent.com/JavierJia/ca596df82ffdd456001f/raw/ea86f9ad1531a68c3ecf9036ef5b69976893149d/feed-ddl.aql
>>>
>>> Does anyone have any idea? Thank you!
>>>
>>> On Oct 30, 2015, at 12:30 AM, Heri Ramampiaro <he...@gmail.com> wrote:
>>>>
>>>> Although I haven’t tested this (I have mostly used & created java-based
>>>>
>>> UDFs),
>>>
>>>> one can execute AQL calls from within a UDF. Feeds allow functions to
>>>> execute arbitrary AQL statements (DDL, DML, etc.). I.e., I believe what
>>>> you are trying to do is possible.
>>>>
>>>> For example you could do:
>>>> (Given that you have a rec. type called Tweet, and a dataset
>>>> ProcessedTweets):
>>>>
>>>> create feed CoordTwitterFeed if not exists
>>>> using “push_twitter" (("type-name"="Tweet”))
>>>>        apply function find-intersection;
>>>>
>>>> (Here "find-intersection” is an AQL function that does the step
>>>>
>>> specified under 2)).
>>>
>>>> To do nr. 3, the easiest way is to have a modified version of the
>>>> “hashTag” Java-based UDF (let’s call this “tweetlocator”; see my previous
>>>> message with the TweetLib example). You can then connect this as a
>>>> secondary feed, connected to “CoordTwitterFeed”.
>>>>
>>>> E.g.:
>>>> create secondary feed ProcessedTwitterFeed from feed CoordTwitterFeed
>>>> apply function “tweelib#tweetlocator”;
>>>>
>>>> Thereafter you can connect the feeds to appropriate datasets.
>>>>
>>>> (Since the find-intersection is an AQL function, you can also call
>>>>
>>> “tweelib#tweetlocator”
>>>
>>>> inside this function).
>>>>
>>>> Best,
>>>> -heri
>>>>
>>>> On Oct 30, 2015, at 12:18 AM, Young-Seok Kim <ki...@gmail.com> wrote:
>>>>>
>>>>> I'm not sure whether the following UDF is possible or not, but
>>>>>
>>>> hopefully it
>>>
>>>> is.
>>>>> What we're trying to do is to have the following UDF.
>>>>>
>>>>> The UDF
>>>>> 1) accepts an incoming tweet record through a feed job as an input and
>>>>>
>>>> then
>>>
>>>> 2) takes a field, more specifically, coordinate field value from the
>>>>>
>>>> tweet
>>>
>>>> record and sends a spatial-intersect query using the coordinate in
>>>>>
>>>> order to
>>>
>>>> find out the corresponding county of the coordinates (we have created
>>>>> AsterixDB instance which stores US county shapes records into a dataset
>>>>>
>>>> and
>>>
>>>> created R-tree index on the polygon field of it, so the query will
>>>>>
>>>> return
>>>
>>>> the county effectively using the R-tree index)
>>>>> 3) creates a new tweet record consisting of the original record's
>>>>>
>>>> fields +
>>>
>>>> the returned county value
>>>>> 4) ingests to a tweet dataset in the AsterixDB instance.
>>>>>
>>>>> Can we have such a UDF?
>>>>>
>>>>> Best,
>>>>> Young-Seok
>>>>>
>>>>>
>>>>> On Thu, Oct 29, 2015 at 3:53 PM, Heri Ramampiaro <he...@gmail.com>
>>>>>
>>>> wrote:
>>>
>>>> 2. Can we use AQL function in those kind of feed UDFs?
>>>>>>>
>>>>>> Can you give an example of what you are trying to do?
>>>>>> I.e do you want to run an AQL inside a UDF or use an AQL function as a
>>>>>> UDF connected to a running feed?
>>>>>>
>>>>>> -heri
>>>>>>
>>>>>
>>>
>>> Best,
>>>
>>> Jianfeng Jia
>>> PhD Candidate of Computer Science
>>> University of California, Irvine
>>>
>>>
>>>
>


-- 
Raman

Re: Socket feed questions

Posted by Mike Carey <dt...@gmail.com>.
Any update on this?

On 11/2/15 5:54 AM, Raman Grover wrote:
> Hi,
>
> I should be able to look at this one in a few days when I return and resume
> work (I am currently on a vacation till 5th Nov).
>
> However, the exception suggests that building the secondary feed pipeline
> encountered an exception. Can you share the logs so that I can get a better
> understanding of the sequence of steps that happened as your statement
> executed.
>
> I have intermittent access to net, but should be able to revert with some
> delays.
> On Nov 2, 2015 2:32 AM, "Jianfeng Jia" <ji...@gmail.com> wrote:
>
>> I’ve tried the “apply AQL function” idea, but got a NullPointerException:
>> java.lang.NullPointerException
>>          at
>> org.apache.asterix.aql.translator.AqlTranslator.getFeedJointKey(AqlTranslator.java:2268)
>>          at
>> org.apache.asterix.aql.translator.AqlTranslator.getFeedConnectionRequest(AqlTranslator.java:2214)
>>          at
>> org.apache.asterix.aql.translator.AqlTranslator.handleConnectFeedStatement(AqlTranslator.java:2130)
>>          at
>> org.apache.asterix.aql.translator.AqlTranslator.compileAndExecute(AqlTranslator.java:362)
>>          at
>> org.apache.asterix.api.http.servlet.APIServlet.doPost(APIServlet.java:114)
>>          at javax.servlet.http.HttpServlet.service(HttpServlet.java:754)
>>          at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
>>          at
>> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:546)
>>          at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:483)
>>          at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>>          at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:970)
>>          at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:411)
>>          at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>>          at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:904)
>>          at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>>          at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
>>          at org.eclipse.jetty.server.Server.handle(Server.java:347)
>>          at
>> org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:439)
>>          at
>> org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:924)
>>          at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:781)
>>          at
>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:220)
>>          at
>> org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:43)
>>          at
>> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:545)
>>          at
>> org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:43)
>>          at
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
>>          at java.lang.Thread.run(Thread.java:745)
>>
>> The DDL that I was using is here:
>> https://gist.githubusercontent.com/JavierJia/ca596df82ffdd456001f/raw/ea86f9ad1531a68c3ecf9036ef5b69976893149d/feed-ddl.aql
>>
>> Does anyone have any idea? Thank you!
>>
>>> On Oct 30, 2015, at 12:30 AM, Heri Ramampiaro <he...@gmail.com> wrote:
>>>
>>> Although I haven’t tested this (I have mostly used & created java-based
>> UDFs),
>>> one can execute AQL calls from within a UDF. Feeds allow functions to
>>> execute arbitrary AQL statements (DDL, DML, etc.). I.e., I believe what
>>> you are trying to do is possible.
>>>
>>> For example you could do:
>>> (Given that you have a rec. type called Tweet, and a dataset
>>> ProcessedTweets):
>>> create feed CoordTwitterFeed if not exists
>>> using “push_twitter" (("type-name"="Tweet”))
>>>        apply function find-intersection;
>>>
>>> (Here "find-intersection” is an AQL function that does the step
>> specified under 2)).
>>> To do nr. 3, the easiest way is to have a modified version of the
>>> “hashTag” Java-based UDF (let’s call this “tweetlocator”; see my previous
>>> message with the TweetLib example). You can then connect this as a
>>> secondary feed, connected to “CoordTwitterFeed”.
>>>
>>> E.g.:
>>> create secondary feed ProcessedTwitterFeed from feed CoordTwitterFeed
>>> apply function “tweelib#tweetlocator";
>>>
>>> Thereafter you can connect the feeds to appropriate datasets.
>>>
>>> (Since the find-intersection is an AQL function, you can also call
>> “tweelib#tweetlocator”
>>> inside this function).
>>>
>>> Best,
>>> -heri
>>>
>>>> On Oct 30, 2015, at 12:18 AM, Young-Seok Kim <ki...@gmail.com> wrote:
>>>>
>>>> I'm not sure whether the following UDF is possible or not, but
>> hopefully it
>>>> is.
>>>> What we're trying to do is to have the following UDF.
>>>>
>>>> The UDF
>>>> 1) accepts an incoming tweet record through a feed job as an input and
>> then
>>>> 2) takes a field, more specifically, coordinate field value from the
>> tweet
>>>> record and sends a spatial-intersect query using the coordinate in
>> order to
>>>> find out the corresponding county of the coordinates (we have created
>>>> AsterixDB instance which stores US county shapes records into a dataset
>> and
>>>> created R-tree index on the polygon field of it, so the query will
>> return
>>>> the county effectively using the R-tree index)
>>>> 3) creates a new tweet record consisting of the original record's
>> fields +
>>>> the returned county value
>>>> 4) ingests to a tweet dataset in the AsterixDB instance.
>>>>
>>>> Can we have such a UDF?
>>>>
>>>> Best,
>>>> Young-Seok
>>>>
>>>>
>>>> On Thu, Oct 29, 2015 at 3:53 PM, Heri Ramampiaro <he...@gmail.com>
>> wrote:
>>>>>> 2. Can we use AQL function in those kind of feed UDFs?
>>>>> Can you give an example of what you are trying to do?
>>>>> I.e do you want to run an AQL inside a UDF or use an AQL function as a
>>>>> UDF connected to a running feed?
>>>>>
>>>>> -heri
>>
>>
>> Best,
>>
>> Jianfeng Jia
>> PhD Candidate of Computer Science
>> University of California, Irvine
>>
>>


Re: Socket feed questions

Posted by Jianfeng Jia <ji...@gmail.com>.
Sorry, it was overwritten by my current new clusters. The exception was happening during the DDL part; should the NC be involved?
I’ll provide the NC logs once I finish the current task on that cluster.

> On Nov 3, 2015, at 2:43 AM, Heri Ramampiaro <he...@gmail.com> wrote:
> 
> Do you have the nc log too?
> 
> -heri
> 
>> On Nov 2, 2015, at 6:25 PM, Jianfeng Jia <ji...@gmail.com> wrote:
>> 
>> <cc.logs>
> 



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine




Re: Socket feed questions

Posted by Heri Ramampiaro <he...@gmail.com>.
Do you have the nc log too?

-heri

> On Nov 2, 2015, at 6:25 PM, Jianfeng Jia <ji...@gmail.com> wrote:
> 
> <cc.logs>


Re: Socket feed questions

Posted by Raman Grover <ra...@gmail.com>.
Not yet, Chen.

I was outside of the U.S. and resumed work this past Monday. I have some
catching up to do.
I think I can steal some cycles tomorrow or early next week.

I will keep you posted.

Regards,
Raman

On Thu, Nov 12, 2015 at 4:38 PM, Chen Li <ch...@gmail.com> wrote:

> @Raman: did you have a chance to look into test cases of using AQL UDF
> in data feed?
>
> Chen
>
> On Mon, Nov 2, 2015 at 9:25 AM, Jianfeng Jia <ji...@gmail.com>
> wrote:
> > Great, here is the cc log:
> >
> > Thank you!
> >
> > On Nov 2, 2015, at 5:54 AM, Raman Grover <ra...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > I should be able to look at this one in a few days when I return and
> resume
> > work (I am currently on a vacation till 5th Nov).
> >
> > However, the exception suggests that building the secondary feed pipeline
> > encountered an exception. Can you share the logs so that I can get a
> better
> > understanding of the sequence of steps that happened as your statement
> > executed.
> >
> > I have intermittent access to net, but should be able to revert with some
> > delays.
> > On Nov 2, 2015 2:32 AM, "Jianfeng Jia" <ji...@gmail.com> wrote:
> >
> > I’ve tried the “apply AQL function” idea, but got a NullPointerException:
> > java.lang.NullPointerException
> >        at
> >
> org.apache.asterix.aql.translator.AqlTranslator.getFeedJointKey(AqlTranslator.java:2268)
> >        at
> >
> org.apache.asterix.aql.translator.AqlTranslator.getFeedConnectionRequest(AqlTranslator.java:2214)
> >        at
> >
> org.apache.asterix.aql.translator.AqlTranslator.handleConnectFeedStatement(AqlTranslator.java:2130)
> >        at
> >
> org.apache.asterix.aql.translator.AqlTranslator.compileAndExecute(AqlTranslator.java:362)
> >        at
> >
> org.apache.asterix.api.http.servlet.APIServlet.doPost(APIServlet.java:114)
> >        at javax.servlet.http.HttpServlet.service(HttpServlet.java:754)
> >        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> >        at
> > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:546)
> >        at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:483)
> >        at
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >        at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:970)
> >        at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:411)
> >        at
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> >        at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:904)
> >        at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> >        at
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
> >        at org.eclipse.jetty.server.Server.handle(Server.java:347)
> >        at
> >
> org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:439)
> >        at
> >
> org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:924)
> >        at
> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:781)
> >        at
> > org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:220)
> >        at
> >
> org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:43)
> >        at
> >
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:545)
> >        at
> >
> org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:43)
> >        at
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
> >        at java.lang.Thread.run(Thread.java:745)
> >
> > The DDL that I was using is here:
> > https://gist.githubusercontent.com/JavierJia/ca596df82ffdd456001f/raw/ea86f9ad1531a68c3ecf9036ef5b69976893149d/feed-ddl.aql
> >
> > Does anyone have any idea? Thank you!
> >
> > On Oct 30, 2015, at 12:30 AM, Heri Ramampiaro <he...@gmail.com> wrote:
> >
> > Although I haven’t tested this (I have mostly used & created java-based
> >
> > UDFs),
> >
> > one can execute AQL calls from within a UDF. Feeds allow functions to
> > execute arbitrary AQL statements (DDL, DML, etc.). I.e., I believe what
> > you are trying to do is possible.
> >
> > For example you could do:
> > (Given that you have a rec. type called Tweet, and a dataset
> > ProcessedTweets):
> >
> > create feed CoordTwitterFeed if not exists
> > using “push_twitter" (("type-name"="Tweet”))
> >      apply function find-intersection;
> >
> > (Here "find-intersection” is an AQL function that does the step
> >
> > specified under 2)).
> >
> >
> > To do nr. 3, the easiest way is to have a modified version of the
> > “hashTag” Java-based UDF (let’s call this “tweetlocator”; see my previous
> > message with the TweetLib example). You can then connect this as a
> > secondary feed, connected to “CoordTwitterFeed”.
> >
> > E.g.:
> > create secondary feed ProcessedTwitterFeed from feed CoordTwitterFeed
> > apply function “tweelib#tweetlocator";
> >
> > Thereafter you can connect the feeds to appropriate datasets.
> >
> > (Since the find-intersection is an AQL function, you can also call
> >
> > “tweelib#tweetlocator”
> >
> > inside this function).
> >
> > Best,
> > -heri
> >
> > On Oct 30, 2015, at 12:18 AM, Young-Seok Kim <ki...@gmail.com> wrote:
> >
> > I'm not sure whether the following UDF is possible or not, but
> >
> > hopefully it
> >
> > is.
> > What we're trying to do is to have the following UDF.
> >
> > The UDF
> > 1) accepts an incoming tweet record through a feed job as an input and
> >
> > then
> >
> > 2) takes a field, more specifically, coordinate field value from the
> >
> > tweet
> >
> > record and sends a spatial-intersect query using the coordinate in
> >
> > order to
> >
> > find out the corresponding county of the coordinates (we have created
> > AsterixDB instance which stores US county shapes records into a dataset
> >
> > and
> >
> > created R-tree index on the polygon field of it, so the query will
> >
> > return
> >
> > the county effectively using the R-tree index)
> > 3) creates a new tweet record consisting of the original record's
> >
> > fields +
> >
> > the returned county value
> > 4) ingests to a tweet dataset in the AsterixDB instance.
> >
> > Can we have such a UDF?
> >
> > Best,
> > Young-Seok
> >
> >
> > On Thu, Oct 29, 2015 at 3:53 PM, Heri Ramampiaro <he...@gmail.com>
> >
> > wrote:
> >
> >
> >
> > 2. Can we use AQL function in those kind of feed UDFs?
> >
> >
> > Can you give an example of what you are trying to do?
> > I.e do you want to run an AQL inside a UDF or use an AQL function as a
> > UDF connected to a running feed?
> >
> > -heri
> >
> >
> >
> >
> >
> > Best,
> >
> > Jianfeng Jia
> > PhD Candidate of Computer Science
> > University of California, Irvine
> >
> >
> >
> >
> > Best,
> >
> > Jianfeng Jia
> > PhD Candidate of Computer Science
> > University of California, Irvine
> >
> >
>



-- 
Raman

Re: Socket feed questions

Posted by Raman Grover <ra...@gmail.com>.
Not yet, Chen.

I was outside of the U.S. and resumed work this past Monday. I have some
catching up to do.
I think I can steal some cycles tomorrow or early next week.

I will keep you posted.

Regards,
Raman

On Thu, Nov 12, 2015 at 4:38 PM, Chen Li <ch...@gmail.com> wrote:

> @Raman: did you have a chance to look into test cases of using AQL UDF
> in data feed?
>
> Chen
>
> On Mon, Nov 2, 2015 at 9:25 AM, Jianfeng Jia <ji...@gmail.com>
> wrote:
> > Great, here is the cc log:
> >
> > Thank you!
> >
> > On Nov 2, 2015, at 5:54 AM, Raman Grover <ra...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > I should be able to look at this one in a few days when I return and
> resume
> > work (I am currently on a vacation till 5th Nov).
> >
> > However, the exception suggests that building the secondary feed pipeline
> > encountered an exception. Can you share the logs so that I can get a
> better
> > understanding of the sequence of steps that happened as your statement
> > executed.
> >
> > I have intermittent access to net, but should be able to revert with some
> > delays.
> > On Nov 2, 2015 2:32 AM, "Jianfeng Jia" <ji...@gmail.com> wrote:
> >
> > I’ve tried the “apply AQL function” idea, but got a NullPointerException:
> > java.lang.NullPointerException
> >        at
> >
> org.apache.asterix.aql.translator.AqlTranslator.getFeedJointKey(AqlTranslator.java:2268)
> >        at
> >
> org.apache.asterix.aql.translator.AqlTranslator.getFeedConnectionRequest(AqlTranslator.java:2214)
> >        at
> >
> org.apache.asterix.aql.translator.AqlTranslator.handleConnectFeedStatement(AqlTranslator.java:2130)
> >        at
> >
> org.apache.asterix.aql.translator.AqlTranslator.compileAndExecute(AqlTranslator.java:362)
> >        at
> >
> org.apache.asterix.api.http.servlet.APIServlet.doPost(APIServlet.java:114)
> >        at javax.servlet.http.HttpServlet.service(HttpServlet.java:754)
> >        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> >        at
> > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:546)
> >        at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:483)
> >        at
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >        at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:970)
> >        at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:411)
> >        at
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> >        at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:904)
> >        at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> >        at
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
> >        at org.eclipse.jetty.server.Server.handle(Server.java:347)
> >        at
> >
> org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:439)
> >        at
> >
> org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:924)
> >        at
> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:781)
> >        at
> > org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:220)
> >        at
> >
> org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:43)
> >        at
> >
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:545)
> >        at
> >
> org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:43)
> >        at
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
> >        at java.lang.Thread.run(Thread.java:745)
> >
> > The ddl that I was using is here:
> >
> https://gist.githubusercontent.com/JavierJia/ca596df82ffdd456001f/raw/ea86f9ad1531a68c3ecf9036ef5b69976893149d/feed-ddl.aql
> > <
> >
> https://gist.githubusercontent.com/JavierJia/ca596df82ffdd456001f/raw/ea86f9ad1531a68c3ecf9036ef5b69976893149d/feed-ddl.aql
> >
> >
> > Does anyone have any idea? Thank you!
> >
> > On Oct 30, 2015, at 12:30 AM, Heri Ramampiaro <he...@gmail.com> wrote:
> >
> > Although I haven’t tested this (I have mostly used & created java-based
> > UDFs), one can execute AQL calls from within a UDF. Feeds allow functions
> > to execute arbitrary AQL statements (DDL, DML, etc.). I.e., I believe
> > what you are trying to do is possible.
> >
> > For example, you could do the following (given that you have a record
> > type called Tweet and a dataset ProcessedTweets):
> >
> > create feed CoordTwitterFeed if not exists
> > using "push_twitter" (("type-name"="Tweet"))
> >     apply function find-intersection;
> >
> > (Here "find-intersection" is an AQL function that does the step
> > specified under 2) above.)
> >
> >
> > To do nr. 3, the easiest way is to have a modified version of the
> > "hashTag" Java-based UDF (let's call this "tweetlocator"; see my previous
> > message with the TweetLib example). You can then apply this in a
> > secondary feed connected to "CoordTwitterFeed".
> >
> > E.g.:
> > create secondary feed ProcessedTwitterFeed from feed CoordTwitterFeed
> > apply function "tweelib#tweetlocator";
> >
> > Thereafter you can connect the feeds to appropriate datasets.
> >
> > (Since find-intersection is an AQL function, you can also call
> > "tweelib#tweetlocator" inside this function.)
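For completeness, the final connect step might look like the following hedged sketch. The dataset names Tweets and ProcessedTweets are assumptions for illustration, and the syntax follows the AQL feed DDL used in this thread (untested here):

```aql
// Hypothetical sketch: dataset names are illustrative, not from the thread.
// The primary feed's raw output and the secondary feed's county-annotated
// output each go to their own target dataset.
connect feed CoordTwitterFeed to dataset Tweets;
connect feed ProcessedTwitterFeed to dataset ProcessedTweets;
```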
> >
> > Best,
> > -heri
> >
> > On Oct 30, 2015, at 12:18 AM, Young-Seok Kim <ki...@gmail.com> wrote:
> >
> > I'm not sure whether the following UDF is possible or not, but hopefully
> > it is. What we're trying to do is to have the following UDF.
> >
> > The UDF
> > 1) accepts an incoming tweet record through a feed job as an input, and then
> > 2) takes a field, more specifically the coordinate field value, from the
> > tweet record and sends a spatial-intersect query using the coordinate in
> > order to find out the corresponding county (we have created an AsterixDB
> > instance which stores US county shape records in a dataset and created an
> > R-tree index on its polygon field, so the query will return the county
> > efficiently using the R-tree index),
> > 3) creates a new tweet record consisting of the original record's fields
> > plus the returned county value, and
> > 4) ingests it into a tweet dataset in the AsterixDB instance.
> >
> > Can we have such a UDF?
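Steps 1)-4) above could be sketched as a single AQL function. This is an untested sketch: the dataset UsCounties and its name/shape fields are assumed names, spatial-intersect is an AsterixDB built-in, and step 4) would be handled by connecting the feed that applies this function to the target dataset:

```aql
// Hypothetical sketch; UsCounties and its fields are assumed, not from the thread.
create function find-intersection($tweet) {
  // Step 2: spatial-intersect lookup, answered via the R-tree index.
  let $county := (
    for $c in dataset UsCounties
    where spatial-intersect($c.shape, $tweet.coordinate)
    return $c.name
  )
  // Step 3: a new record carrying the original tweet plus the county value.
  return { "tweet": $tweet, "county": $county[0] }
};
```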
> >
> > Best,
> > Young-Seok
> >
> >
> > On Thu, Oct 29, 2015 at 3:53 PM, Heri Ramampiaro <he...@gmail.com>
> >
> > wrote:
> >
> >
> >
> > 2. Can we use an AQL function in those kinds of feed UDFs?
> >
> >
> > Can you give an example of what you are trying to do?
> > I.e., do you want to run AQL inside a UDF, or use an AQL function as a
> > UDF connected to a running feed?
> >
> > -heri
> >
> >
> >
> >
> >
> > Best,
> >
> > Jianfeng Jia
> > PhD Candidate of Computer Science
> > University of California, Irvine
> >
> >
> >
> >
> >
> >
>



-- 
Raman

Re: Socket feed questions

Posted by Chen Li <ch...@gmail.com>.
@Raman: did you have a chance to look into test cases of using AQL UDF
in data feed?

Chen


Re: Socket feed questions

Posted by Heri Ramampiaro <he...@gmail.com>.
Do you have the nc log too?

-heri

> On Nov 2, 2015, at 6:25 PM, Jianfeng Jia <ji...@gmail.com> wrote:
> 
> <cc.logs>



Re: Socket feed questions

Posted by Jianfeng Jia <ji...@gmail.com>.
Great, here is the cc log:

Thank you!



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine



Re: Socket feed questions

Posted by Raman Grover <ra...@gmail.com>.
Hi,

I should be able to look at this one in a few days when I return and resume
work (I am currently on vacation until Nov 5).

However, the exception suggests that building the secondary feed pipeline
failed. Can you share the logs, so that I can get a better understanding of
the sequence of steps that happened as your statement executed?

I have intermittent internet access, but should be able to respond, with some
delay.
On Nov 2, 2015 2:32 AM, "Jianfeng Jia" <ji...@gmail.com> wrote:

> I’ve tried the “apply AQL function” idea, but got a NullPointerException:
> java.lang.NullPointerException
>         at
> org.apache.asterix.aql.translator.AqlTranslator.getFeedJointKey(AqlTranslator.java:2268)
>         at
> org.apache.asterix.aql.translator.AqlTranslator.getFeedConnectionRequest(AqlTranslator.java:2214)
>         at
> org.apache.asterix.aql.translator.AqlTranslator.handleConnectFeedStatement(AqlTranslator.java:2130)
>         at
> org.apache.asterix.aql.translator.AqlTranslator.compileAndExecute(AqlTranslator.java:362)
>         at
> org.apache.asterix.api.http.servlet.APIServlet.doPost(APIServlet.java:114)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:754)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
>         at
> org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:546)
>         at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:483)
>         at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>         at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:970)
>         at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:411)
>         at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>         at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:904)
>         at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>         at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
>         at org.eclipse.jetty.server.Server.handle(Server.java:347)
>         at
> org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:439)
>         at
> org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:924)
>         at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:781)
>         at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:220)
>         at
> org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:43)
>         at
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:545)
>         at
> org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:43)
>         at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
>         at java.lang.Thread.run(Thread.java:745)
>
> The ddl that I was using is here:
> https://gist.githubusercontent.com/JavierJia/ca596df82ffdd456001f/raw/ea86f9ad1531a68c3ecf9036ef5b69976893149d/feed-ddl.aql
> <
> https://gist.githubusercontent.com/JavierJia/ca596df82ffdd456001f/raw/ea86f9ad1531a68c3ecf9036ef5b69976893149d/feed-ddl.aql
> >
> Anyone has any idea? Thank you!
>
> > On Oct 30, 2015, at 12:30 AM, Heri Ramampiaro <he...@gmail.com> wrote:
> >
> > Although I haven’t tested this (I have mostly used & created java-based
> UDFs),
> > one can execute AQL calls from within a UDF. Feeds allows functions to
> > execute arbitrary AQL statements (DDL, DMLs etc). I.e., I believe what
> > you are trying to do is possible.
> >
> > For example you could do:
> > (Given that you have a rec. type called Tweet, and a dataset
> ProcessedTweets
> >
> > create feed CoordTwitterFeed if not exists
> > using “push_twitter" (("type-name"="Tweet”))
> >       apply function find-intersection;
> >
> > (Here "find-intersection” is an AQL function that does the step
> specified under 2)).
> >
> > To do nr. 3 the easiest way is to have a modified version of the
> “hashTag”
> > (let’s call this “tweetlocator”).  Java-based UDF (see my previous
> message with the
> > TweetLib example). You can then connect this  as a secondary feed,
> connected to
> > “CoordTwitterFeed”.
> >
> > E.g.:
> > create secondary feed ProcessedTwitterFeed from feed CoordTwitterFeed
> > apply function “tweelib#tweetlocator";
> >
> > Thereafter you can connect the feeds to appropriate datasets.
> >
> > (Since the find-intersection is an AQL function, you can also call
> “tweelib#tweetlocator”
> > inside this function).
> >
> > Best,
> > -heri
> >
> >> On Oct 30, 2015, at 12:18 AM, Young-Seok Kim <ki...@gmail.com> wrote:
> >>
> >> I'm not sure whether the following UDF is possible or not, but
> hopefully it
> >> is.
> >> What we're trying to do is to have the following UDF.
> >>
> >> The UDF
> >> 1) accepts an incoming tweet record through a feed job as an input and
> then
> >> 2) takes a field, more specifically, coordinate field value from the
> tweet
> >> record and sends a spatial-intersect query using the coordinate in
> order to
> >> find out the corresponding county of the coordinates (we have created
> >> AsterixDB instance which stores US county shapes records into a dataset
> and
> >> created R-tree index on the polygon field of it, so the query will
> return
> >> the county effectively using the R-tree index)
> >> 3) creates a new tweet record consisting of the original record's
> fields +
> >> the returned county value
> >> 4) ingests to a tweet dataset in the AsterixDB instance.
> >>
> >> Can we have such an UDF?
> >>
> >> Best,
> >> Young-Seok
> >>
> >>
> >> On Thu, Oct 29, 2015 at 3:53 PM, Heri Ramampiaro <he...@gmail.com>
> wrote:
> >>
> >>>>
> >>>> 2. Can we use AQL function in those kind of feed UDFs?
> >>>
> >>> Can you give an example of what you are trying to do?
> >>> I.e do you want to run an AQL inside a UDF or use an AQL function as a
> >>> UDF connected to a running feed?
> >>>
> >>> -heri
> >
>
>
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
>
>

Re: Socket feed questions

Posted by Jianfeng Jia <ji...@gmail.com>.
I’ve tried the “apply AQL function” idea, but got a NullPointerException:
java.lang.NullPointerException
        at org.apache.asterix.aql.translator.AqlTranslator.getFeedJointKey(AqlTranslator.java:2268)
        at org.apache.asterix.aql.translator.AqlTranslator.getFeedConnectionRequest(AqlTranslator.java:2214)
        at org.apache.asterix.aql.translator.AqlTranslator.handleConnectFeedStatement(AqlTranslator.java:2130)
        at org.apache.asterix.aql.translator.AqlTranslator.compileAndExecute(AqlTranslator.java:362)
        at org.apache.asterix.api.http.servlet.APIServlet.doPost(APIServlet.java:114)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:754)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:546)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:483)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:970)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:411)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:904)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
        at org.eclipse.jetty.server.Server.handle(Server.java:347)
        at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:439)
        at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:924)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:781)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:220)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:43)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:545)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:43)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
        at java.lang.Thread.run(Thread.java:745)

The ddl that I was using is here: https://gist.githubusercontent.com/JavierJia/ca596df82ffdd456001f/raw/ea86f9ad1531a68c3ecf9036ef5b69976893149d/feed-ddl.aql <https://gist.githubusercontent.com/JavierJia/ca596df82ffdd456001f/raw/ea86f9ad1531a68c3ecf9036ef5b69976893149d/feed-ddl.aql>
Does anyone have any idea? Thank you!

> On Oct 30, 2015, at 12:30 AM, Heri Ramampiaro <he...@gmail.com> wrote:
> 
> Although I haven’t tested this (I have mostly used & created java-based UDFs), 
> one can execute AQL calls from within a UDF. Feeds allows functions to 
> execute arbitrary AQL statements (DDL, DMLs etc). I.e., I believe what
> you are trying to do is possible.  
> 
> For example you could do:
> (Given that you have a rec. type called Tweet, and a dataset ProcessedTweets
> 
> create feed CoordTwitterFeed if not exists
> using “push_twitter" (("type-name"="Tweet”))
> 	apply function find-intersection;
> 
> (Here "find-intersection” is an AQL function that does the step specified under 2)).
> 
> To do nr. 3 the easiest way is to have a modified version of the “hashTag” 
> (let’s call this “tweetlocator”).  Java-based UDF (see my previous message with the 
> TweetLib example). You can then connect this  as a secondary feed, connected to 
> “CoordTwitterFeed”.
> 
> E.g.:
> create secondary feed ProcessedTwitterFeed from feed CoordTwitterFeed
> apply function “tweelib#tweetlocator";
> 
> Thereafter you can connect the feeds to appropriate datasets.
> 
> (Since the find-intersection is an AQL function, you can also call “tweelib#tweetlocator”
> inside this function).
> 
> Best,
> -heri
> 
>> On Oct 30, 2015, at 12:18 AM, Young-Seok Kim <ki...@gmail.com> wrote:
>> 
>> I'm not sure whether the following UDF is possible or not, but hopefully it
>> is.
>> What we're trying to do is to have the following UDF.
>> 
>> The UDF
>> 1) accepts an incoming tweet record through a feed job as an input and then
>> 2) takes a field, more specifically, coordinate field value from the tweet
>> record and sends a spatial-intersect query using the coordinate in order to
>> find out the corresponding county of the coordinates (we have created
>> AsterixDB instance which stores US county shapes records into a dataset and
>> created R-tree index on the polygon field of it, so the query will return
>> the county effectively using the R-tree index)
>> 3) creates a new tweet record consisting of the original record's fields +
>> the returned county value
>> 4) ingests to a tweet dataset in the AsterixDB instance.
>> 
>> Can we have such an UDF?
>> 
>> Best,
>> Young-Seok
>> 
>> 
>> On Thu, Oct 29, 2015 at 3:53 PM, Heri Ramampiaro <he...@gmail.com> wrote:
>> 
>>>> 
>>>> 2. Can we use AQL function in those kind of feed UDFs?
>>> 
>>> Can you give an example of what you are trying to do?
>>> I.e do you want to run an AQL inside a UDF or use an AQL function as a
>>> UDF connected to a running feed?
>>> 
>>> -heri
> 



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine


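[Editor's note: the four steps Young-Seok describes above could, in principle, be sketched as a single AQL function along the following lines. This is only an illustrative sketch: the Counties dataset and its shape and name fields, as well as the tweet fields, are assumed names rather than schemas from this thread; spatial-intersect is AsterixDB's built-in spatial predicate.]

```aql
create function find-intersection($t) {
  // Step 2: look up the county whose polygon intersects the tweet's
  // coordinate (the R-tree index on Counties.shape should accelerate this)
  let $county := (for $c in dataset Counties
                  where spatial-intersect($c.shape, $t.coordinate)
                  return $c.name)
  // Step 3: emit the original fields plus the resolved county
  // (fields listed here are illustrative)
  return {
    "id": $t.id,
    "text": $t.text,
    "coordinate": $t.coordinate,
    "county": $county[0]
  }
};
```

Step 4 (ingesting into the target dataset) would then be handled by connecting the feed that applies this function to that dataset.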
Re: Socket feed questions

Posted by Heri Ramampiaro <he...@gmail.com>.
The feeds test cases are under
asterix-app/src/test/resources/runtimets/queries/feeds

For the UDFs I believe the test cases are:
asterix-app/src/test/resources/runtimets/queries/user-defined-functions

I can prepare some simple examples by next week.

-heri


> On Oct 30, 2015, at 5:58 PM, Chen Li <ch...@gmail.com> wrote:
> 
> @Heri: do you know where to find the test cases of AQL UDF, or have some
> test cases of your own?  Jianfeng can start from there.
> 
> Chen
> 
> On Fri, Oct 30, 2015 at 12:30 AM, Heri Ramampiaro <he...@gmail.com> wrote:
> 
>> Although I haven’t tested this (I have mostly used & created java-based
>> UDFs),
>> one can execute AQL calls from within a UDF. Feeds allows functions to
>> execute arbitrary AQL statements (DDL, DMLs etc). I.e., I believe what
>> you are trying to do is possible.
>> 
>> For example you could do:
>> (Given that you have a rec. type called Tweet, and a dataset
>> ProcessedTweets
>> 
>> create feed CoordTwitterFeed if not exists
>> using “push_twitter" (("type-name"="Tweet”))
>>        apply function find-intersection;
>> 
>> (Here "find-intersection” is an AQL function that does the step specified
>> under 2)).
>> 
>> To do nr. 3 the easiest way is to have a modified version of the “hashTag”
>> (let’s call this “tweetlocator”).  Java-based UDF (see my previous message
>> with the
>> TweetLib example). You can then connect this  as a secondary feed,
>> connected to
>> “CoordTwitterFeed”.
>> 
>> E.g.:
>> create secondary feed ProcessedTwitterFeed from feed CoordTwitterFeed
>> apply function “tweelib#tweetlocator";
>> 
>> Thereafter you can connect the feeds to appropriate datasets.
>> 
>> (Since the find-intersection is an AQL function, you can also call
>> “tweelib#tweetlocator”
>> inside this function).
>> 
>> Best,
>> -heri
>> 
>>> On Oct 30, 2015, at 12:18 AM, Young-Seok Kim <ki...@gmail.com> wrote:
>>> 
>>> I'm not sure whether the following UDF is possible or not, but hopefully
>> it
>>> is.
>>> What we're trying to do is to have the following UDF.
>>> 
>>> The UDF
>>> 1) accepts an incoming tweet record through a feed job as an input and
>> then
>>> 2) takes a field, more specifically, coordinate field value from the
>> tweet
>>> record and sends a spatial-intersect query using the coordinate in order
>> to
>>> find out the corresponding county of the coordinates (we have created
>>> AsterixDB instance which stores US county shapes records into a dataset
>> and
>>> created R-tree index on the polygon field of it, so the query will return
>>> the county effectively using the R-tree index)
>>> 3) creates a new tweet record consisting of the original record's fields
>> +
>>> the returned county value
>>> 4) ingests to a tweet dataset in the AsterixDB instance.
>>> 
>>> Can we have such an UDF?
>>> 
>>> Best,
>>> Young-Seok
>>> 
>>> 
>>> On Thu, Oct 29, 2015 at 3:53 PM, Heri Ramampiaro <he...@gmail.com>
>> wrote:
>>> 
>>>>> 
>>>>> 2. Can we use AQL function in those kind of feed UDFs?
>>>> 
>>>> Can you give an example of what you are trying to do?
>>>> I.e do you want to run an AQL inside a UDF or use an AQL function as a
>>>> UDF connected to a running feed?
>>>> 
>>>> -heri
>> 
>> 


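[Editor's note: pulling Heri's snippets together, the end-to-end pipeline discussed in this thread would look roughly like the following AQL. The create statements are taken from the messages above; the final connect statement is an assumed addition (dataset and function names are the thread's illustrative ones).]

```aql
// primary feed: ingest tweets and tag each with its county via the AQL UDF
create feed CoordTwitterFeed if not exists
using "push_twitter" (("type-name"="Tweet"))
apply function find-intersection;

// secondary feed: post-process records with the Java-based UDF
create secondary feed ProcessedTwitterFeed from feed CoordTwitterFeed
apply function "tweelib#tweetlocator";

// finally, route the processed records into the target dataset
connect feed ProcessedTwitterFeed to dataset ProcessedTweets;
```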
Re: Socket feed questions

Posted by Chen Li <ch...@gmail.com>.
@Heri: do you know where to find the test cases of AQL UDF, or have some
test cases of your own?  Jianfeng can start from there.

Chen

On Fri, Oct 30, 2015 at 12:30 AM, Heri Ramampiaro <he...@gmail.com> wrote:

> Although I haven’t tested this (I have mostly used & created java-based
> UDFs),
> one can execute AQL calls from within a UDF. Feeds allows functions to
> execute arbitrary AQL statements (DDL, DMLs etc). I.e., I believe what
> you are trying to do is possible.
>
> For example you could do:
> (Given that you have a rec. type called Tweet, and a dataset
> ProcessedTweets
>
> create feed CoordTwitterFeed if not exists
> using “push_twitter" (("type-name"="Tweet”))
>         apply function find-intersection;
>
> (Here "find-intersection” is an AQL function that does the step specified
> under 2)).
>
> To do nr. 3 the easiest way is to have a modified version of the “hashTag”
> (let’s call this “tweetlocator”).  Java-based UDF (see my previous message
> with the
> TweetLib example). You can then connect this  as a secondary feed,
> connected to
> “CoordTwitterFeed”.
>
> E.g.:
> create secondary feed ProcessedTwitterFeed from feed CoordTwitterFeed
> apply function “tweelib#tweetlocator";
>
> Thereafter you can connect the feeds to appropriate datasets.
>
> (Since the find-intersection is an AQL function, you can also call
> “tweelib#tweetlocator”
> inside this function).
>
> Best,
> -heri
>
> > On Oct 30, 2015, at 12:18 AM, Young-Seok Kim <ki...@gmail.com> wrote:
> >
> > I'm not sure whether the following UDF is possible or not, but hopefully
> it
> > is.
> > What we're trying to do is to have the following UDF.
> >
> > The UDF
> > 1) accepts an incoming tweet record through a feed job as an input and
> then
> > 2) takes a field, more specifically, coordinate field value from the
> tweet
> > record and sends a spatial-intersect query using the coordinate in order
> to
> > find out the corresponding county of the coordinates (we have created
> > AsterixDB instance which stores US county shapes records into a dataset
> and
> > created R-tree index on the polygon field of it, so the query will return
> > the county effectively using the R-tree index)
> > 3) creates a new tweet record consisting of the original record's fields
> +
> > the returned county value
> > 4) ingests to a tweet dataset in the AsterixDB instance.
> >
> > Can we have such an UDF?
> >
> > Best,
> > Young-Seok
> >
> >
> > On Thu, Oct 29, 2015 at 3:53 PM, Heri Ramampiaro <he...@gmail.com>
> wrote:
> >
> >>>
> >>> 2. Can we use AQL function in those kind of feed UDFs?
> >>
> >> Can you give an example of what you are trying to do?
> >> I.e do you want to run an AQL inside a UDF or use an AQL function as a
> >> UDF connected to a running feed?
> >>
> >> -heri
>
>

Re: Socket feed questions

Posted by Heri Ramampiaro <he...@gmail.com>.
Although I haven't tested this (I have mostly used & created Java-based UDFs), 
one can execute AQL calls from within a UDF. Feeds allow functions to 
execute arbitrary AQL statements (DDL, DML, etc.). I.e., I believe what
you are trying to do is possible.

For example, you could do the following
(given that you have a record type called Tweet, and a dataset ProcessedTweets):

create feed CoordTwitterFeed if not exists
using "push_twitter" (("type-name"="Tweet"))
	apply function find-intersection;

(Here "find-intersection" is an AQL function that does the step specified under 2) above.)

To do nr. 3, the easiest way is to have a modified version of the "hashTag"
Java-based UDF (let's call this "tweetlocator"; see my previous message with the
TweetLib example). You can then apply this in a secondary feed connected to
"CoordTwitterFeed".

E.g.:
create secondary feed ProcessedTwitterFeed from feed CoordTwitterFeed
apply function "tweelib#tweetlocator";

Thereafter you can connect the feeds to appropriate datasets.
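For instance, connecting the secondary feed to a target dataset could look like this (a sketch; ProcessedTweets is the dataset name assumed earlier in this message):

```
connect feed ProcessedTwitterFeed to dataset ProcessedTweets;
```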

(Since "find-intersection" is an AQL function, you can also call "tweelib#tweetlocator"
inside this function.)
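As a rough sketch, such an AQL find-intersection function might look like the following (the dataset name CountyShapes and its field names are illustrative assumptions, not taken from this thread):

```
create function find-intersection($tweet) {
  let $county := for $c in dataset CountyShapes
                 where spatial-intersect($tweet.coordinate, $c.polygon)
                 return $c.name
  return { "id": $tweet.id,
           "tweet": $tweet.tweet,
           "coordinate": $tweet.coordinate,
           "county": $county }
}
```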

Best,
-heri

> On Oct 30, 2015, at 12:18 AM, Young-Seok Kim <ki...@gmail.com> wrote:
> 
> I'm not sure whether the following UDF is possible or not, but hopefully it
> is.
> What we're trying to do is to have the following UDF.
> 
> The UDF
> 1) accepts an incoming tweet record through a feed job as an input and then
> 2) takes a field, more specifically, coordinate field value from the tweet
> record and sends a spatial-intersect query using the coordinate in order to
> find out the corresponding county of the coordinates (we have created
> AsterixDB instance which stores US county shapes records into a dataset and
> created R-tree index on the polygon field of it, so the query will return
> the county effectively using the R-tree index)
> 3) creates a new tweet record consisting of the original record's fields +
> the returned county value
> 4) ingests to a tweet dataset in the AsterixDB instance.
> 
> Can we have such an UDF?
> 
> Best,
> Young-Seok
> 
> 
> On Thu, Oct 29, 2015 at 3:53 PM, Heri Ramampiaro <he...@gmail.com> wrote:
> 
>>> 
>>> 2. Can we use AQL function in those kind of feed UDFs?
>> 
>> Can you give an example of what you are trying to do?
>> I.e do you want to run an AQL inside a UDF or use an AQL function as a
>> UDF connected to a running feed?
>> 
>> -heri


Re: Socket feed questions

Posted by Young-Seok Kim <ki...@gmail.com>.
I'm not sure whether the following UDF is possible or not, but hopefully it
is.
What we're trying to do is to have the following UDF.

The UDF
1) accepts an incoming tweet record through a feed job as input, and then
2) takes a field, more specifically the coordinate field value, from the tweet
record and issues a spatial-intersect query using the coordinate in order to
find the corresponding county (we have created an AsterixDB instance which
stores US county shape records in a dataset and created an R-tree index on
its polygon field, so the query will return the county efficiently using the
R-tree index),
3) creates a new tweet record consisting of the original record's fields plus
the returned county value, and
4) ingests it into a tweet dataset in the AsterixDB instance.

Can we have such a UDF?

Best,
Young-Seok


On Thu, Oct 29, 2015 at 3:53 PM, Heri Ramampiaro <he...@gmail.com> wrote:

> >
> > 2. Can we use AQL function in those kind of feed UDFs?
>
> Can you give an example of what you are trying to do?
> I.e do you want to run an AQL inside a UDF or use an AQL function as a
> UDF connected to a running feed?
>
> -heri

Re: Socket feed questions

Posted by Michael Carey <mj...@ics.uci.edu>.
That's exactly right:  AQL-based function on a feed connector.
(And it worked at the point where I signed Raman's thesis. :-))

On 10/29/15 4:18 PM, Jianfeng Jia wrote:
> Let’s say I have a Tweets dataset, and I have a 
> SyntheticTweeterAdapter to feed the data. It will generate the record 
> like r1: { “id”: 1, “tweet” : “blabla”, “coordinate”: point(1,2) }
> During the feeding, I want to have an UDF that can enrich the dataset 
> by appending another field to the record.
> E.g. I can add the address information of the previous r1 :  { “id”: 
> 1, “tweet” : “blabla”, “coordinate”: point(1,2), “address”: [“Irvine, 
> CA”] } .
>
> In order to get the mapping from “coordinate” to “address”, I need to 
> run a AQL query like
>
> for $t in feed Tweets
> return { “id”: $t.id, “tweet”: $t.tweet, “coordinate”: $t.coordinate,
> “address”: for $city in dataset (“AsterixCityTable”) where 
>  spatio-intersect ($t.coordinate, $city.geometry) return $city.name
> }
>
> It seems more like the second one: using an AQL function as a UDF 
> connector. But I’m not very certain about it. It will be very helpful 
> if you can provide some existing UDF example, not necessary the AQL 
> ones.  Thank you!
>
>
>> On Oct 29, 2015, at 3:53 PM, Heri Ramampiaro <heriram@gmail.com> wrote:
>>
>>>
>>> 2. Can we use AQL function in those kind of feed UDFs?
>>
>> Can you give an example of what you are trying to do?
>> I.e do you want to run an AQL inside a UDF or use an AQL function as a
>> UDF connected to a running feed?
>>
>> -heri
>
>
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
>


Re: Socket feed questions

Posted by Jianfeng Jia <ji...@gmail.com>.
 
Let's say I have a Tweets dataset, and I have a SyntheticTweeterAdapter to feed the data. It will generate records like r1: { "id": 1, "tweet": "blabla", "coordinate": point(1,2) }
During the feeding, I want to have a UDF that can enrich the dataset by appending another field to each record.
E.g., I can add the address information to the previous r1: { "id": 1, "tweet": "blabla", "coordinate": point(1,2), "address": ["Irvine, CA"] }.

In order to get the mapping from "coordinate" to "address", I need to run an AQL query like:

for $t in feed Tweets
return { "id": $t.id, "tweet": $t.tweet, "coordinate": $t.coordinate,
  "address": for $city in dataset("AsterixCityTable")
             where spatial-intersect($t.coordinate, $city.geometry)
             return $city.name
}

It seems more like the second one: using an AQL function as a UDF connected to the feed. But I'm not very certain about it. It would be very helpful if you could provide some existing UDF examples, not necessarily AQL ones. Thank you!
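One way the enrichment described above might be packaged as a per-record AQL function for a feed to apply (the function name add-address is hypothetical):

```
create function add-address($t) {
  { "id": $t.id,
    "tweet": $t.tweet,
    "coordinate": $t.coordinate,
    "address": for $city in dataset("AsterixCityTable")
               where spatial-intersect($t.coordinate, $city.geometry)
               return $city.name }
}
```

A feed could then apply it at creation time via "apply function add-address".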


> On Oct 29, 2015, at 3:53 PM, Heri Ramampiaro <he...@gmail.com> wrote:
> 
>> 
>> 2. Can we use AQL function in those kind of feed UDFs?
> 
> Can you give an example of what you are trying to do?
> I.e do you want to run an AQL inside a UDF or use an AQL function as a
> UDF connected to a running feed?
> 
> -heri



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine


Re: Socket feed questions

Posted by Heri Ramampiaro <he...@gmail.com>.
> 
> 2. Can we use AQL function in those kind of feed UDFs?

Can you give an example of what you are trying to do?
I.e., do you want to run an AQL query inside a UDF, or use an AQL function as
a UDF connected to a running feed?

-heri

Re: Socket feed questions

Posted by Mike Carey <dt...@gmail.com>.
Yes and yes, I believe!  The AQL UDF case is less tested, I believe, but it
should work...
On Oct 29, 2015 12:22 PM, "Jianfeng Jia" <ji...@gmail.com> wrote:

> Hi Devs,
>
> I have two related questions,
> 1. Is there any example code of using UDF in feed-adapter?
> 2. Can we use AQL function in those kind of feed UDFs?
>
> Thank you.
>
> On Tue, Oct 27, 2015 at 9:54 PM, Michael Carey <mj...@ics.uci.edu>
> wrote:
>
> > Thanks!
> >
> > On 10/27/15 9:48 AM, Raman Grover wrote:
> >
> >> Hi,
> >>
> >>
> >> In the case when data is being received from an external source (e.g.
> >> during feed ingestion), a slow rate of arrival of data may result in
> >> excessive delays until the data is deposited into the target dataset and
> >> made accessible to queries. Data moves along a data ingestion pipeline
> >> between operators as packed fixed size frames. The default behavior is
> to
> >> wait for the frame to be full before dispatching the contained data to
> the
> >> downstream operator. However, as noted, this may not suit all scenarios
> >> particularly when data source is sending data at a low rate. To cater to
> >> different scenarios, AsterixDB allows configuring the behavior. The
> >> different options are described next.
> >>
> >> *Push data downstream when*
> >> (a) Frame is full (default)
> >> (b) At least N records (data items) have been collected into a partially
> >> filled frame
> >> (c) At least T seconds have elapsed since the last record was put into
> >> the frame
> >>
> >> *How to configure the behavior?*
> >> At the time of defining a feed, an end-user may specify configuration
> >> parameters that determine the runtime behavior (options (a), (b) or (c)
> >> from above).
> >>
> >> The parameters are described below:
> >>
> >> /"parser-policy"/: A specific strategy chosen from a set of pre-defined
> >> values -
> >>   (i) / "frame_full"/
> >>  This is the default value. As the name suggests, this choice causes
> >> frames to be pushed by the feed adaptor only when there isn't sufficient
> >> space for an additional record to fit in. This corresponds to option
> (a).
> >>
> >>  (ii) / "counter_timer_expired" /
> >>  Use this as the value if you wish to set either option (b) or (c)  or a
> >> combination of both.
> >>
> >> *Some Examples*
> >> *
> >> *
> >> 1) Pack a maximum of 100 records into a data frame and push it
> downstream.
> >>
> >>  create feed my_feed using my_adaptor
> >> (("parser-policy"="counter_timer_expired"), ("batch-size"="100"), ...
> >> other parameters);
> >>
> >> 2) Wait till 2 seconds and send however many records collected in a
> frame
> >> downstream.
> >>  create feed my_feed using my_adaptor
> >> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2")...
> >> other parameters);
> >>
> >> 3) Wait till 100 records have been collected into a data frame or 2
> >> seconds have elapsed since the last record was put into the current data
> >> frame.
> >>  create feed my_feed using my_adaptor
> >> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"),
> >> ("batch-size"="100"),... other parameters);
> >>
> >>
> >> *Note*
> >> The above config parameters are not specific to using a particular
> >> implementation of an adaptor but are available for use with any feed
> >> adaptor. Some adaptors that ship with AsterixDB use different default
> >> values for above to suit their specific scenario. E.g. the pull-based
> >> twitter adaptor uses "counter_timer_expired" as the "parser-policy" and
> >> sets the  parameter "batch-interval".
> >>
> >>
> >> Regards,
> >> Raman
> >> PS: The names of the parameters described above are not as intuitive as
> >> one would like them to be. The names need to be changed.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu, Oct 22, 2015 at 9:09 AM, Mike Carey <dtabass@gmail.com> wrote:
> >>
> >>     I think we need to have tuning parameters - like batch size and
> >>     maximum tolerable latency (in case there's a lull and you still
> >>     want to push stuff with some worst-case delay). @Raman Grover -
> >>     remind me (us) what's available in this regard?
> >>
> >>     On 10/22/15 4:29 AM, Pääkkönen Pekka wrote:
> >>
> >>>
> >>>     Hi,
> >>>
> >>>     Yes, you are right. I tried sending a larger amount of data, and
> >>>     data is now stored to the database.
> >>>
> >>>     Does it make sense to configure a smaller batch size in order to
> >>>     get more frequent writes? Or would it significantly impact
> >>>     performance?
> >>>
> >>>     -Pekka
> >>>
> >>>     Data moves through the pipeline in frame-sized batches, so one
> >>>     (uninformed :-)) guess is that you aren't running very long, and
> >>>     you're only seeing the data flow when you close because only then
> >>>     do you have a batch's worth.  Is that possible?  You can test this
> >>>     by running longer (more data) and seeing if you start to see the
> >>>     expected incremental flow/inserts. (And we need tunability in this
> >>>     area, e.g., parameters on how much batching and/or how much latency
> >>>     to tolerate on each feed.)
> >>>
> >>>     On 10/21/15 4:45 AM, Pääkkönen Pekka wrote:
> >>>     > Hi,
> >>>     >
> >>>     > Thanks, now I am able to create a socket feed, and save items to
> >>>     > the dataset from the feed.
> >>>     >
> >>>     > It seems that data items are written to the dataset after I close
> >>>     > the socket at the client.
> >>>     >
> >>>     > Is there some way to indicate to the AsterixDB feed (with a
> >>>     > newline or other indicator) that data can be written to the
> >>>     > database while the connection is open?
> >>>     >
> >>>     > After I close the socket at the client, the feed seems to close
> >>>     > down. Or is it only paused, until it is resumed?
> >>>     >
> >>>     > -Pekka
> >>>     >
> >>>     > Hi Pekka,
> >>>     >
> >>>     > That's interesting, I'm not sure why the CC would appear as being
> >>>     > down to Managix. However, if you can access the web console, that
> >>>     > evidently isn't the case.
> >>>     >
> >>>     > As for data ingestion via sockets, yes it is possible, but it kind
> >>>     > of depends on what's meant by sockets. There's no tutorial for it,
> >>>     > but take a look at SocketBasedFeedAdapter in the source, as well as
> >>>     > https://github.com/kisskys/incubator-asterixdb/blob/kisskys/indexonlyhilbertbtree/asterix-experiments/src/main/java/org/apache/asterix/experiment/client/SocketTweetGenerator.java
> >>>     > for some examples of how it works.
> >>>     >
> >>>     > Hope that helps!
> >>>     >
> >>>     > Thanks,
> >>>     > -Ian
> >>>     >
> >>>     > On Mon, Oct 19, 2015 at 10:15 PM, Pääkkönen Pekka
> >>>     > <Pe...@vtt.fi> wrote:
> >>>     > > Hi Ian,
> >>>     > >
> >>>     > > Thanks for the reply.
> >>>     > > I compiled AsterixDB v0.87 and started it.
> >>>     > > However, I get the following warnings:
> >>>     > >
> >>>     > > INFO: Name:my_asterix
> >>>     > > Created:Mon Oct 19 08:37:16 UTC 2015
> >>>     > > Web-Url:http://192.168.101.144:19001
> >>>     > > State:UNUSABLE
> >>>     > >
> >>>     > > WARNING!:Cluster Controller not running at master
> >>>     > >
> >>>     > > Also, I see the following warnings in my_asterixdb1.log. There
> >>>     > > are no warnings or errors in cc.log:
> >>>     > >
> >>>     > > "Oct 19, 2015 8:37:39 AM
> >>>     > > org.apache.hyracks.api.lifecycle.LifeCycleComponentManager configure
> >>>     > > SEVERE: LifecycleComponentManager configured
> >>>     > > org.apache.hyracks.api.lifecycle.LifeCycleComponentManager@7559ec47
> >>>     > > ..
> >>>     > > INFO: Completed sharp checkpoint.
> >>>     > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties getIODevices
> >>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1 not
> >>>     > > found. The node has not joined yet or has left.
> >>>     > > Oct 19, 2015 8:37:40 AM org.apache.asterix.om.util.AsterixClusterProperties getIODevices
> >>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1 not
> >>>     > > found. The node has not joined yet or has left.
> >>>     > > Oct 19, 2015 8:38:38 AM
> >>>     > > org.apache.hyracks.control.common.dataset.ResultStateSweeper sweep
> >>>     > > INFO: Result state cleanup instance successfully completed."
> >>>     > >
> >>>     > > It seems that AsterixDB is running, and I can access it at port 19001.
> >>>     > >
> >>>     > > The documentation shows ingestion of tweets, but I would be
> >>>     > > interested in using sockets.
> >>>     > > Is it possible to ingest data from sockets?
> >>>     > >
> >>>     > > Regards,
> >>>     > > -Pekka
> >>>     > >
> >>>     > > Hey there Pekka,
> >>>     > >
> >>>     > > Your intuition is correct, most of the newer feeds features are in
> >>>     > > the current master branch and not in the (very) old 0.8.6 release.
> >>>     > > If you'd like to experiment with them you'll have to build from
> >>>     > > source. The details about that are here:
> >>>     > > https://asterixdb.incubator.apache.org/dev-setup.html#setting-up-an-asterix-development-environment-in-eclipse
> >>>     > > , but they're probably a bit overkill for just trying to get the
> >>>     > > compiled binaries. For that all you really need to do is:
> >>>     > >
> >>>     > > - Clone Hyracks from git
> >>>     > > - 'mvn clean install -DskipTests'
> >>>     > > - Clone AsterixDB
> >>>     > > - 'mvn clean package -DskipTests'
> >>>     > >
> >>>     > > Then, the binaries will sit in asterix-installer/target.
> >>>     > >
> >>>     > > For an example, the documentation shows how to set up a feed
> >>>     > > that's ingesting Tweets:
> >>>     > > https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/feeds/tutorial.html
> >>>     > >
> >>>     > > Thanks,
> >>>     > > -Ian
> >>>     > >
> >>>     > > On Wed, Oct 7, 2015 at 9:48 PM, Pääkkönen Pekka <Pe...@vtt.fi>
> >>>     > > wrote:
> >>>     > >
> >>>     > >> Hi,
> >>>     > >>
> >>>     > >> I would like to experiment with a socket-based feed.
> >>>     > >> Can you point me to an example on how to utilize them?
> >>>     > >> Do I need to install the 0.8.7-snapshot version of AsterixDB in
> >>>     > >> order to experiment with feeds?
> >>>     > >>
> >>>     > >> Regards,
> >>>     > >> -Pekka Pääkkönen
> >>>     >
> >>>
> >>
> >>
> >>
> >> --
> >> Raman
> >>
> >
> >
>
>
> --
>
> -----------------
> Best Regards
>
> Jianfeng Jia
> Ph.D. Candidate of Computer Science
> University of California, Irvine
>

Re: Socket feed questions

Posted by Mike Carey <dt...@gmail.com>.
Yes and yes, I believe!  The AQL UDF case is less tested, I believe, but it
should work...
On Oct 29, 2015 12:22 PM, "Jianfeng Jia" <ji...@gmail.com> wrote:

> Hi Devs,
>
> I have two related questions,
> 1. Is there any example code of using UDF in feed-adapter?
> 2. Can we use AQL function in those kind of feed UDFs?
>
> Thank you.
>
> On Tue, Oct 27, 2015 at 9:54 PM, Michael Carey <mj...@ics.uci.edu>
> wrote:
>
> > Thanks!
> >
> > On 10/27/15 9:48 AM, Raman Grover wrote:
> >
> >> Hi,
> >>
> >>
> >> In the case when data is being received from an external source (e.g.
> >> during feed ingestion), a slow rate of arrival of data may result in
> >> excessive delays until the data is deposited into the target dataset and
> >> made accessible to queries. Data moves along a data ingestion pipeline
> >> between operators as packed fixed size frames. The default behavior is
> to
> >> wait for the frame to be full before dispatching the contained data to
> the
> >> downstream operator. However, as noted, this may not suit all scenarios
> >> particularly when data source is sending data at a low rate. To cater to
> >> different scenarios, AsterixDB allows configuring the behavior. The
> >> different options are described next.
> >>
> >> *Push data downstream when*
> >> (a) Frame is full (default)
> >> (b) At least N records (data items) have been collected into a partially
> >> filled frame
> >> (c) At least T seconds have elapsed since the last record was put into
> >> the frame
> >>
> >> *How to configure the behavior?*
> >> At the time of defining a feed, an end-user may specify configuration
> >> parameters that determine the runtime behavior (options (a), (b) or (c)
> >> from above).
> >>
> >> The parameters are described below:
> >>
> >> /"parser-policy"/: A specific strategy chosen from a set of pre-defined
> >> values -
> >>   (i) / "frame_full"/
> >>  This is the default value. As the name suggests, this choice causes
> >> frames to be pushed by the feed adaptor only when there isn't sufficient
> >> space for an additional record to fit in. This corresponds to option
> (a).
> >>
> >>  (ii) / "counter_timer_expired" /
> >>  Use this as the value if you wish to set either option (b) or (c)  or a
> >> combination of both.
> >>
> >> *Some Examples*
> >> *
> >> *
> >> 1) Pack a maximum of 100 records into a data frame and push it
> downstream.
> >>
> >>  create feed my_feed using my_adaptor
> >> (("parser-policy"="counter_timer_expired"), ("batch-size"="100"), ...
> >> other parameters);
> >>
> >> 2) Wait till 2 seconds and send however many records collected in a
> frame
> >> downstream.
> >>  create feed my_feed using my_adaptor
> >> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2")...
> >> other parameters);
> >>
> >> 3) Wait till 100 records have been collected into a data frame or 2
> >> seconds have elapsed since the last record was put into the current data
> >> frame.
> >>  create feed my_feed using my_adaptor
> >> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"),
> >> ("batch-size"="100"),... other parameters);
> >>
> >>
> >> *Note*
> >> The above config parameters are not specific to using a particular
> >> implementation of an adaptor but are available for use with any feed
> >> adaptor. Some adaptors that ship with AsterixDB use different default
> >> values for above to suit their specific scenario. E.g. the pull-based
> >> twitter adaptor uses "counter_timer_expired" as the "parser-policy" and
> >> sets the  parameter "batch-interval".
> >>
> >>
> >> Regards,
> >> Raman
> >> PS: The names of the parameters described above are not as intuitive as
> >> one would like them to be. The names need to be changed.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu, Oct 22, 2015 at 9:09 AM, Mike Carey <dtabass@gmail.com <mailto:
> >> dtabass@gmail.com>> wrote:
> >>
> >>     I think we need to have tuning parameters - like batch size and
> >>     maximum tolerable latency (in case there's a lull and you still
> >>     want to push stuff with some worst-case delay). @Raman Grover -
> >>     remind me (us) what's available in this regard?
> >>
> >>     On 10/22/15 4:29 AM, Pääkkönen Pekka wrote:
> >>
> >>>
> >>>     Hi,
> >>>
> >>>     Yes, you are right. I tried sending a larger amount of data, and
> >>>     data is now stored to the database.
> >>>
> >>>     Does it make sense to configure a smaller batch size in order to
> >>>     get more frequent writes?
> >>>
> >>>     Or would it significantly impact performance?
> >>>
> >>>     -Pekka
> >>>
> >>>     Data moves through the pipeline in frame-sized batches, so one
> >>>
> >>>     (uniformed :-)) guess is that you aren't running very long, and
> >>>     you're
> >>>
> >>>     only seeing the data flow when you close because only then do you
> >>>     have a
> >>>
> >>>     batch's worth.  Is that possible?  You can test this by running
> >>>     longer
> >>>
> >>>     (more data) and seeing if you start to see the expected incremental
> >>>
> >>>     flow/inserts. (And we need tunability in this area, e.g.,
> >>>     parameters on
> >>>
> >>>     how much batching and/or low much latency to tolerate on each
> feed.)
> >>>
> >>>     On 10/21/15 4:45 AM, Pääkkönen Pekka wrote:
> >>>
> >>>     >
> >>>
> >>>     > Hi,
> >>>
> >>>     >
> >>>
> >>>     > Thanks, now I am able to create a socket feed, and save items to
> >>> the
> >>>
> >>>     > dataset from the feed.
> >>>
> >>>     >
> >>>
> >>>     > It seems that data items are written to the dataset after I close
> >>> the
> >>>
> >>>     > socket at the client.
> >>>
> >>>     >
> >>>
> >>>     > Is there some way to indicate to AsterixDB feed (with a newline
> or
> >>>
> >>>     > other indicator) that data can be written to the database, when
> the
> >>>
> >>>     > connection is open?
> >>>
> >>>     >
> >>>
> >>>     > After I close the socket at the client, the feed seems to close
> >>> down.
> >>>
> >>>     > Or is it only paused, until it is resumed?
> >>>
> >>>     >
> >>>
> >>>     > -Pekka
> >>>
> >>>     >
> >>>
> >>>     > Hi Pekka,
> >>>
> >>>     >
> >>>
> >>>     > That's interesting, I'm not sure why the CC would appear as being
> >>> down
> >>>
> >>>     >
> >>>
> >>>     > to Managix. However, if you can access the web console, then
> >>>
> >>>     >
> >>>
> >>>     > that evidently isn't the case.
> >>>
> >>>     >
> >>>
> >>>     > As for data ingestion via sockets, yes it is possible, but it
> kind
> >>> of
> >>>
> >>>     >
> >>>
> >>>     > depends on what's meant by sockets. There's no tutorial for it,
> but
> >>>
> >>>     >
> >>>
> >>>     > take a look at SocketBasedFeedAdapter in the source, as well as
> >>>
> >>>     >
> >>>
> >>>     >
> >>>
> https://github.com/kisskys/incubator-asterixdb/blob/kisskys/indexonlyhilbertbtree/asterix-experiments/src/main/java/org/apache/asterix/experiment/client/SocketTweetGenerator.java
> >>>
> >>>     >
> >>>
> >>>     > for some examples of how it works.
> >>>
> >>>     >
> >>>
> >>>     > Hope that helps!
> >>>
> >>>     >
> >>>
> >>>     > Thanks,
> >>>
> >>>     >
> >>>
> >>>     > -Ian
> >>>
> >>>     >
> >>>
> >>>     > On Mon, Oct 19, 2015 at 10:15 PM, Pääkkönen Pekka
> >>>
> >>>     ><Pe...@vtt.fi> <ma...@vtt.fi> wrote:
> >>>
> >>>     > > Hi Ian,
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > > Thanks for the reply.
> >>>
> >>>     > >
> >>>
> >>>     > > I compiled AsterixDB v0.87 and started it.
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > > However, I get the following warnings:
> >>>
> >>>     > >
> >>>
> >>>     > > INFO: Name:my_asterix
> >>>
> >>>     > >
> >>>
> >>>     > > Created:Mon Oct 19 08:37:16 UTC 2015
> >>>
> >>>     > >
> >>>
> >>>     > > Web-Url:http://192.168.101.144:19001
> >>>
> >>>     > >
> >>>
> >>>     > > State:UNUSABLE
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > > WARNING!:Cluster Controller not running at master
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > > Also, I see the following warnings in my_asterixdb1.log. there
> >>>     are no
> >>>
> >>>     > > warnings or errors in cc.log
> >>>
> >>>     > >
> >>>
> >>>     > > “
> >>>
> >>>     > >
> >>>
> >>>     > > Oct 19, 2015 8:37:39 AM
> >>>
> >>>     > > org.apache.hyracks.api.lifecycle.LifeCycleComponentManager
> >>> configure
> >>>
> >>>     > >
> >>>
> >>>     > > SEVERE: LifecycleComponentManager configured
> >>>
> >>>     > >
> >>> org.apache.hyracks.api.lifecycle.LifeCycleComponentManager@7559ec47
> >>>
> >>>     > >
> >>>
> >>>     > > ..
> >>>
> >>>     > >
> >>>
> >>>     > > INFO: Completed sharp checkpoint.
> >>>
> >>>     > >
> >>>
> >>>     > > Oct 19, 2015 8:37:40 AM
> >>>     org.apache.asterix.om.util.AsterixClusterProperties
> >>>
> >>>     > > getIODevices
> >>>
> >>>     > >
> >>>
> >>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1
> >>>     not found. The
> >>>
> >>>     > > node has not joined yet or has left.
> >>>
> >>>     > >
> >>>
> >>>     > > Oct 19, 2015 8:37:40 AM
> >>>     org.apache.asterix.om.util.AsterixClusterProperties
> >>>
> >>>     > > getIODevices
> >>>
> >>>     > >
> >>>
> >>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1
> >>>     not found. The
> >>>
> >>>     > > node has not joined yet or has left.
> >>>
> >>>     > >
> >>>
> >>>     > > Oct 19, 2015 8:38:38 AM
> >>>
> >>>     > > org.apache.hyracks.control.common.dataset.ResultStateSweeper
> >>> sweep
> >>>
> >>>     > >
> >>>
> >>>     > > INFO: Result state cleanup instance successfully completed.”
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > > It seems that AsterixDB is running, and I can access it at port
> >>> 19001.
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > > The documentation shows ingestion of tweets, but I would be
> >>>     interested in
> >>>
> >>>     > > using sockets.
> >>>
> >>>     > >
> >>>
> >>>     > > Is it possible to ingest data from sockets?
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > > Regards,
> >>>
> >>>     > >
> >>>
> >>>     > > -Pekka
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > > Hey there Pekka,
> >>>
> >>>     > >
> >>>
> >>>     > > Your intuition is correct, most of the newer feeds features are
> >>> in the
> >>>
> >>>     > >
> >>>
> >>>     > > current master branch and not in the (very) old 0.8.6 release.
> >>>     If you'd
> >>>
> >>>     > >
> >>>
> >>>     > > like to experiment with them you'll have to build from source.
> >>> The
> >>>     details
> >>>
> >>>     > >
> >>>
> >>>     > > about that are here:
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> https://asterixdb.incubator.apache.org/dev-setup.html#setting-up-an-asterix-development-environment-in-eclipse
> >>>
> >>>     > >
> >>>
> >>>     > > , but they're probably a bit overkill for just trying to get
> the
> >>>     compiled
> >>>
> >>>     > >
> >>>
> >>>     > > binaries. For that all you really need to do is :
> >>>
> >>>     > >
> >>>
> >>>     > > - Clone Hyracks from git
> >>>
> >>>     > >
> >>>
> >>>     > > - 'mvn clean install -DskipTests'
> >>>
> >>>     > >
> >>>
> >>>     > > - Clone AsterixDB
> >>>
> >>>     > >
> >>>
> >>>     > > - 'mvn clean package -DskipTests'
> >>>
> >>>     > >
> >>>
> >>>     > > Then, the binaries will sit in asterix-installer/target
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > > For an example, the documentation shows how to set up a feed
> >>> that's
> >>>
> >>>     > >
> >>>
> >>>     > > ingesting Tweets:
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/feeds/tutorial.html
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > > Thanks,
> >>>
> >>>     > >
> >>>
> >>>     > > -Ian
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > > On Wed, Oct 7, 2015 at 9:48 PM, Pääkkönen Pekka
> >>>     <Pe...@vtt.fi> <ma...@vtt.fi>
> >>>
> >>>     > >
> >>>
> >>>     > > wrote:
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     > >> Hi,
> >>>
> >>>     > >
> >>>
> >>>     > >>
> >>>
> >>>     > >
> >>>
> >>>     > >>
> >>>
> >>>     > >
> >>>
> >>>     > >>
> >>>
> >>>     > >
> >>>
> >>>     > >> I would like to experiment with a socket-based feed.
> >>>
> >>>     > >
> >>>
> >>>     > >>
> >>>
> >>>     > >
> >>>
> >>>     > >> Can you point me to an example on how to utilize them?
> >>>
> >>>     > >
> >>>
> >>>     > >>
> >>>
> >>>     > >
> >>>
> >>>     > >> Do I need to install 0.8.7-snapshot version of AsterixDB in
> >>> order to
> >>>
> >>>     > >
> >>>
> >>>     > >> experiment with feeds?
> >>>
> >>>     > >
> >>>
> >>>     > >>
> >>>
> >>>     > >
> >>>
> >>>     > >>
> >>>
> >>>     > >
> >>>
> >>>     > >>
> >>>
> >>>     > >
> >>>
> >>>     > >> Regards,
> >>>
> >>>     > >
> >>>
> >>>     > >>
> >>>
> >>>     > >
> >>>
> >>>     > >> -Pekka Pääkkönen
> >>>
> >>>     > >
> >>>
> >>>     > >>
> >>>
> >>>     > >
> >>>
> >>>     > >
> >>>
> >>>     >
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Raman
> >>
> >
> >
>
>
> --
>
> -----------------
> Best Regards
>
> Jianfeng Jia
> Ph.D. Candidate of Computer Science
> University of California, Irvine
>
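The count/timer flush semantics discussed in the thread above (push a frame when N records have accumulated, or when T seconds have passed) can be sketched as a small standalone buffer. This is illustrative only, not AsterixDB code; the class and parameter names are mine, chosen to mirror the "batch-size" and "batch-interval" feed parameters:

```python
import time


class FrameBatcher:
    """Buffer records and flush when either batch_size records have
    accumulated or batch_interval seconds have passed since the last
    record was added (a sketch of the "counter_timer_expired" policy)."""

    def __init__(self, batch_size=100, batch_interval=2.0):
        self.batch_size = batch_size
        self.batch_interval = batch_interval
        self.buffer = []
        self.last_add = None

    def add(self, record, now=None):
        """Add a record; return a flushed batch if a threshold was hit,
        otherwise None. `now` can be injected for testing."""
        now = time.monotonic() if now is None else now
        batch = None
        if self.buffer and now - self.last_add >= self.batch_interval:
            # Timer expired before this record arrived: ship what we have.
            batch = self.flush()
        self.buffer.append(record)
        self.last_add = now
        if batch is None and len(self.buffer) >= self.batch_size:
            # Count threshold reached: ship the full frame.
            batch = self.flush()
        return batch

    def flush(self):
        batch, self.buffer = self.buffer, []
        return batch
```

A real pipeline would also flush on a background timer even when no new record arrives; the sketch only checks the clock when a record comes in, which is the simplest form of the tradeoff Pekka asks about (smaller batches mean fresher data but more downstream calls).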

Re: Socket feed questions

Posted by Jianfeng Jia <ji...@gmail.com>.
Hi Devs,

I have two related questions,
1. Is there any example code for using a UDF in a feed adapter?
2. Can we use an AQL function in those kinds of feed UDFs?

Thank you.

On Tue, Oct 27, 2015 at 9:54 PM, Michael Carey <mj...@ics.uci.edu> wrote:

> Thanks!
>
> On 10/27/15 9:48 AM, Raman Grover wrote:
>
>> Hi,
>>
>>
>> In the case when data is being received from an external source (e.g.
>> during feed ingestion), a slow rate of arrival of data may result in
>> excessive delays until the data is deposited into the target dataset and
>> made accessible to queries. Data moves along a data ingestion pipeline
>> between operators as packed fixed size frames. The default behavior is to
>> wait for the frame to be full before dispatching the contained data to the
>> downstream operator. However, as noted, this may not suit all scenarios
>> particularly when data source is sending data at a low rate. To cater to
>> different scenarios, AsterixDB allows configuring the behavior. The
>> different options are described next.
>>
>> *Push data downstream when*
>> (a) Frame is full (default)
>> (b) At least N records (data items) have been collected into a partially
>> filled frame
>> (c) At least T seconds have elapsed since the last record was put into
>> the frame
>>
>> *How to configure the behavior?*
>> At the time of defining a feed, an end-user may specify configuration
>> parameters that determine the runtime behavior (options (a), (b) or (c)
>> from above).
>>
>> The parameters are described below:
>>
>> /"parser-policy"/: A specific strategy chosen from a set of pre-defined
>> values -
>>   (i) / "frame_full"/
>>  This is the default value. As the name suggests, this choice causes
>> frames to be pushed by the feed adaptor only when there isn't sufficient
>> space for an additional record to fit in. This corresponds to option (a).
>>
>>  (ii) / "counter_timer_expired" /
>>  Use this as the value if you wish to set either option (b) or (c)  or a
>> combination of both.
>>
>> *Some Examples*
>> 1) Pack a maximum of 100 records into a data frame and push it downstream.
>>
>>  create feed my_feed using my_adaptor
>> (("parser-policy"="counter_timer_expired"), ("batch-size"="100"), ...
>> other parameters);
>>
>> 2) Wait up to 2 seconds, then send however many records have been collected
>> in a frame downstream.
>>  create feed my_feed using my_adaptor
>> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2")...
>> other parameters);
>>
>> 3) Push when 100 records have been collected into a data frame, or when 2
>> seconds have elapsed since the last record was put into the current data
>> frame, whichever comes first.
>>  create feed my_feed using my_adaptor
>> (("parser-policy"="counter_timer_expired"), ("batch-interval"="2"),
>> ("batch-size"="100"),... other parameters);
>>
>>
>> *Note*
>> The above config parameters are not specific to using a particular
>> implementation of an adaptor but are available for use with any feed
>> adaptor. Some adaptors that ship with AsterixDB use different default
>> values for the above to suit their specific scenario. E.g., the pull-based
>> twitter adaptor uses "counter_timer_expired" as the "parser-policy" and
>> sets the parameter "batch-interval".
>>
>>
>> Regards,
>> Raman
>> PS: The names of the parameters described above are not as intuitive as
>> one would like them to be. The names need to be changed.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Oct 22, 2015 at 9:09 AM, Mike Carey <dtabass@gmail.com <mailto:
>> dtabass@gmail.com>> wrote:
>>
>>     I think we need to have tuning parameters - like batch size and
>>     maximum tolerable latency (in case there's a lull and you still
>>     want to push stuff with some worst-case delay). @Raman Grover -
>>     remind me (us) what's available in this regard?
>>
>>     On 10/22/15 4:29 AM, Pääkkönen Pekka wrote:
>>
>>>
>>>     Hi,
>>>
>>>     Yes, you are right. I tried sending a larger amount of data, and
>>>     data is now stored to the database.
>>>
>>>     Does it make sense to configure a smaller batch size in order to
>>>     get more frequent writes?
>>>
>>>     Or would it significantly impact performance?
>>>
>>>     -Pekka
>>>
>>>     Data moves through the pipeline in frame-sized batches, so one
>>>
>>>     (uninformed :-)) guess is that you aren't running very long, and
>>>     you're
>>>
>>>     only seeing the data flow when you close because only then do you
>>>     have a
>>>
>>>     batch's worth.  Is that possible?  You can test this by running
>>>     longer
>>>
>>>     (more data) and seeing if you start to see the expected incremental
>>>
>>>     flow/inserts. (And we need tunability in this area, e.g.,
>>>     parameters on
>>>
>>>     how much batching and/or how much latency to tolerate on each feed.)
>>>
>>>     On 10/21/15 4:45 AM, Pääkkönen Pekka wrote:
>>>
>>>     >
>>>
>>>     > Hi,
>>>
>>>     >
>>>
>>>     > Thanks, now I am able to create a socket feed, and save items to
>>> the
>>>
>>>     > dataset from the feed.
>>>
>>>     >
>>>
>>>     > It seems that data items are written to the dataset after I close
>>> the
>>>
>>>     > socket at the client.
>>>
>>>     >
>>>
>>>     > Is there some way to indicate to AsterixDB feed (with a newline or
>>>
>>>     > other indicator) that data can be written to the database, when the
>>>
>>>     > connection is open?
>>>
>>>     >
>>>
>>>     > After I close the socket at the client, the feed seems to close
>>> down.
>>>
>>>     > Or is it only paused, until it is resumed?
>>>
>>>     >
>>>
>>>     > -Pekka
>>>
>>>     >
>>>
>>>     > Hi Pekka,
>>>
>>>     >
>>>
>>>     > That's interesting, I'm not sure why the CC would appear as being
>>> down
>>>
>>>     >
>>>
>>>     > to Managix. However, if you can access the web console, then
>>>
>>>     >
>>>
>>>     > that evidently isn't the case.
>>>
>>>     >
>>>
>>>     > As for data ingestion via sockets, yes it is possible, but it kind
>>> of
>>>
>>>     >
>>>
>>>     > depends on what's meant by sockets. There's no tutorial for it, but
>>>
>>>     >
>>>
>>>     > take a look at SocketBasedFeedAdapter in the source, as well as
>>>
>>>     >
>>>
>>>     >
>>> https://github.com/kisskys/incubator-asterixdb/blob/kisskys/indexonlyhilbertbtree/asterix-experiments/src/main/java/org/apache/asterix/experiment/client/SocketTweetGenerator.java
>>>
>>>     >
>>>
>>>     > for some examples of how it works.
>>>
>>>     >
>>>
>>>     > Hope that helps!
>>>
>>>     >
>>>
>>>     > Thanks,
>>>
>>>     >
>>>
>>>     > -Ian
>>>
>>>     >
>>>
>>>     > On Mon, Oct 19, 2015 at 10:15 PM, Pääkkönen Pekka
>>>
>>>     ><Pe...@vtt.fi> <ma...@vtt.fi> wrote:
>>>
>>>     > > Hi Ian,
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > > Thanks for the reply.
>>>
>>>     > >
>>>
>>>     > > I compiled AsterixDB v0.87 and started it.
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > > However, I get the following warnings:
>>>
>>>     > >
>>>
>>>     > > INFO: Name:my_asterix
>>>
>>>     > >
>>>
>>>     > > Created:Mon Oct 19 08:37:16 UTC 2015
>>>
>>>     > >
>>>
>>>     > > Web-Url:http://192.168.101.144:19001
>>>
>>>     > >
>>>
>>>     > > State:UNUSABLE
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > > WARNING!:Cluster Controller not running at master
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > > Also, I see the following warnings in my_asterixdb1.log. there
>>>     are no
>>>
>>>     > > warnings or errors in cc.log
>>>
>>>     > >
>>>
>>>     > > “
>>>
>>>     > >
>>>
>>>     > > Oct 19, 2015 8:37:39 AM
>>>
>>>     > > org.apache.hyracks.api.lifecycle.LifeCycleComponentManager
>>> configure
>>>
>>>     > >
>>>
>>>     > > SEVERE: LifecycleComponentManager configured
>>>
>>>     > >
>>> org.apache.hyracks.api.lifecycle.LifeCycleComponentManager@7559ec47
>>>
>>>     > >
>>>
>>>     > > ..
>>>
>>>     > >
>>>
>>>     > > INFO: Completed sharp checkpoint.
>>>
>>>     > >
>>>
>>>     > > Oct 19, 2015 8:37:40 AM
>>>     org.apache.asterix.om.util.AsterixClusterProperties
>>>
>>>     > > getIODevices
>>>
>>>     > >
>>>
>>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1
>>>     not found. The
>>>
>>>     > > node has not joined yet or has left.
>>>
>>>     > >
>>>
>>>     > > Oct 19, 2015 8:37:40 AM
>>>     org.apache.asterix.om.util.AsterixClusterProperties
>>>
>>>     > > getIODevices
>>>
>>>     > >
>>>
>>>     > > WARNING: Configuration parameters for nodeId my_asterix_node1
>>>     not found. The
>>>
>>>     > > node has not joined yet or has left.
>>>
>>>     > >
>>>
>>>     > > Oct 19, 2015 8:38:38 AM
>>>
>>>     > > org.apache.hyracks.control.common.dataset.ResultStateSweeper
>>> sweep
>>>
>>>     > >
>>>
>>>     > > INFO: Result state cleanup instance successfully completed.”
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > > It seems that AsterixDB is running, and I can access it at port
>>> 19001.
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > > The documentation shows ingestion of tweets, but I would be
>>>     interested in
>>>
>>>     > > using sockets.
>>>
>>>     > >
>>>
>>>     > > Is it possible to ingest data from sockets?
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > > Regards,
>>>
>>>     > >
>>>
>>>     > > -Pekka
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > > Hey there Pekka,
>>>
>>>     > >
>>>
>>>     > > Your intuition is correct, most of the newer feeds features are
>>> in the
>>>
>>>     > >
>>>
>>>     > > current master branch and not in the (very) old 0.8.6 release.
>>>     If you'd
>>>
>>>     > >
>>>
>>>     > > like to experiment with them you'll have to build from source.
>>> The
>>>     details
>>>
>>>     > >
>>>
>>>     > > about that are here:
>>>
>>>     > >
>>>
>>>     > >
>>> https://asterixdb.incubator.apache.org/dev-setup.html#setting-up-an-asterix-development-environment-in-eclipse
>>>
>>>     > >
>>>
>>>     > > , but they're probably a bit overkill for just trying to get the
>>>     compiled
>>>
>>>     > >
>>>
>>>     > > binaries. For that all you really need to do is :
>>>
>>>     > >
>>>
>>>     > > - Clone Hyracks from git
>>>
>>>     > >
>>>
>>>     > > - 'mvn clean install -DskipTests'
>>>
>>>     > >
>>>
>>>     > > - Clone AsterixDB
>>>
>>>     > >
>>>
>>>     > > - 'mvn clean package -DskipTests'
>>>
>>>     > >
>>>
>>>     > > Then, the binaries will sit in asterix-installer/target
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > > For an example, the documentation shows how to set up a feed
>>> that's
>>>
>>>     > >
>>>
>>>     > > ingesting Tweets:
>>>
>>>     > >
>>>
>>>     > >
>>> https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/feeds/tutorial.html
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > > Thanks,
>>>
>>>     > >
>>>
>>>     > > -Ian
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > > On Wed, Oct 7, 2015 at 9:48 PM, Pääkkönen Pekka
>>>     <Pe...@vtt.fi> <ma...@vtt.fi>
>>>
>>>     > >
>>>
>>>     > > wrote:
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     > >> Hi,
>>>
>>>     > >
>>>
>>>     > >>
>>>
>>>     > >
>>>
>>>     > >>
>>>
>>>     > >
>>>
>>>     > >>
>>>
>>>     > >
>>>
>>>     > >> I would like to experiment with a socket-based feed.
>>>
>>>     > >
>>>
>>>     > >>
>>>
>>>     > >
>>>
>>>     > >> Can you point me to an example on how to utilize them?
>>>
>>>     > >
>>>
>>>     > >>
>>>
>>>     > >
>>>
>>>     > >> Do I need to install 0.8.7-snapshot version of AsterixDB in
>>> order to
>>>
>>>     > >
>>>
>>>     > >> experiment with feeds?
>>>
>>>     > >
>>>
>>>     > >>
>>>
>>>     > >
>>>
>>>     > >>
>>>
>>>     > >
>>>
>>>     > >>
>>>
>>>     > >
>>>
>>>     > >> Regards,
>>>
>>>     > >
>>>
>>>     > >>
>>>
>>>     > >
>>>
>>>     > >> -Pekka Pääkkönen
>>>
>>>     > >
>>>
>>>     > >>
>>>
>>>     > >
>>>
>>>     > >
>>>
>>>     >
>>>
>>>
>>
>>
>>
>> --
>> Raman
>>
>
>


-- 

-----------------
Best Regards

Jianfeng Jia
Ph.D. Candidate of Computer Science
University of California, Irvine
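
Pekka's client-side question (how to get records into a socket feed while the connection stays open) can be sketched with a minimal sender. This is an assumption-laden illustration, not AsterixDB code: the newline-delimited JSON framing and the host/port are placeholders for whatever wire format and endpoint the socket adaptor in use actually expects.

```python
import json
import socket


def encode_records(records):
    """Serialize records as newline-delimited JSON, the kind of framing a
    socket-based feed adaptor typically parses one record at a time."""
    return "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")


def send_records(host, port, records):
    """Open a TCP connection to the feed's listening socket and push all
    records; sendall blocks until the whole payload is handed to the OS."""
    payload = encode_records(records)
    with socket.create_connection((host, port)) as conn:
        conn.sendall(payload)


# Hypothetical usage; 127.0.0.1:10001 stands in for the feed's socket:
# send_records("127.0.0.1", 10001, [{"id": 1, "text": "hello"}])
```

Note that, per the thread, sending records does not by itself make them queryable; they sit in a partially filled frame until the frame-full, batch-size, or batch-interval condition triggers a push downstream.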
