Posted to users@asterixdb.apache.org by Mike Carey <dt...@gmail.com> on 2016/08/10 16:36:09 UTC

Re: Trio: AsterixDB, Spark and Zeppelin.

Kevin,

Q:  Could you chime back in here - please 'cc' the user list - with a 
brief (maybe one paragraph) summary of what you are actually trying to 
do at the moment and what its current status is?  (And your timeframe, 
etc.?)

My impression until yesterday was that you were slowly/leisurely 
exploring the new Spark connector to AsterixDB that Wail worked on - 
essentially as his first "beta" user - and that things were moving at 
the pace you wanted (and were setting).  As an early adopter, I was also 
under the impression that you were using his branch for your 
explorations, while he was addressing code review comments, etc.  
However, when I arrived back home in OC after a trip yesterday, I was 
the recipient of a message (via a back channel) warning me that there 
was a blocking issue at SDSC that UCI wasn't being attentive to, one 
that had AsterixDB on the brink of being given up on by the UCLA folks, and
that we'd better get on it or....  (Meanwhile I had not heard any such 
thing from UCLA directly; I was not aware of any blocking Spark issues 
for SDSC nor of any transitively blocking implications for UCLA, and it 
still doesn't look from what I see below like there was one.)

I think that we need to have SDSC's activities be much more visible here 
- likewise for UCLA's - so that the Apache AsterixDB community has much 
better visibility into the goals, activities, progress, and problems of 
our early adopters.  The community wants users to be successful!  It 
will be much more effective (and healthy and productive) if we all know 
what's going on and it is clear to all how each of those things is going.

Thanks!

Mike


On 8/10/16 8:36 AM, Wail Alkowaileet wrote:
> Hi Kevin,
>
> Cool!
> Please let me know if you need any assistance.
>
> On Aug 8, 2016 1:42 PM, "Coakley, Kevin" <kc...@sdsc.edu> wrote:
>
>> Hi Wail,
>>
>> I figured out the problem: AsterixDB was configured for 127.0.0.1. The
>> notebook at
>> https://github.com/Nullification/asterixdb-spark-connector/blob/master/zeppelin-notebook/asterixdb-spark-example/note.json
>> ran successfully once I recreated the AsterixDB instance to use the
>> external IP.
>>
>> I have not run any of my own queries, but I did get both of the examples
>> at https://github.com/Nullification/asterixdb-spark-connector to run
>> successfully.
>>
>> Thank you!
>>
>> -Kevin
>>
>>
>>
>> On 8/3/16, 10:23 AM, "Wail Alkowaileet" <wa...@gmail.com> wrote:
>>
>>      One more thing:
>>      Can you paste your cluster configuration as well?
>>
>>      Thanks
>>
>>     (ETC ETC ETC deleted)


RE: Trio: AsterixDB, Spark and Zeppelin.

Posted by "Gupta, Amarnath" <a1...@ucsd.edu>.
Mike:

The timetable actually comes from Sean and Wei, whom we are trying to serve through our efforts at SDSC. There are three issues we are trying to handle right now.

1. The 2015 Twitter data set, which UCLA needs access to, is large, and we often discover that the actual schema shifts over time, causing a cascade of failures that Kevin and Ian are sorting out.
2. While we originally put Spark integration later in our timetable, Wei's group needs access to it now. I spoke to the student, and she has an algorithm that is waiting to be tested on larger-scale data.
3. *Some* aggregate queries, curiously, take a really long time. Ian is also aware of this. Since aggregate queries are very common in machine learning explorations, I would feel better if these queries executed within a reasonable time.

I appreciate your comment about getting higher visibility of SDSC as an early adopter of AsterixDB. Kevin has been spending a lot of time with AsterixDB and fielding requirements we all dump on him. I am hoping that we can actually achieve something concrete through this effort.

Thanks,

Amarnath
________________________________
From: Michael Carey [mjcarey@ics.uci.edu]
Sent: Wednesday, August 10, 2016 4:41 PM
To: users@asterixdb.apache.org; dev@asterixdb.apache.org
Cc: Gupta, Amarnath; Sean Young
Subject: Re: Trio: AsterixDB, Spark and Zeppelin.


Kevin,

Thanks!  That helps a lot.  Now what we need to know (possibly above your pay grade :-)) is what the timetable is for UCLA (i) wanting to get the results of your assessment of how well what's there works and meets their needs and (ii) wanting to put stuff into production (and at what scale).  I don't anticipate the review and merging taking forever, but this will be Wail's first AsterixDB code contribution - last I knew he was addressing initial reviewing comments (and I'm not sure if all reviews are done yet) - but I think we next need to ask UCLA/Sean/Amarnath for the timetable info.

Cheers,

Mike

On 8/10/16 1:33 PM, Coakley, Kevin wrote:

Mike,

UCLA wanted a way to use Spark’s machine learning packages with data stored in AsterixDB. We started looking at the Spark connector as a way to access the data in AsterixDB directly instead of having to export the data from AsterixDB to a file and import the file into Spark. I don’t know how this fits into Amarnath’s projects; I was just following up on a request from UCLA to see what would be involved in providing this Spark connector to others.

The current status is: I have the Spark connector working in a test environment with the queries provided by Wail. I was planning on loading a small amount of data into the test AsterixDB server with the Schema Inferencer code and running my own queries, but I have not had time yet. The issue with providing others with access to the Spark connector is that the version of AsterixDB we are running, which contains the Twitter data, does not have the Schema Inferencer code and therefore will not work with the Spark connector.

I don’t believe SDSC would want to update the AsterixDB servers that contain the Twitter data with the Schema Inferencer code until after it has been approved by you and merged into the master branch. However, even after the Schema Inferencer code has been merged into the master branch, we wouldn’t have it ready for people to use right away.

I offered to load a small subset of the data from our main servers into my test environment that has a working Spark connector for UCLA to test, but it sounds like they misunderstood my offer.

I would be happy to help you test the Schema Inferencer and Spark connector if you have specific items that you want me to check. I can also give others that you select access to the test environment so they can run tests themselves. Otherwise, I will respond here if I discover any issues.

My current test environment is Zeppelin with the Spark connector on server A, AsterixDB with the Schema Inferencer code on server B and a Spark 1.6.0 cluster running on servers C, D and E.

-Kevin



On 8/10/16, 9:36 AM, "Mike Carey" <dt...@gmail.com> wrote:

    Kevin,

    Q:  Could you chime back in here - please 'cc' the user list - with a
    brief (maybe one paragraph) summary of what you are actually trying to
    do at the moment and what its current status is?  (And your timeframe,
    etc.?)

    My impression until yesterday was that you were slowly/leisurely
    exploring the new Spark connector to AsterixDB that Wail worked on -
    essentially as his first "beta" user - and that things were moving at
    the pace you wanted (and were setting).  As an early adopter, I was also
    under the impression that you were using his branch for your
    explorations, while he was addressing code review comments, etc.
    However, when I arrived back home in OC after a trip yesterday, I was
    the recipient of a message (via a back channel) warning me that there
    was a blocking issue at SDSC that UCI wasn't being attentive to, one
    that had AsterixDB on the brink of being given up on by the UCLA folks, and
    that we'd better get on it or....  (Meanwhile I had not heard any such
    thing from UCLA directly; I was not aware of any blocking Spark issues
    for SDSC nor of any transitively blocking implications for UCLA, and it
    still doesn't look from what I see below like there was one.)

    I think that we need to have SDSC's activities be much more visible here
    - likewise for UCLA's - so that the Apache AsterixDB community has much
    better visibility into the goals, activities, progress, and problems of
    our early adopters.  The community wants users to be successful!  It
    will be much more effective (and healthy and productive) if we all know
    what's going on and it is clear to all how each of those things is going.

    Thanks!

    Mike


    On 8/10/16 8:36 AM, Wail Alkowaileet wrote:
    > Hi Kevin,
    >
    > Cool!
    > Please let me know if you need any assistance.
    >
    > On Aug 8, 2016 1:42 PM, "Coakley, Kevin" <kc...@sdsc.edu> wrote:
    >
    >> Hi Wail,
    >>
    >> I figured out the problem: AsterixDB was configured for 127.0.0.1. The
    >> notebook at
    >> https://github.com/Nullification/asterixdb-spark-connector/blob/master/zeppelin-notebook/asterixdb-spark-example/note.json
    >> ran successfully once I recreated the AsterixDB instance to use the
    >> external IP.
    >>
    >> I have not run any of my own queries, but I did get both of the examples
    >> at https://github.com/Nullification/asterixdb-spark-connector to run
    >> successfully.
    >>
    >> Thank you!
    >>
    >> -Kevin
    >>
    >>
    >>
    >> On 8/3/16, 10:23 AM, "Wail Alkowaileet" <wa...@gmail.com> wrote:
    >>
    >>      One more thing:
    >>      Can you paste your cluster configuration as well?
    >>
    >>      Thanks
    >>
    >>     (ETC ETC ETC deleted)







Re: Trio: AsterixDB, Spark and Zeppelin.

Posted by Michael Carey <mj...@ics.uci.edu>.
Kevin,

Thanks!  That helps a lot.  Now what we need to know (possibly above 
your pay grade :-)) is what the timetable is for UCLA (i) wanting to get 
the results of your assessment of how well what's there works and meets 
their needs and (ii) wanting to put stuff into production (and at what 
scale).  I don't anticipate the review and merging taking forever, but 
this will be Wail's first AsterixDB code contribution - last I knew he 
was addressing initial reviewing comments (and I'm not sure if all 
reviews are done yet) - but I think we next need to ask 
UCLA/Sean/Amarnath for the timetable info.

Cheers,

Mike


On 8/10/16 1:33 PM, Coakley, Kevin wrote:
> Mike,
>
> UCLA wanted a way to use Spark’s machine learning packages with data stored in AsterixDB. We started looking at the Spark connector as a way to access the data in AsterixDB directly instead of having to export the data from AsterixDB to a file and import the file into Spark. I don’t know how this fits into Amarnath’s projects; I was just following up on a request from UCLA to see what would be involved in providing this Spark connector to others.
>
> The current status is: I have the Spark connector working in a test environment with the queries provided by Wail. I was planning on loading a small amount of data into the test AsterixDB server with the Schema Inferencer code and running my own queries, but I have not had time yet. The issue with providing others with access to the Spark connector is that the version of AsterixDB we are running, which contains the Twitter data, does not have the Schema Inferencer code and therefore will not work with the Spark connector.
>
> I don’t believe SDSC would want to update the AsterixDB servers that contain the Twitter data with the Schema Inferencer code until after it has been approved by you and merged into the master branch. However, even after the Schema Inferencer code has been merged into the master branch, we wouldn’t have it ready for people to use right away.
>
> I offered to load a small subset of the data from our main servers into my test environment that has a working Spark connector for UCLA to test, but it sounds like they misunderstood my offer.
>
> I would be happy to help you test the Schema Inferencer and Spark connector if you have specific items that you want me to check. I can also give others that you select access to the test environment so they can run tests themselves. Otherwise, I will respond here if I discover any issues.
>
> My current test environment is Zeppelin with the Spark connector on server A, AsterixDB with the Schema Inferencer code on server B and a Spark 1.6.0 cluster running on servers C, D and E.
>
> -Kevin
>
>
>
> On 8/10/16, 9:36 AM, "Mike Carey" <dt...@gmail.com> wrote:
>
>      Kevin,
>      
>      Q:  Could you chime back in here - please 'cc' the user list - with a
>      brief (maybe one paragraph) summary of what you are actually trying to
>      do at the moment and what its current status is?  (And your timeframe,
>      etc.?)
>      
>      My impression until yesterday was that you were slowly/leisurely
>      exploring the new Spark connector to AsterixDB that Wail worked on -
>      essentially as his first "beta" user - and that things were moving at
>      the pace you wanted (and were setting).  As an early adopter, I was also
>      under the impression that you were using his branch for your
>      explorations, while he was addressing code review comments, etc.
>      However, when I arrived back home in OC after a trip yesterday, I was
>      the recipient of a message (via a back channel) warning me that there
>      was a blocking issue at SDSC that UCI wasn't being attentive to, one
>      that had AsterixDB on the brink of being given up on by the UCLA folks, and
>      that we'd better get on it or....  (Meanwhile I had not heard any such
>      thing from UCLA directly; I was not aware of any blocking Spark issues
>      for SDSC nor of any transitively blocking implications for UCLA, and it
>      still doesn't look from what I see below like there was one.)
>      
>      I think that we need to have SDSC's activities be much more visible here
>      - likewise for UCLA's - so that the Apache AsterixDB community has much
>      better visibility into the goals, activities, progress, and problems of
>      our early adopters.  The community wants users to be successful!  It
>      will be much more effective (and healthy and productive) if we all know
>      what's going on and it is clear to all how each of those things is going.
>      
>      Thanks!
>      
>      Mike
>      
>      
>      On 8/10/16 8:36 AM, Wail Alkowaileet wrote:
>      > Hi Kevin,
>      >
>      > Cool!
>      > Please let me know if you need any assistance.
>      >
>      > On Aug 8, 2016 1:42 PM, "Coakley, Kevin" <kc...@sdsc.edu> wrote:
>      >
>      >> Hi Wail,
>      >>
>      >> I figured out the problem: AsterixDB was configured for 127.0.0.1. The
>      >> notebook at
>      >> https://github.com/Nullification/asterixdb-spark-connector/blob/master/zeppelin-notebook/asterixdb-spark-example/note.json
>      >> ran successfully once I recreated the AsterixDB instance to use the
>      >> external IP.
>      >>
>      >> I have not run any of my own queries, but I did get both of the examples
>      >> at https://github.com/Nullification/asterixdb-spark-connector to run
>      >> successfully.
>      >>
>      >> Thank you!
>      >>
>      >> -Kevin
>      >>
>      >>
>      >>
>      >> On 8/3/16, 10:23 AM, "Wail Alkowaileet" <wa...@gmail.com> wrote:
>      >>
>      >>      One more thing:
>      >>      Can you paste your cluster configuration as well?
>      >>
>      >>      Thanks
>      >>
>      >>     (ETC ETC ETC deleted)
>      
>      
>
>


Re: Trio: AsterixDB, Spark and Zeppelin.

Posted by "Coakley, Kevin" <kc...@sdsc.edu>.
Mike,

UCLA wanted a way to use Spark’s machine learning packages with data stored in AsterixDB. We started looking at the Spark connector as a way to access the data in AsterixDB directly instead of having to export the data from AsterixDB to a file and import the file into Spark. I don’t know how this fits into Amarnath’s projects; I was just following up on a request from UCLA to see what would be involved in providing this Spark connector to others.

The current status is: I have the Spark connector working in a test environment with the queries provided by Wail. I was planning on loading a small amount of data into the test AsterixDB server with the Schema Inferencer code and running my own queries, but I have not had time yet. The issue with providing others with access to the Spark connector is that the version of AsterixDB we are running, which contains the Twitter data, does not have the Schema Inferencer code and therefore will not work with the Spark connector.

I don’t believe SDSC would want to update the AsterixDB servers that contain the Twitter data with the Schema Inferencer code until after it has been approved by you and merged into the master branch. However, even after the Schema Inferencer code has been merged into the master branch, we wouldn’t have it ready for people to use right away.

I offered to load a small subset of the data from our main servers into my test environment that has a working Spark connector for UCLA to test, but it sounds like they misunderstood my offer.

I would be happy to help you test the Schema Inferencer and Spark connector if you have specific items that you want me to check. I can also give others that you select access to the test environment so they can run tests themselves. Otherwise, I will respond here if I discover any issues.

My current test environment is Zeppelin with the Spark connector on server A, AsterixDB with the Schema Inferencer code on server B and a Spark 1.6.0 cluster running on servers C, D and E.

-Kevin



On 8/10/16, 9:36 AM, "Mike Carey" <dt...@gmail.com> wrote:

    Kevin,
    
    Q:  Could you chime back in here - please 'cc' the user list - with a 
    brief (maybe one paragraph) summary of what you are actually trying to 
    do at the moment and what its current status is?  (And your timeframe, 
    etc.?)
    
    My impression until yesterday was that you were slowly/leisurely 
    exploring the new Spark connector to AsterixDB that Wail worked on - 
    essentially as his first "beta" user - and that things were moving at 
    the pace you wanted (and were setting).  As an early adopter, I was also 
    under the impression that you were using his branch for your 
    explorations, while he was addressing code review comments, etc.  
    However, when I arrived back home in OC after a trip yesterday, I was 
    the recipient of a message (via a back channel) warning me that there 
    was a blocking issue at SDSC that UCI wasn't being attentive to, one 
    that had AsterixDB on the brink of being given up on by the UCLA folks, and
    that we'd better get on it or....  (Meanwhile I had not heard any such 
    thing from UCLA directly; I was not aware of any blocking Spark issues 
    for SDSC nor of any transitively blocking implications for UCLA, and it 
    still doesn't look from what I see below like there was one.)
    
    I think that we need to have SDSC's activities be much more visible here 
    - likewise for UCLA's - so that the Apache AsterixDB community has much 
    better visibility into the goals, activities, progress, and problems of 
    our early adopters.  The community wants users to be successful!  It 
    will be much more effective (and healthy and productive) if we all know 
    what's going on and it is clear to all how each of those things is going.
    
    Thanks!
    
    Mike
    
    
    On 8/10/16 8:36 AM, Wail Alkowaileet wrote:
    > Hi Kevin,
    >
    > Cool!
    > Please let me know if you need any assistance.
    >
    > On Aug 8, 2016 1:42 PM, "Coakley, Kevin" <kc...@sdsc.edu> wrote:
    >
    >> Hi Wail,
    >>
    >> I figured out the problem: AsterixDB was configured for 127.0.0.1. The
    >> notebook at
    >> https://github.com/Nullification/asterixdb-spark-connector/blob/master/zeppelin-notebook/asterixdb-spark-example/note.json
    >> ran successfully once I recreated the AsterixDB instance to use the
    >> external IP.
    >>
    >> I have not run any of my own queries, but I did get both of the examples
    >> at https://github.com/Nullification/asterixdb-spark-connector to run
    >> successfully.
    >>
    >> Thank you!
    >>
    >> -Kevin
    >>
    >>
    >>
    >> On 8/3/16, 10:23 AM, "Wail Alkowaileet" <wa...@gmail.com> wrote:
    >>
    >>      One more thing:
    >>      Can you paste your cluster configuration as well?
    >>
    >>      Thanks
    >>
    >>     (ETC ETC ETC deleted)
    
    


