Posted to users@asterixdb.apache.org by sc...@informatik.hu-berlin.de on 2015/10/20 14:40:46 UTC

unable to load external data

Hello,

I have done a cluster setup of AsterixDB on four nodes. Everything is
running fine and I want to load some data into the system to run some
bigger examples. However, I am unable to do so using the description at

https://asterixdb.ics.uci.edu/documentation/aql/externaldata.html

I created a dataverse, a datatype and a dataset as follows:

create dataverse tpch;

use dataverse tpch
create type LineitemType as closed {
      orderkey:int32,
      partkey: int32,
      suppkey: int32,
      linenumber: int32,
      quantity: double,
      extendedprice: double,
      discount: double,
      tax: double,
      returnflag: string,
      linestatus: string,
      shipdate: string,
      commitdate: string,
      receiptdate: string,
      shipinstruct: string,
      shipmode: string,
      comment: string}

create dataset lineitem(LineitemType) if not exists primary key orderkey,
linenumber

As described on the page linked above, there are two ways to load data:
from a reachable HDFS or from the local file system (localfs). I have a
running HDFS within the same network containing the data I want to access
and tried to reach it like this:

load dataset lineitem using hdfs
(("hdfs"="hdfs://192.168.127.11:50040"),
("path"="/user/schultzem/lineitem.tbl"),
("input-format"="text-input-format"),
("format"="delimited-text"),
("delimiter"="|"));

However I get an error message

Unable to create adapter org.apache.hadoop.ipc.RemoteException: Server IPC
version 9 cannot communicate with client version 3 [AlgebricksException]

All I found about this was an old issue from 2013 that recommends an
older version of Hadoop, which is not an option for me.

https://code.google.com/p/asterixdb/issues/detail?id=521

Is this somehow fixable?

The other option to load data from the localFS also throws an error.

load dataset lineitem using localfs
(("path"="192.168.127.21:///home/schultzem/tpch/TPCH_data_10GB/lineitem.tbl"),
    ("format"="delimited-text"),
    ("delimiter"="|"));

leads to

No node controllers found at the address: 192.168.127.21 [AsterixException]

which is the same error as for 127.0.0.1.

The linked documentation about external datasets assumes that AsterixDB
is used in local mode. Is this the reason why I cannot reach the cluster
nodes?

Did I make a mistake accessing the data? How can I load data into the
database?

Regards, Max


Re: AsterixDB - query status / system status

Posted by Yingyi Bu <bu...@gmail.com>.
The ones I know are in our regression tests:
https://github.com/apache/incubator-asterixdb/tree/master/asterix-app/src/test/resources/runtimets/queries/tpch-sql-like/

But they are not the original ones --- we changed the filter conditions in
some queries to make sure their result set is not empty.
I guess Pouria may have more accurate ones.

Best,
Yingyi


On Wed, Oct 28, 2015 at 9:53 AM, Michael Carey <mj...@ics.uci.edu> wrote:

> Pouria indeed has a full set of TPC-H queries in AQL....
> (Yingyi does too, I believe.)

Re: AsterixDB - query status / system status

Posted by Michael Carey <mj...@ics.uci.edu>.
Pouria indeed has a full set of TPC-H queries in AQL....
(Yingyi does too, I believe.)

On 10/28/15 9:49 AM, Ian Maxon wrote:
> Hi Max,
>
> Let me respond inline...
>> I am currently trying to rewrite some of the TPCH Queries to run them as
>> examples on AsterixDB. So far I was able to run some of the queries using
>> the web client, but others do not work yet. Unfortunately, sometimes I do
>> not get any error messages or results.
> I agree, the WebUI leaves something to be desired, especially for
> queries that take a long time.
> There is one minor amelioration to this however, that I find useful,
> the Hyracks admin console.
> It's on port 8888 on the CC at /adminconsole . It will show what jobs
> are running and which
> NCs are still registered with the CC.
>
> As for rewriting the TPC-H queries, I'm pretty sure we have these (but
> the ones in the tests folder aren't, they're toned-down versions). So
> maybe you don't have to rewrite anything, hopefully?
> @ Pouria, do you happen to have them handy?
>
> In general though, for more serious use, the HTTP api may be a better
> option, as it is more amenable to scripting.
>
>> As I cannot tell if the queries are still
>> running, or if an error occurred that is just not displayed, is there a
>> possibility to monitor the state of the system/query execution?
> The adminconsole will show if there's a job still running, but not
> exceptional states that cause the job to hang.
> The exceptions should be in the CC log always. Where that is depends
> on your managix configuration (but it's always called cc.log).
>
>> From time to time the system even crashes and I cannot even shut it down
>> using managix stop.
> Yes, unfortunately 'managix stop' just goes through the list of NC's
> and requests 'kill' on the process, not 'kill' and then 'kill -9'. In
> the cases where an NC fails to exit politely, it has to be done by
> hand for managix.
>
> By the way, what environment are you running everything on? Is it a cluster?
>
> Thanks,
> -Ian


Re: AsterixDB - query status / system status

Posted by Pouria Pirzadeh <po...@gmail.com>.
Yingyi is correct about the ones under regression tests. Some of them are
modified to work fine on tiny scales of data.
You can find the actual TPCH queries, in AQL, under:

https://github.com/apache/incubator-asterixdb/tree/master/asterix-benchmarks/src/main/resources/tpc-h/queries

Let me know if you need more help with them.

Pouria

On Wed, Oct 28, 2015 at 10:34 AM, <sc...@informatik.hu-berlin.de> wrote:

> Hi Ian,
>
> thanks a lot for your help! I am currently preparing some of those tests on my
> local computer. Starting on Monday I will have access to 8 nodes of a
> cluster for another week.
>
> Regards, Max

Re: AsterixDB - query status / system status

Posted by sc...@informatik.hu-berlin.de.
Hi Ian,

thanks a lot for your help! I am currently preparing some of those tests on my
local computer. Starting on Monday I will have access to 8 nodes of a
cluster for another week.

Regards, Max




Re: AsterixDB - query status / system status

Posted by Ian Maxon <im...@uci.edu>.
Hi Max,

Let me respond inline...
>I am currently trying to rewrite some of the TPCH Queries to run them as
>examples on AsterixDB. So far I was able to run some of the queries using
>the web client, but others do not work yet. Unfortunately, sometimes I do
>not get any error messages or results.

I agree, the WebUI leaves something to be desired, especially for
queries that take a long time.
There is one minor amelioration to this however, that I find useful,
the Hyracks admin console.
It's on port 8888 on the CC at /adminconsole . It will show what jobs
are running and which
NCs are still registered with the CC.

As for rewriting the TPC-H queries, I'm pretty sure we have these (but
the ones in the tests folder aren't, they're toned-down versions). So
maybe you don't have to rewrite anything, hopefully?
@ Pouria, do you happen to have them handy?

In general though, for more serious use, the HTTP api may be a better
option, as it is more amenable to scripting.
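
As a rough illustration of the scripting point above, here is a minimal
Python sketch that sends an AQL query over HTTP. The port (19002) and the
/query endpoint with a "query" parameter are assumptions about the default
HTTP API setup and should be checked against the API documentation for your
version; the helper names are made up for this example.

```python
import urllib.parse
import urllib.request

# Assumed default port of the AsterixDB HTTP API; adjust to your setup.
API_PORT = 19002


def build_query_url(host, aql, port=API_PORT):
    """Return the URL for running one AQL query (assumed /query endpoint)."""
    params = urllib.parse.urlencode({"query": aql})
    return "http://{}:{}/query?{}".format(host, port, params)


def run_query(host, aql, timeout=600):
    """Send the query and return the raw response body as text."""
    url = build_query_url(host, aql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling run_query with the CC host name and a query string blocks until a
response arrives or the timeout expires, and a failed request raises an
exception instead of leaving a blank output area.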

>As I cannot tell if the queries are still
>running, or if an error occurred that is just not displayed, is there a
>possibility to monitor the state of the system/query execution?

The adminconsole will show if there's a job still running, but not
exceptional states that cause the job to hang.
The exceptions should be in the CC log always. Where that is depends
on your managix configuration (but it's always called cc.log).
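
To check the CC log without reading it by hand, something like the
following Python sketch may help. It only lists lines that mention an
exception or error; the keyword list is an assumption about what the log
contains, not a documented format, and the function names are hypothetical.

```python
def find_exception_lines(lines, keywords=("Exception", "ERROR", "SEVERE")):
    """Return (line_number, text) pairs for lines mentioning any keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(keyword in line for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan a log file (for example the cc.log mentioned above)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_exception_lines(handle)
```

scan_log takes the path to cc.log, which depends on the managix
configuration as described above.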

>From time to time the system even crashes and I cannot even shut it down
>using managix stop.

Yes, unfortunately 'managix stop' just goes through the list of NC's
and requests 'kill' on the process, not 'kill' and then 'kill -9'. In
the cases where an NC fails to exit politely, it has to be done by
hand for managix.
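
The manual cleanup described above (a plain 'kill', then 'kill -9' if the
process does not exit) could be sketched in Python roughly as follows. This
is a POSIX-only sketch and not part of managix; the function names and the
grace period are assumptions, and the PID of the stuck NC has to be looked
up separately on each node.

```python
import os
import signal
import time


def is_alive(pid):
    """Best-effort check whether a process with this PID still exists."""
    try:
        # If the process is our own child, reap it so it does not linger
        # as a zombie (which would still answer the signal-0 probe below).
        reaped, _ = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # not a child of this script; use the probe instead
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def wait_for_exit(pid, seconds):
    """Poll until the process is gone or the time budget is used up."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return True
        time.sleep(0.1)
    return not is_alive(pid)


def stop_process(pid, grace=10.0):
    """Send SIGTERM first and fall back to SIGKILL ('kill -9') if needed."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "not running"
    if wait_for_exit(pid, grace):
        return "terminated"
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "terminated"
    wait_for_exit(pid, 5.0)
    return "killed"
```

The return value tells whether the polite request was enough or the forced
kill was needed, which is useful to know before restarting the instance.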

By the way, what environment are you running everything on? Is it a cluster?

Thanks,
-Ian

On Wed, Oct 28, 2015 at 9:22 AM,  <sc...@informatik.hu-berlin.de> wrote:
> Hello,
>
> by now I was able to successfully install AsterixDB from the latest master
> branch which also made it possible for me to load external data files into
> the system. Thanks a lot for the help!
>
> I am currently trying to rewrite some of the TPCH Queries to run them as
> examples on AsterixDB. So far I was able to run some of the queries using
> the web client, but others do not work yet. Unfortunately, sometimes I do
> not get any error messages or results. On execution the output area of the
> webclient turns blank as it usually does, but after some time I should get
> the results shown (or an error). As I cannot tell if the queries are still
> running, or if an error occurred that is just not displayed, is there a
> possibility to monitor the state of the system/query execution?
>
> From time to time the system even crashes and I cannot even shut it down
> using managix stop.
>
> Regards, Max

AsterixDB - query status / system status

Posted by sc...@informatik.hu-berlin.de.
Hello,

by now I was able to successfully install AsterixDB from the latest master
branch which also made it possible for me to load external data files into
the system. Thanks a lot for the help!

I am currently trying to rewrite some of the TPCH Queries to run them as
examples on AsterixDB. So far I was able to run some of the queries using
the web client, but others do not work yet. Unfortunately, sometimes I do
not get any error messages or results. On execution the output area of the
web client turns blank as it usually does, but after some time I should get
the results shown (or an error). As I cannot tell if the queries are still
running, or if an error occurred that is just not displayed, is there a
possibility to monitor the state of the system/query execution?

From time to time the system even crashes and I cannot even shut it down
using managix stop.

Regards, Max


> Your assumption is correct, for the latest AsterixDB master we usually
> depend on the development version of Hyracks
> (https://github.com/apache/incubator-asterixdb-hyracks/). 'mvn install
> -DskipTests' with Hyracks should do the trick.
>
> Thanks,
> -Ian
>


Re: AsterixDB build error

Posted by Ian Maxon <im...@uci.edu>.
Your assumption is correct, for the latest AsterixDB master we usually
depend on the development version of Hyracks
(https://github.com/apache/incubator-asterixdb-hyracks/). 'mvn install
-DskipTests' with Hyracks should do the trick.

Thanks,
-Ian

On Mon, Oct 26, 2015 at 9:14 AM,  <sc...@informatik.hu-berlin.de> wrote:
> Hi Ian,
>
> thanks for the help. I cloned from the google code repository, as it was
> mentioned on the download page http://asterixdb.ics.uci.edu/
>
> Unfortunately, even after cloning from the repository you mentioned I got
> another error running 'mvn clean install -DskipTests':
>
> [INFO] asterix ............................................ SUCCESS [ 7.197 s]
> [INFO] asterix-test-framework ............................. SUCCESS [ 2.280 s]
> [INFO] asterix-common ..................................... FAILURE [ 5.876 s]
>
> ...
>
> [ERROR] Failed to execute goal on project asterix-common: Could not
> resolve dependencies for project
> org.apache.asterix:asterix-common:jar:0.8.8-SNAPSHOT: The following
> artifacts could not be resolved:
> org.apache.hyracks:algebricks-compiler:jar:0.2.17-SNAPSHOT,
> org.apache.hyracks:hyracks-dataflow-std:jar:0.2.17-SNAPSHOT,
> org.apache.hyracks:hyracks-storage-am-lsm-common:jar:0.2.17-SNAPSHOT,
> org.apache.hyracks:hyracks-storage-am-common:jar:0.2.17-SNAPSHOT,
> org.apache.hyracks:hyracks-api:jar:0.2.17-SNAPSHOT,
> org.apache.hyracks:hyracks-storage-am-lsm-btree:jar:0.2.17-SNAPSHOT,
> org.apache.hyracks:hyracks-storage-am-lsm-invertedindex:jar:0.2.17-SNAPSHOT,
> org.apache.hyracks:hyracks-storage-am-lsm-rtree:jar:0.2.17-SNAPSHOT: Could
> not find artifact
> org.apache.hyracks:algebricks-compiler:jar:0.2.17-SNAPSHOT in
> asterix-public
> (http://obelix.ics.uci.edu/nexus/content/groups/asterix-public/) -> [Help
> 1]
>
> Is my assumption correct, that I have to download and build hyracks first?
> If so, where can I do that from?
>
> Regards, Max

Re: AsterixDB build error

Posted by sc...@informatik.hu-berlin.de.
Hi Ian,

thanks for the help. I cloned from the google code repository, as it was
mentioned on the download page http://asterixdb.ics.uci.edu/

Unfortunately, even after cloning from the repository you mentioned I got
another error running 'mvn clean install -DskipTests':

[INFO] asterix ............................................ SUCCESS [ 7.197 s]
[INFO] asterix-test-framework ............................. SUCCESS [ 2.280 s]
[INFO] asterix-common ..................................... FAILURE [ 5.876 s]

...

[ERROR] Failed to execute goal on project asterix-common: Could not
resolve dependencies for project
org.apache.asterix:asterix-common:jar:0.8.8-SNAPSHOT: The following
artifacts could not be resolved:
org.apache.hyracks:algebricks-compiler:jar:0.2.17-SNAPSHOT,
org.apache.hyracks:hyracks-dataflow-std:jar:0.2.17-SNAPSHOT,
org.apache.hyracks:hyracks-storage-am-lsm-common:jar:0.2.17-SNAPSHOT,
org.apache.hyracks:hyracks-storage-am-common:jar:0.2.17-SNAPSHOT,
org.apache.hyracks:hyracks-api:jar:0.2.17-SNAPSHOT,
org.apache.hyracks:hyracks-storage-am-lsm-btree:jar:0.2.17-SNAPSHOT,
org.apache.hyracks:hyracks-storage-am-lsm-invertedindex:jar:0.2.17-SNAPSHOT,
org.apache.hyracks:hyracks-storage-am-lsm-rtree:jar:0.2.17-SNAPSHOT: Could
not find artifact
org.apache.hyracks:algebricks-compiler:jar:0.2.17-SNAPSHOT in
asterix-public
(http://obelix.ics.uci.edu/nexus/content/groups/asterix-public/) -> [Help
1]

Is my assumption correct, that I have to download and build hyracks first?
If so, where can I do that from?

Regards, Max


> Hi Max,
> Two things:
> 1) The google code repository is really old and deprecated, so I
> wouldn't clone from there. It's read only now and hasn't been updated
> in months. https://github.com/apache/incubator-asterixdb is a mirror
> of the official ASF repo now.
>
> 2) 'mvn clean install -DskipTests' should work. 'mvn clean compile'
> won't work because it doesn't install the plugin mentioned in the
> error even though it was built just before the error happened.
>
> Thanks,
> -Ian



Re: AsterixDB build error

Posted by Ian Maxon <im...@uci.edu>.
Hi Max,
Two things:
1) The Google Code repository is really old and deprecated, so I
wouldn't clone from there. It's read-only now and hasn't been updated
in months. https://github.com/apache/incubator-asterixdb is now a mirror
of the official ASF repo.

2) 'mvn clean install -DskipTests' should work. 'mvn clean compile'
won't work because it doesn't install the plugin mentioned in the
error even though it was built just before the error happened.

Thanks,
-Ian
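
As a rough sketch of the two points above, the shell script below clones the
GitHub mirror and runs the install goal. The repository URL and the Maven
command are taken from this thread; the checkout directory name and the
DRY_RUN switch are assumptions added for illustration. By default the script
only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: build AsterixDB from the GitHub mirror mentioned in this thread.
# SRC_DIR and DRY_RUN are illustrative assumptions, not part of AsterixDB.
set -eu

REPO_URL="https://github.com/apache/incubator-asterixdb"
SRC_DIR="${SRC_DIR:-incubator-asterixdb}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# run CMD...: print the command, then execute it unless DRY_RUN is 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

build_asterixdb() {
    run git clone "$REPO_URL" "$SRC_DIR"
    run cd "$SRC_DIR"
    # 'install' (not 'compile') so that the Maven plugin built earlier in
    # the reactor is installed before later modules try to use it.
    run mvn clean install -DskipTests
}

build_asterixdb
```

Running it with DRY_RUN=0 executes the commands for real, which assumes git
and Maven are available on the PATH.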

On Mon, Oct 26, 2015 at 4:50 AM,  <sc...@informatik.hu-berlin.de> wrote:
> Hello,
>
> as you suggested I tried to install AsterixDB using the latest master
> version as I beforehand only used the official download packages from the
> website. I copied the git repository using
>
> git clone https://code.google.com/p/asterixdb/
>
> and afterwards tried to compile the code using
>
> mvn clean compile
>
> which threw the following error
>
> [INFO] asterix ............................................ SUCCESS [
> 0.170 s]
> [INFO] asterix-test-framework ............................. SUCCESS [
> 10.347 s]
> [INFO] asterix-common ..................................... SUCCESS [01:47
> min]
> [INFO] asterix-maven-plugins .............................. SUCCESS [
> 0.002 s]
> [INFO] record-manager-generator-maven-plugin .............. SUCCESS [
> 1.736 s]
> [INFO] asterix-transactions ............................... FAILURE [
> 0.003 s]
>
> ...
>
> [ERROR] Failed to parse plugin descriptor for
> edu.uci.ics.asterix:record-manager-generator-maven-plugin:0.8.7-SNAPSHOT
> (/home/mcs1408/git/asterixdb/asterix-maven-plugins/record-manager-generator-maven-plugin/target/classes):
> No plugin descriptor found at META-INF/maven/plugin.xml -> [Help 1]
>
> can I fix this somehow?
>
> Regards, Max

AsterixDB build error

Posted by sc...@informatik.hu-berlin.de.
Hello,

As you suggested, I tried to install AsterixDB from the latest master
version, since I had previously only used the official download packages
from the website. I cloned the git repository using

git clone https://code.google.com/p/asterixdb/

and afterwards tried to compile the code using

mvn clean compile

which threw the following error

[INFO] asterix ............................................ SUCCESS [ 0.170 s]
[INFO] asterix-test-framework ............................. SUCCESS [ 10.347 s]
[INFO] asterix-common ..................................... SUCCESS [01:47 min]
[INFO] asterix-maven-plugins .............................. SUCCESS [ 0.002 s]
[INFO] record-manager-generator-maven-plugin .............. SUCCESS [ 1.736 s]
[INFO] asterix-transactions ............................... FAILURE [ 0.003 s]

...

[ERROR] Failed to parse plugin descriptor for
edu.uci.ics.asterix:record-manager-generator-maven-plugin:0.8.7-SNAPSHOT
(/home/mcs1408/git/asterixdb/asterix-maven-plugins/record-manager-generator-maven-plugin/target/classes):
No plugin descriptor found at META-INF/maven/plugin.xml -> [Help 1]

Can I fix this somehow?

Regards, Max


> No problem :) I would definitely try the latest master version then.
> Asterix 0.8.6 uses Hadoop 0.20.2, which is really ancient. You will
> probably be best off checking out from source and changing the Hadoop
> dependency in the top-level Asterix pom to 2.6.0. from 2.2.0.



Re: unable to load external data

Posted by Ian Maxon <im...@uci.edu>.
No problem :) I would definitely try the latest master version then.
Asterix 0.8.6 uses Hadoop 0.20.2, which is really ancient. You will
probably be best off checking out from source and changing the Hadoop
dependency in the top-level Asterix pom from 2.2.0 to 2.6.0.
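
The thread does not show the pom itself, so as a hedged sketch, one way to
find the lines to change is to search the top-level pom.xml for Hadoop
references before editing it by hand. The helper name list_hadoop_lines and
the default path pom.xml are assumptions for illustration; no particular tag
or property name is assumed.

```shell
#!/bin/sh
# Sketch: locate Hadoop references in a pom.xml before changing the version.
# list_hadoop_lines is a hypothetical helper, not an AsterixDB or Maven tool.
set -eu

# list_hadoop_lines POM: print "line-number:text" for every line that
# mentions hadoop (case-insensitive); prints nothing if there is no match.
list_hadoop_lines() {
    grep -n -i 'hadoop' "$1" || true
}

POM="${1:-pom.xml}"
if [ -f "$POM" ]; then
    list_hadoop_lines "$POM"
else
    echo "no pom found at $POM" >&2
fi
```

The matching lines show where the 2.2.0 version is declared, so it can be
changed to 2.6.0 and the build rerun.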

On Tue, Oct 20, 2015 at 3:31 PM,  <sc...@informatik.hu-berlin.de> wrote:
> I am using AsterixDB 0.8.6 and Hadoop 2.6.0.
>
> Thanks for the help,
> Max

Re: unable to load external data

Posted by sc...@informatik.hu-berlin.de.
I am using AsterixDB 0.8.6 and Hadoop 2.6.0.

Thanks for the help,
Max


> Hi Max,
> Which version of AsterixDB are you running? The old stable release
> uses a really old version of Hadoop dependencies, so that might be it.
> What's the version your HDFS cluster has? The latest master is using
> 2.2.0 by default, but 2.4.0 or 2.6.0 should work as well.
>
> Thanks,
> -Ian



Re: unable to load external data

Posted by Ian Maxon <im...@uci.edu>.
Hi Max,
Which version of AsterixDB are you running? The old stable release
uses a really old version of Hadoop dependencies, so that might be it.
Which Hadoop version is your HDFS cluster running? The latest master uses
2.2.0 by default, but 2.4.0 or 2.6.0 should work as well.

Thanks,
-Ian

On Tue, Oct 20, 2015 at 5:40 AM,  <sc...@informatik.hu-berlin.de> wrote:
> Hello,
>
> I have done a cluster setup of AsterixDB on four nodes. Everything is
> running fine and I want to load some data into the system to run some
> bigger examples. However I am unable to do so using the description at
>
> https://asterixdb.ics.uci.edu/documentation/aql/externaldata.html
>
> I created a dataverse, a datatype and a dataset as follows:
>
> create dataverse tpch;
>
> use dataverse tpch
> create type LineitemType as closed {
>       orderkey:int32,
>       partkey: int32,
>       suppkey: int32,
>       linenumber: int32,
>       quantity: double,
>       extendedprice: double,
>       discount: double,
>       tax: double,
>       returnflag: string,
>       linestatus: string,
>       shipdate: string,
>       commitdate: string,
>       receiptdate: string,
>       shipinstruct: string,
>       shipmode: string,
>       comment: string}
>
> create dataset lineitem(LineitemType) if not exists primary key orderkey,
> linenumber
>
> as described on the homepage linked above there are two ways to load data
> from, using either a reachable HDFS or the localFS. I have a running HDFS
> within the same network containing the data I want to access and tried to
> reach it like this:
>
> load dataset lineitem using hdfs
> (("hdfs"="hdfs://192.168.127.11:50040"),
> ("path"="/user/schultzem/lineitem.tbl"),
> ("input-format"="text-input-format"),
> ("format"="delimited-text"),
> ("delimiter"="|"));
>
> However I get an error message
>
> Unable to create adapter org.apache.hadoop.ipc.RemoteException: Server IPC
> version 9 cannot communicate with client version 3 [AlgebricksException]
>
> all I found out about this was an old Issue from 2013 that recommends an
> older version of hadoop, which is not an option for me.
>
> https://code.google.com/p/asterixdb/issues/detail?id=521
>
> Is this somehow fixable?
>
> The other option to load data from the localFS also throws an error.
>
> load dataset lineitem using localfs
> (("path"="192.168.127.21:///home/schultzem/tpch/TPCH_data_10GB/lineitem.tbl"),
>     ("format"="delimited-text"),
>     ("delimiter"="|"));
>
> leads to
>
> No node controllers found at the address: 192.168.127.21 [AsterixException]
>
> which is the same error as for 127.0.0.1.
>
> On the linked documentation about external datasets it is assumed that
> AsterixDB is used in local mode. Is this the problem why I cannot reach
> the cluster nodes?
>
> Did I make a mistake accessing the data? How can I load data into the
> database?
>
> Regards, Max
>