Posted to users@asterixdb.apache.org by "Malarout, Namrata (398M-Affiliate)" <Na...@jpl.nasa.gov> on 2015/09/22 20:26:52 UTC

Few Questions

Hi,
I have a few questions:

  *   Can I insert more data into a dataset by importing a .adm file? In the documentation I saw the insert statement but I wanted to know if I could insert it using a file and the syntax for it. I tried to do it but I got syntax errors.
  *   I would like to know if there is a way to increase the amount of data Hyracks can handle. Currently I’m using version 0.8.6. The error I get when I try to load my data is: Field accessor is not defined for values of type null [AlgebricksException]. In the 0.8.7 snapshot the error is: Unable to allocate frame larger than:255 bytes [HyracksDataException]. I am sure that the errors are caused by the size of the array-based data, because the load operation works when I delete a lot of values. If anyone is interested in looking at the data, here’s a sample record: https://drive.google.com/file/d/0B6wmo4_-H0P2UUhJcFpVdWdZT1k/view?usp=sharing. I saw the conf/asterix-configuration.xml file and was wondering if changing some values would make a difference.
  *   What is a field accessor? I couldn’t understand what the error “Field accessor is not defined for values of type null” means. Any idea why it was triggered? I am assuming that it has something to do with the size of the data, but only because I didn’t get the error after reducing it. From the wording alone I couldn’t really understand it.

Thanks for the help.
Regards,
Namrata


Re: Few Questions

Posted by "Malarout, Namrata (398M-Affiliate)" <Na...@jpl.nasa.gov>.
Hi Ian,
Thanks for your answers. The details you provided helped me understand
things better. I would like to know when the Beta version of 0.8.7 will be
available.
Thanks,
Namrata

On 9/22/15, 1:12 PM, "Ian Maxon" <im...@uci.edu> wrote:

>Hi Namrata,
>Let me try and address your questions inline...
>
>>Can I insert more data into a dataset by importing a .adm file? In the
>>documentation I saw the insert statement but I wanted to know if I could
>>insert it using a file and the syntax for it. I tried to do it but I got
>>syntax errors.
>Yes, but it's a little more complex than that. The 'load dataset
>Foo...' syntax performs a bulk load, which can only be done once, on
>an empty dataset. The reason is that building a BTree from a large run
>of sorted data is very fast compared to doing individual inserts, with
>the caveat that the tree must be empty. This is more of a technical
>detail, however. I believe the eventual hope is to have 'load' work on
>previously loaded datasets, but it will just be slower because it
>cannot use the aforementioned trick.
>
>The way to do this today is to create an external dataset (e.g.
>create external dataset Bar using localfs...) on the new data, and
>then insert every new record from it into the existing dataset (e.g.
>insert into dataset Foo( for $x in dataset Bar return $x) ). However,
>as a word of caution, this may not work very well with the default
>parameters, as they aren't tuned for ingesting data. The best thing to
>do is to increase the in-memory component budget and the max heap size
>in asterix-configuration.xml before trying this
>(storage.memorycomponent.numpages and
>storage.memorycomponent.pagesize, as well as the -Xmx parameter in
>nc.java.opts).
>
>>I would like to know if there is a way I can increase the amount of data
>>Hyracks can handle. Currently I’m using version 0.8.6. The error I get
>>when I try to load my data is: Field accessor is not defined for values
>>of type null [AlgebricksException].  In 0.8.7 Snapshot the error is:
>>Unable to allocate frame larger than:255 bytes [HyracksDataException]. I
>>am sure that the errors are because of the size of array-based data
>>because when I deleted a lot of values the load operation works. If
>>anyone is interested in looking at the data, here’s a sample record:
>>https://drive.google.com/file/d/0B6wmo4_-H0P2UUhJcFpVdWdZT1k/view?usp=sharing.
>>I saw the conf/asterix-configuration.xml file and was wondering if
>>changing some values would make a difference.
>
>The things you are seeing are, unfortunately, bugs. Given the type of
>data, I am not surprised by that behavior in version 0.8.6, because
>back then there were many limitations regarding large records/objects
>that have since been corrected.
>However, on the new version I am surprised that you got the error that
>you did. I will try out the sample data to see what the issue might
>be, but in general Hyracks now has support for transporting records
>that are larger than the default frame size. The main limitations we
>have come across lately in this area have come from the object model
>(e.g. a 65k limit on string size) or from the storage layer (objects
>cannot be bigger than half a page).
>
>>What is a field accessor? I couldn’t understand what the error “Field
>>accessor is not defined for values of type null” means. Any idea why it
>>was triggered? I am assuming that it has something to do with the size
>>of data only because I didn’t get the error on reducing it. But from the
>>wording I couldn’t really understand it.
>
>A field accessor is basically what deserializes the data from bytes
>within a frame (which is all Hyracks sees) to whatever AsterixDB needs
>to see (String, int, double, collection, etc.). However, this is an
>implementation detail; at the user level it shouldn't be of concern.
>
>Hopefully that helps! If anything is unclear, please feel free to ask.
>Also, if you'd like to Skype or chat over IRC to discuss this in real
>time, I'm available for that as well.
>
>Thanks,
>-Ian


Re: Few Questions

Posted by Ian Maxon <im...@uci.edu>.
Hi Namrata,
Let me try and address your questions inline...

>Can I insert more data into a dataset by importing a .adm file? In the documentation I saw the insert statement but I wanted to know if I could insert it using a file and the syntax for it. I tried to do it but I got syntax errors.
Yes, but it's a little more complex than that. The 'load dataset
Foo...' syntax performs a bulk load, which can only be done once, on
an empty dataset. The reason is that building a BTree from a large run
of sorted data is very fast compared to doing individual inserts, with
the caveat that the tree must be empty. This is more of a technical
detail, however. I believe the eventual hope is to have 'load' work on
previously loaded datasets, but it will just be slower because it
cannot use the aforementioned trick.
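
For reference, a one-time bulk load in AQL looks roughly like the
sketch below. The dataverse name MyDataverse and the file path are
placeholders that do not come from this thread, and the host part of
the path is assumed to be the address of the node controller that
holds the file.

```
use dataverse MyDataverse;

// Bulk load: only valid while Foo is still empty.
// Path and host are placeholders; adjust to your installation.
load dataset Foo using localfs
(("path"="127.0.0.1:///path/to/data.adm"),("format"="adm"));
```

Running the same statement a second time against the already
populated dataset is expected to fail, which is why additional files
have to go in through a different route.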

The way to do this today is to create an external dataset (e.g.
create external dataset Bar using localfs...) on the new data, and
then insert every new record from it into the existing dataset (e.g.
insert into dataset Foo( for $x in dataset Bar return $x) ). However,
as a word of caution, this may not work very well with the default
parameters, as they aren't tuned for ingesting data. The best thing to
do is to increase the in-memory component budget and the max heap size
in asterix-configuration.xml before trying this
(storage.memorycomponent.numpages and
storage.memorycomponent.pagesize, as well as the -Xmx parameter in
nc.java.opts).
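
Spelled out, the external-dataset route might look like the sketch
below. MyDataverse, FooType (assumed to be the datatype that Foo was
created with), and the file path are placeholders rather than names
from this thread. The 'dataset' keyword in the for clause is what AQL
normally expects when iterating over a dataset.

```
use dataverse MyDataverse;

// Expose the new .adm file as an external dataset.
// FooType and the path are assumptions; use your own type and file.
create external dataset Bar(FooType) using localfs
(("path"="127.0.0.1:///path/to/new_data.adm"),("format"="adm"));

// Copy every record from the external dataset into the existing one.
insert into dataset Foo (for $x in dataset Bar return $x);

// Optional: drop the external dataset once the insert has finished.
drop dataset Bar;
```

Because this goes through the regular insert path rather than the
bulk-load path, it is the step where the memory settings mentioned
above are likely to matter.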

>I would like to know if there is a way I can increase the amount of data Hyracks can handle. Currently I’m using version 0.8.6. The error I get when I try to load my data is: Field accessor is not defined for values of type null [AlgebricksException].  In 0.8.7 Snapshot the error is: Unable to allocate frame larger than:255 bytes [HyracksDataException]. I am sure that the errors are because of the size of array-based data because when I deleted a lot of values the load operation works. If anyone is interested in looking at the data, here’s a sample record: https://drive.google.com/file/d/0B6wmo4_-H0P2UUhJcFpVdWdZT1k/view?usp=sharing. I saw the conf/asterix-configuration.xml file and was wondering if changing some values would make a difference.

The things you are seeing are, unfortunately, bugs. Given the type of
data, I am not surprised by that behavior in version 0.8.6, because
back then there were many limitations regarding large records/objects
that have since been corrected.
However, on the new version I am surprised that you got the error that
you did. I will try out the sample data to see what the issue might
be, but in general Hyracks now has support for transporting records
that are larger than the default frame size. The main limitations we
have come across lately in this area have come from the object model
(e.g. a 65k limit on string size) or from the storage layer (objects
cannot be bigger than half a page).

>What is a field accessor? I couldn’t understand what the error “Field accessor is not defined for values of type null” means. Any idea why it was triggered? I am assuming that it has something to do with the size of data only because I didn’t get the error on reducing it. But from the wording I couldn’t really understand it.

A field accessor is basically what deserializes the data from bytes
within a frame (which is all Hyracks sees) to whatever AsterixDB needs
to see (String, int, double, collection, etc.). However, this is an
implementation detail; at the user level it shouldn't be of concern.

Hopefully that helps! If anything is unclear, please feel free to ask.
Also, if you'd like to Skype or chat over IRC to discuss this in real
time, I'm available for that as well.

Thanks,
-Ian
